
Endeca Note

Reposted from the blog: http://www.ateam-oracle.com/notes-on-querying-endeca-from-within-an-atg-application/

Background

On a few projects in 2014, the issue of Endeca’s performance came up. Specifically, applications were seeing a large number of queries and were also generating large response sizes from Endeca. These queries were not being generated by the Assembler API, but were one-off queries created to bring back other information from Endeca.

This article will give some tips on how to optimize those queries.

Notes on Endeca Query Response objects

A response from Endeca can consist of a number of different pieces of data:

  • Records (aka, products)
  • Dimensions to navigate on
  • Breadcrumbs (also sometimes called Descriptors), which describe what has already been navigated on
  • Supplemental objects: Used internally, these bring back meta-data about the rules being executed (such as landing pages, content slots, etc.)
  • Dimension search results: These are special queries that search only within dimensions, not products.
  • Key properties: Almost never used. Returns meta-data about the properties and dimensions in the index.
  • Other information about the index (such as the list of available sort keys, search fields, etc). You can’t really turn any of that off.

In an Assembler-based application, the overall flow for rendering a standard page is:

First, a super-lightweight query is executed that returns only the Supplemental objects representing information from Experience Manager. Depending on that meta-data, one or more subsequent queries are executed that return:

  1. Dimensions
  2. Records
  3. Breadcrumbs
  4. NOT supplemental objects (this is true for patched 10.2, 11.0 and 11.1)

Thus, rendering a standard page generates about two Endeca queries. Some cartridges generate more:

  • Featured records cartridges generate one query per cartridge. So if you have three featured records cartridges, there will be three extra queries.

In addition, the Assembler API is usually very good about bringing back only the exact information it needs. It will only “open up” the dimensions being requested, and it will return only the product attributes specified in the ResultsList configuration.

When it comes to making standalone queries to Endeca, you need to understand the information above so as to NOT bring back any more data than necessary.

Scanning for Standalone Endeca queries in code

The easiest way to scan an application for one-off Endeca queries is to search your .java classes for “ENEQuery”, “ENEQueryResults” or “UrlENEQuery”, and your .jsp files for “InvokeAssembler”.

If you find many instances of those, each and every query involved should be assessed using the information laid out below.
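As a starting point for that assessment, here is a rough Java sketch of the search described above. It walks a source tree and prints every line in a .java file that mentions ENEQuery, ENEQueryResults or UrlENEQuery, and every line in a .jsp file that mentions InvokeAssembler. The class name EndecaQueryScanner is hypothetical. Because this is a plain text match, it also reports imports and comments, so treat the output as a list of places to look rather than a count of queries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Lists source lines that look like standalone Endeca queries (a rough grep, not a parser). */
final class EndecaQueryScanner {

    // Names to search for: Java classes and JSPs use different markers.
    // "ENEQuery" also matches "ENEQueryResults" and "UrlENEQuery" as substrings.
    private static final List<String> JAVA_MARKERS = List.of("ENEQuery", "ENEQueryResults", "UrlENEQuery");
    private static final List<String> JSP_MARKERS = List.of("InvokeAssembler");

    static List<String> scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            List<String> markers = name.endsWith(".java") ? JAVA_MARKERS
                    : name.endsWith(".jsp") ? JSP_MARKERS
                    : List.of();
            if (markers.isEmpty()) {
                continue; // only .java and .jsp files are of interest
            }
            // ISO-8859-1 never fails to decode, and the markers are plain ASCII.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                if (markers.stream().anyMatch(line::contains)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + ": " + line.trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

Run it with the application's source root as the only argument. It defaults to the current directory.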

Improving your queries

Limiting which properties to return on a record

In a standard commerce application, it’s not uncommon for one record (representing a product or SKU) to have 80 or 100 or more properties. These can be things like product ID, UPC, short descriptions, size/color/widths, prices, image URLs, etc.

If you are not careful, it’s very easy for a standalone query to return every one of those 80 or 100 property values.

To see what comes back by default, look at the orange JSP reference application (typically located at http://localhost:8006/endeca_jspref), connecting it to the MDEX Engine with a host of localhost and a port of 15000.

The list of properties to return can be controlled with ENEQuery.setSelection(). This requires you to specify every single property (and dimension) to be returned, and the names are case-sensitive.
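Because the selection names are case-sensitive, a name with the wrong case will not match the property you meant. The sketch below is a hypothetical pre-flight check, not part of the Endeca API. It compares a planned selection list with a set of property and dimension names you already know exist in your index (for example, copied from what the JSP reference application shows). It reports names that differ from an indexed name only by case, as well as names it does not recognize at all. The property names used in main() are illustrative examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Checks a planned setSelection() list against names known to exist in the index. */
final class SelectionCheck {

    static List<String> problems(Set<String> indexedNames, List<String> selection) {
        List<String> problems = new ArrayList<>();
        for (String wanted : selection) {
            if (indexedNames.contains(wanted)) {
                continue; // exact, case-sensitive match: fine
            }
            String nearMiss = null;
            for (String known : indexedNames) {
                if (known.equalsIgnoreCase(wanted)) {
                    nearMiss = known; // same name, different case
                    break;
                }
            }
            problems.add(nearMiss != null
                    ? wanted + ": wrong case, the index has " + nearMiss
                    : wanted + ": not a known property or dimension");
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("product.displayName", "product.repositoryId", "sku.activePrice");
        List<String> selection = List.of("product.displayName", "product.displayname", "sku.listPrice");
        problems(indexed, selection).forEach(System.out::println);
    }
}
```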

Limit the number of records to return

By default, a query returns 10 records. To change this, use ENEQuery.setNavNumERecs().

In the .reqlog, if you see &nbins=10, it means someone didn’t set this value explicitly and is probably using the default.

At the same time, you shouldn’t set this value too high. If you find yourself setting it to 50 or 100, you might be doing something wrong.

Omit Supplemental Objects

A supplemental object is the meta-data about a landing page or content slot. If you use the orange reference application, at the top you’ll see one or more “Unknown Merch Style” entries. Scrolling to the bottom of the page, you’ll see a series of “Matching Supplemental Objects”.

What’s the big deal about these? They can actually get fairly large (for instance, if you have cartridges that let merchandisers copy and paste raw HTML). Also, the only time they really need to come back is for Assembler queries, not one-off queries.

There’s no flag for turning supplemental objects on or off. However, you can add a merch rule filter that has the effect of turning them off. (This is what a hotfix for 10.2 does, and what 11.x does by default. If you look in the .reqlog, you’ll see &merchrulefilter=endeca.internal.nonexistent in some of the queries.)

You can set such a filter with ENEQuery.setNavMerchRuleFilter(). Basically any nonsense string will have the desired effect, so this is also a good place to put a message for logging purposes, something like ENEQuery.setNavMerchRuleFilter("topNavigationQuery").

In the .reqlog, you should then see &merchrulefilter.

Don’t expose all dimensions

If you look at the orange reference app, you’ll see that the dimensions on the left side are “closed” up. If you click one, the page will refresh and now that dimension will be “opened” up.

If you would like to open up all dimensions, you can use ENEQuery.setNavAllRefinements(true).

However, this can be very expensive. When no dimensions are opened, the MDEX Engine doesn’t have to compute refinement counts (i.e., “How many records are there for Brand=XYZ?”). Opening all of them forces it to compute counts for every dimension. It can also inflate the response size greatly, especially for large flat dimensions.

Instead, you should specify which particular dimensions you want to return. Unfortunately, you need to specify the ID of the dimension, not the name.

If you know the IDs of the dimensions you care about, you can use UrlENEQuery.setNe() and pass in a string like “123+234+532”.
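Since setNe() wants IDs rather than names, one option is to keep your own name-to-ID lookup and build the string from it. The Java sketch below assumes such a map exists; the dimension names and IDs are invented. It joins the IDs with "+" in the form shown above. It throws on an unknown name rather than quietly opening fewer dimensions than intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Turns dimension names into a "123+234+532" style string of dimension IDs. */
final class DimensionSelection {

    static String neValue(Map<String, Long> idsByName, List<String> dimensionNames) {
        List<String> ids = new ArrayList<>();
        for (String name : dimensionNames) {
            Long id = idsByName.get(name);
            if (id == null) {
                // Fail loudly rather than silently opening fewer dimensions than intended.
                throw new IllegalArgumentException("No dimension ID known for " + name);
            }
            ids.add(Long.toString(id));
        }
        return String.join("+", ids);
    }

    public static void main(String[] args) {
        // Hypothetical mapping; real IDs come from your own index.
        Map<String, Long> ids = Map.of("product.brand", 123L, "product.color", 234L, "product.category", 532L);
        System.out.println(neValue(ids, List.of("product.brand", "product.color", "product.category")));
    }
}
```

Keeping the lookup in one place also means a re-indexed dimension with a new ID needs only one change.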

Looking through the .reqlog, if you see &allgroups=1, that means someone has called setNavAllRefinements(true).

Use record filters instead of keyword search

Let’s say you’re on a product details page and you know the ID of the product. You have two choices: do a keyword search on the ID field, passing in the value as the search term, or construct a record filter. A record filter is usually faster and cleaner. (There’s no reason to fill your logs with searches that customers didn’t type in.)

The method is ENEQuery.setNavRecordFilter(). An example might be: query.setNavRecordFilter("AND(product.id:2342342)").
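If you need several products at once, the same approach extends to an OR of property:value terms. Below is a minimal sketch that assumes the comma-separated AND(...)/OR(...) form of Endeca's record filter syntax. The RecordFilters class is hypothetical. It accepts only plain alphanumeric IDs, because values containing filter syntax characters would need escaping, and this sketch does not attempt that.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Builds record filter strings for exact ID lookups instead of keyword searches. */
final class RecordFilters {

    // Only plain IDs are accepted; values with filter syntax characters would need escaping.
    private static final Pattern PLAIN_ID = Pattern.compile("[A-Za-z0-9_-]+");

    static String forIds(String property, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one ID is required");
        }
        StringJoiner terms = new StringJoiner(",");
        for (String id : ids) {
            if (!PLAIN_ID.matcher(id).matches()) {
                throw new IllegalArgumentException("not a plain ID: " + id);
            }
            terms.add(property + ":" + id);
        }
        // One ID gives the AND(...) form used in the text; several IDs are OR-ed together.
        return (ids.size() == 1 ? "AND(" : "OR(") + terms + ")";
    }

    public static void main(String[] args) {
        System.out.println(forIds("product.id", List.of("2342342")));
        System.out.println(forIds("product.id", List.of("2342342", "2342343")));
    }
}
```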

Use setQueryInfo for logging custom things to Endeca’s .reqlog files

A little-used feature is the ENEQuery.setQueryInfo() method. It lets you attach any number of key/value pairs that are sent to the MDEX Engine, which ignores them but writes them out to the .reqlog file. This can be useful for recording things like session IDs, debug information, etc.

In our case, a good use is to record why the query is being executed: “pdpBreadcrumbs”, “typeahead”, etc.

That way, if slow or large queries turn up during performance testing, they are easier to track down, and you can tell real Assembler queries apart from your one-off queries.

These messages show up in the .reqlog as &log=.
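Once queries carry such a tag, a quick tally over a .reqlog shows which one-off queries dominate. The sketch below assumes each request line includes the tag as a URL-encoded log= query parameter, as described above. The exact layout of .reqlog lines varies, so the regular expression and the class name ReqlogTagCounter are assumptions. Lines without a tag are counted as "(untagged)". That bucket may include Assembler queries as well as standalone queries nobody has labelled yet.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts .reqlog request lines by the value of their log= parameter. */
final class ReqlogTagCounter {

    // Assumed shape: the tag appears as a URL-encoded "log" query parameter on the line.
    private static final Pattern LOG_PARAM = Pattern.compile("[?&]log=([^&\\s]*)");

    static Map<String, Integer> countByTag(List<String> reqlogLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : reqlogLines) {
            Matcher m = LOG_PARAM.matcher(line);
            String tag = m.find() ? decode(m.group(1)) : "(untagged)";
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    private static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return raw; // malformed %-escape: keep the raw value rather than failing
        }
    }
}
```

Feeding it the lines of a .reqlog from a load test gives a per-purpose query count to compare against expectations.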

Don’t ever set setNavERecsPerAggrERec to 2

ENEQuery.setNavERecsPerAggrERec() lets you specify how many records are returned per aggregate record. For example, say you run a clothing website. You probably index by SKU (each SKU representing a single Size/Color combination for a product). When querying Endeca, instead of returning info at the SKU level, you would aggregate records by a rollup key using ENEQuery.setNavRollupKey().

setNavERecsPerAggrERec() lets you bring back 0, 1 or all SKUs within a product. You should do everything possible NOT to set it to “2”, which means all.

(For reference, ENEQuery has three static constants for those values: ZERO_ERECS_PER_AGGR, ONE_EREC_PER_AGGR and ALL_ERECS_PER_AGGR.)

In the .reqlog, if you see &allbins=2, that means someone called setNavERecsPerAggrERec(ALL_ERECS_PER_AGGR).

Now, this might make things complicated for you. For instance, at Eddie Bauer on the search results page, they wanted to display the color swatches from each different SKU. By setting it to bring back all, they were able to iterate across all of the SKUs in the product to generate that list.

Instead, things were changed so that each SKU was tagged with information about all of the other SKUs in its product. This allowed us to change the setting from 2 to 1, and response sizes went from 10 MB to 100 KB.
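The index-time change described above amounts to a simple denormalization. Group SKUs by product, collect the colors, and attach the full list to every SKU, so that a single SKU per aggregate record carries everything needed to render the swatches. This is a hypothetical Java illustration of the idea, not the code used at Eddie Bauer. The Sku class, and the idea of indexing the result as a multi-valued property (say, product.swatchColors), are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Index-time sketch: give every SKU the colors of all SKUs in its product. */
final class SwatchDenormalizer {

    static final class Sku {
        final String skuId;
        final String productId;
        final String color;

        Sku(String skuId, String productId, String color) {
            this.skuId = skuId;
            this.productId = productId;
            this.color = color;
        }
    }

    /** Returns skuId -> colors of its product in first-seen order, ready to index on each SKU. */
    static Map<String, List<String>> productColorsBySku(List<Sku> skus) {
        // Pass 1: collect the distinct colors per product.
        Map<String, Set<String>> colorsByProduct = new LinkedHashMap<>();
        for (Sku sku : skus) {
            colorsByProduct.computeIfAbsent(sku.productId, id -> new LinkedHashSet<>()).add(sku.color);
        }
        // Pass 2: copy the product-level list onto every SKU of that product.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Sku sku : skus) {
            result.put(sku.skuId, List.copyOf(colorsByProduct.get(sku.productId)));
        }
        return result;
    }
}
```

The cost moves to indexing time, where it is paid once, instead of shipping every SKU of every product on each query.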

Watch out for filtering based on timestamps

Some commerce sites set products to activate at a specific time of day (“Starting at 1pm EST, this product should show up, but not before”).

One way to do this is to tag every product with a start date and an end date, and then pass a range filter on those dates with each query to Endeca.

The problem is that the MDEX Engine does some internal caching based on these values. If the date value you pass is too granular, the MDEX Engine won’t perform as well as it could. So don’t specify a timestamp down to the second or millisecond. Instead, round timestamps to the hour, or at least to chunks of minutes (such as 20 or 30 minutes), so that some cache hits occur.
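Rounding is simple arithmetic on epoch milliseconds: subtract the remainder modulo the bucket size. The sketch below does that and then builds predicate strings shaped like the CRS .reqlog example in this section (property|OPERATOR value). The class name and the string form are assumptions. How you hand the values to setNavRangeFilters() depends on your API version, so the sketch stops at producing strings.

```java
import java.time.Duration;
import java.time.Instant;

/** Rounds "now" down to a bucket boundary so many requests share one filter value. */
final class ActivationWindow {

    static long bucketStart(Instant now, Duration bucket) {
        long size = bucket.toMillis();
        if (size <= 0) {
            throw new IllegalArgumentException("bucket must be positive");
        }
        long millis = now.toEpochMilli();
        return millis - Math.floorMod(millis, size); // start of the bucket containing "now"
    }

    /** Active means startDate <= t and endDate >= t, with t rounded down to the bucket. */
    static String[] activePredicates(Instant now, Duration bucket) {
        long t = bucketStart(now, bucket);
        return new String[] {
            "product.endDate|GTEQ " + t,
            "product.startDate|LTEQ " + t
        };
    }

    public static void main(String[] args) {
        for (String p : activePredicates(Instant.now(), Duration.ofMinutes(30))) {
            System.out.println(p);
        }
    }
}
```

Note the trade-off: with hourly buckets, a product can appear, or drop off, up to an hour after its configured time.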

Range filters can be set by using setNavRangeFilters().

In the .reqlog, look for &pred. A CRS example might look like: pred=product.endDate%7cGTEQ+1.4163552E12&pred=product.startDate%7cLTEQ+1.4163552E12

Don’t return key properties

This is a little-used feature, so it’s not something you’d come across very often. Key properties return meta-data about the definitions of properties and dimensions themselves. This can be turned on using ENEQuery.setNavKeyProperties(ENEQuery.KEY_PROPS_ALL).

This can greatly inflate the response size of a query from Endeca.

If you do need this for some reason, you should only need to execute the query once, and then cache the results from it.

This shows up in the .reqlog as &keyprops=all.
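Each of the expensive settings covered so far leaves a fingerprint in the .reqlog, so a crude audit can be a handful of pattern matches per line. This Java sketch flags the markers named in this article: &nbins=10, &allgroups=1, &allbins=2, &keyprops=all, and a missing &merchrulefilter. The class name and the wording of the warnings are hypothetical. As in the advice above, &nbins=10 is only a hint that the default was used. Expect the lightweight Assembler query that fetches Experience Manager content to be flagged for a missing merchrulefilter, since it legitimately needs supplemental objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Flags .reqlog lines that carry the expensive settings described in this article. */
final class ReqlogAudit {

    private static final Pattern MERCH_RULE_FILTER = Pattern.compile("[?&]merchrulefilter=");

    static List<String> warnings(String line) {
        List<String> w = new ArrayList<>();
        if (hasParam(line, "nbins", "10")) {
            w.add("nbins=10: record count probably left at the default");
        }
        if (hasParam(line, "allgroups", "1")) {
            w.add("allgroups=1: setNavAllRefinements(true) opens every dimension");
        }
        if (hasParam(line, "allbins", "2")) {
            w.add("allbins=2: ALL_ERECS_PER_AGGR returns every record in each aggregate");
        }
        if (hasParam(line, "keyprops", "all")) {
            w.add("keyprops=all: key properties requested; run once and cache");
        }
        if (!MERCH_RULE_FILTER.matcher(line).find()) {
            w.add("no merchrulefilter: supplemental objects may be in the response");
        }
        return w;
    }

    // Matches name=value as a whole parameter, so nbins=100 is not mistaken for nbins=10.
    private static boolean hasParam(String line, String name, String value) {
        return Pattern.compile("[?&]" + Pattern.quote(name) + "=" + Pattern.quote(value) + "(?=&|\\s|$)")
                .matcher(line)
                .find();
    }
}
```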

Things that CRS does that aren’t optimal

Careful readers might notice that CRS breaks some of the rules above. In particular:

  • CRS filters based on timestamps
  • CRS used to set setNavERecsPerAggrERec to 2

What would the worst query in the world look like?

As an interesting point of reference, the world’s worst Endeca query would:

  • call setNavAllRefinements(true)
  • not use setSelection()
  • not use setNavMerchRuleFilter()
  • use setNavRollupKey()
  • do a wildcard keyword search
  • have a high number of search terms (in addition to the wildcard)
  • set setNavNumERecs() to a large value
  • call setNavKeyProperties(ENEQuery.KEY_PROPS_ALL)
  • sort on something not frequently sorted on
  • use pagination (setNavERecsOffset()) to go to a high page number
  • use a geospatial filter
  • use a range filter (setNavRangeFilters())

What would the world’s fastest query look like?

  • no keyword search
  • setNavAllRefinements(false)
  • setNavNumERecs(0)
  • setNavMerchRuleFilter("lksdkjfd")
  • doesn’t touch setNavKeyProperties()
  • uses setNavRecordFilter() with a record filter that has been used before and filters out essentially everything