
Endeca Note

Source: http://www.ateam-oracle.com/notes-on-querying-endeca-from-within-an-atg-application/

Background

On a few projects in 2014, the issue of Endeca’s performance came up. Specifically, applications were seeing a large number of queries and were also generating large response sizes from Endeca. These queries were not being generated by the Assembler API, but were one-off queries created to bring back other information from Endeca.

This article will give some tips on how to optimize those queries.

Notes on Endeca Query Response objects

A response from Endeca can consist of a number of different pieces of data:

  • Records (aka, products)
  • Dimensions to navigate on
  • Breadcrumbs (also sometimes called Descriptors), which return information on which things have been navigated on
  • Supplemental objects: Used internally, these bring back meta-data about the rules being executed (such as landing pages, content slots, etc.)
  • Dimension search results: These are special queries that search only within dimensions, not products.
  • Key properties: Almost never used. Returns meta-data about the properties and dimensions in the index.
  • Other information about the index (such as the list of available sort keys, search fields, etc). You can’t really turn any of that off.

In an Assembler based application, the overall flow for a standard page being rendered would be this:

First, a super-lightweight query is executed that returns only Supplemental objects representing information from Experience Manager. Depending on that meta-data, one or more subsequent queries are executed that will return:

  1. Dimensions
  2. Records
  3. Breadcrumbs
  4. NOT supplemental objects (this is true for patched 10.2, 11.0 and 11.1)

Thus a standard basic page would generate about two Endeca queries to be rendered. There are some cartridges that generate more:

  • Featured records cartridges generate one query per cartridge. So if you have three featured records cartridges, there would be three extra queries.

In addition, the Assembler API is usually very good about bringing back only the exact information it needs. This means that it will only “open up” the dimensions being requested, and will return only the attributes on the products specified in the ResultsList configuration.

When it comes to making standalone queries to Endeca, you need to understand the information above so as to NOT bring back any more data than necessary.

Scanning for Standalone Endeca queries in code

The easiest way to scan an application for one-off Endeca queries is to search for “ENEQuery” or “ENEQueryResults” in your .java classes. In addition, search for “InvokeAssembler” in your .jsp files. You can also search for “UrlENEQuery”.

If you find many instances of those, each and every query involved should be assessed using the information laid out below.
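The search described above can also be scripted. The following is a minimal sketch using only the Java standard library; the class name StandaloneQueryScanner and the choice to look only at .java and .jsp files are assumptions for illustration, not part of any Endeca or ATG tooling.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists source files that mention the search terms named in the article. */
final class StandaloneQueryScanner {

    // Search terms taken from the article. "ENEQuery" is a substring of two of the
    // others, so a file using UrlENEQuery is reported for both terms.
    static final List<String> MARKERS =
            List.of("ENEQueryResults", "UrlENEQuery", "ENEQuery", "InvokeAssembler");

    /** Returns the markers that occur in the given file content, in MARKERS order. */
    static List<String> markersIn(String content) {
        List<String> found = new ArrayList<>();
        for (String marker : MARKERS) {
            if (content.contains(marker)) {
                found.add(marker);
            }
        }
        return found;
    }

    /** Walks a source tree and prints every .java or .jsp file that contains a marker. */
    static void scan(Path root) throws IOException {
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java") || p.toString().endsWith(".jsp"))
                    .collect(Collectors.toList());
        }
        for (Path file : candidates) {
            // ISO-8859-1 maps every byte to a character, so odd encodings cannot break the scan.
            String content = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            List<String> hits = markersIn(content);
            if (!hits.isEmpty()) {
                System.out.println(file + " -> " + hits);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        scan(Paths.get(args.length > 0 ? args[0] : "."));
    }
}
```

Running it against the root of a module prints one line per file that needs review, which gives a rough inventory of the one-off queries to assess.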

Improving your queries

Limiting which properties to return on a record

In a standard commerce application, it’s not uncommon for one record (representing a product or SKU) to have 80 or 100 or more properties. These can be things like product ID, UPC, short descriptions, size/color/widths, prices, image URLs, etc.

If you are not careful, it’s very easy to return all 80 or 100 or whatever property values with a standalone query.

To look at what comes back by default, you can look at the orange JSP reference application (typically located at http://localhost:8006/endeca_jspref, with a host of localhost and port of 15000).

The list of properties to be returned can be controlled by using ENEQuery.setSelection(). This requires you to specify every single property (and dimension) to be returned. It is case-sensitive.

Limit the number of records to return

By default, a query will return 10 records. To change this, you can use ENEQuery.setNavNumERecs().

In the .reqlog, if you see &nbins=10, that means that someone didn’t set this value specifically and is probably using the default.

At the same time, you shouldn’t set this value to be too large. If you find yourself setting this to 50 or 100, you might be doing something wrong.

Omit Supplemental Objects

A supplemental object is the meta-data about a landing page or content slot. If you use the orange reference application, at the top you’ll see one or more “Unknown Merch Style”. Scrolling to the bottom of the page, you’ll see a series of “Matching Supplemental Objects”.

What’s the big deal about these? Well, these can actually get somewhat large in size (for instance, if you have cartridges that allow merchandisers to copy/paste raw HTML). Also, the only real time they need to come back is when doing Assembler queries; not one-off queries.

There’s no flag for turning supplemental objects on/off. However, you can add a merch rule filter that will have the effect of turning them off. (This is what a hotfix for 10.2 and what 11.x do by default. If you look in the .reqlog, you’ll see &merchrulefilter=endeca.internal.nonexistent in some of the queries.)

This filter can be set by using ENEQuery.setNavMerchRuleFilter(). Basically any nonsense string in here will have the correct effect. This would also be a good place to put a message in for logging purposes. Something like ENEQuery.setNavMerchRuleFilter("topNavigationQuery").

In the .reqlog, you should see &merchrulefilter.

Don’t expose all dimensions

If you look at the orange reference app, you’ll see that the dimensions on the left side are “closed” up. If you click one, the page will refresh and now that dimension will be “opened” up.

If you would like to open up all dimensions, you can use ENEQuery.setNavAllRefinements(true).

However, this can be potentially very expensive. With no dimensions being returned, the MDEX Engine doesn’t have to compute the refinement counts (aka, “How many records are there for Brand=XYZ?”). Opening up every dimension forces it to compute all of them. Also, this can inflate the response size greatly, especially for big flat dimensions.

Instead, you should specify which particular dimensions you want to return. Unfortunately, you need to specify the ID of the dimension, not the name.

If you know the IDs of the dimensions you care about, you can use UrlENEQuery.setNe() and pass in a string like "123+234+532".

Looking through the .reqlog, if you see &allgroups=1, that means somewhere someone has called setNavAllRefinements(true).

Use record filters instead of keyword search

Let’s say you’re on a product details page. If you know the ID of the product, you have two choices: You can do a keyword search on the ID field passing in the string of the value. Or you can construct a record filter. A record filter is usually faster and cleaner. (There’s no reason to fill your logs with searches that customers didn’t type in).

ENEQuery.setNavRecordFilter() is the method. An example might be: query.setNavRecordFilter("AND(product.id:2342342)").
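Building the filter string in one place keeps the syntax consistent across one-off queries. The sketch below is plain Java with no Endeca dependency; the class RecordFilters and its method names are hypothetical, and it assumes numeric IDs and the AND(...) / OR(...) record filter syntax. The resulting string is what would be handed to setNavRecordFilter().

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds record filter expressions as strings. */
final class RecordFilters {

    /** Builds a filter matching a single record, e.g. AND(product.id:2342342). */
    static String byId(String property, long id) {
        return "AND(" + property + ":" + id + ")";
    }

    /** Builds a filter matching any of several records, e.g. OR(product.id:1,product.id:2). */
    static String anyOf(String property, List<Long> ids) {
        if (ids.isEmpty()) {
            // An empty OR() would not be a meaningful filter, so fail early.
            throw new IllegalArgumentException("at least one id is required");
        }
        return ids.stream()
                .map(id -> property + ":" + id)
                .collect(Collectors.joining(",", "OR(", ")"));
    }
}
```

For example, RecordFilters.byId("product.id", 2342342L) yields the string used in the example above. Non-numeric values may need quoting or escaping, which this sketch does not attempt.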

Use setQueryInfo for logging custom things to Endeca’s .reqlog files

A little-used feature is the ENEQuery.setQueryInfo() method. This lets you attach any number of key/value pairs that are sent to the MDEX Engine and ignored by it, but written out to the .reqlog file. This can be useful for adding things like session ID, debug information, etc.

For our case, what might be good is to write out why this query is being executed: “pdpBreadcrumbs”, “typeahead”, etc.

This way, if there are slow or big queries found during performance testing, it will help track them down and help distinguish between real Assembler queries and your one-off queries.

These messages will show up in the .reqlog as &log=.

Don’t ever set setNavERecsPerAggrERec to 2

ENEQuery.setNavERecsPerAggrERec() allows you to specify how many records are returned per aggregate record. For example, say you are a clothing website. You probably index by SKU (which would represent a single Size/Color combination for a product). When querying Endeca, instead of returning info at a SKU level, you would aggregate things by a rollup key using ENEQuery.setNavRollupKey().

setNavERecsPerAggrERec() allows you to bring back 0, 1 or all SKUs within a product. You should do everything possible to NOT set it to the value of “2”, which is all.

(As a point of reference, ENEQuery has 3 static values representing those numbers: ZERO_ERECS_PER_AGGR, ONE_EREC_PER_AGGR and ALL_ERECS_PER_AGGR.)

In the .reqlog, if you see &allbins=2, then that means someone called setNavERecsPerAggrERec(ALL_ERECS_PER_AGGR).

Now, this might make things complicated for you. For instance, at Eddie Bauer on the search results page, they wanted to display the color swatches from each different SKU. By setting it to bring back all, they were able to iterate across all of the SKUs in the product to generate that list.

Instead, things were changed so that each SKU was tagged with information about all of the other SKUs. This allowed us to change the setting from 2 to 1. Response sizes went from 10 MB to 100 KB.
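The tagging step is a denormalization done before indexing. How it was actually implemented is not described here, so the following is only a sketch of the idea in plain Java: the Sku class, its field names and the productColors property are all hypothetical. Each SKU ends up carrying the full list of colors for its product, so a single SKU per aggregate record is enough to render the swatches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-indexing step: copy product-wide color data onto every SKU. */
final class SwatchTagger {

    /** Minimal SKU model; all field names are made up for this example. */
    static final class Sku {
        final String skuId;
        final String productId;
        final String color;
        // Filled in by tagWithProductColors(): every color available for the product.
        final List<String> productColors = new ArrayList<>();

        Sku(String skuId, String productId, String color) {
            this.skuId = skuId;
            this.productId = productId;
            this.color = color;
        }
    }

    /** Tags each SKU with the distinct colors of all SKUs sharing its product ID. */
    static void tagWithProductColors(List<Sku> skus) {
        // First pass: collect the distinct colors per product, keeping first-seen order.
        Map<String, Set<String>> colorsByProduct = new LinkedHashMap<>();
        for (Sku sku : skus) {
            colorsByProduct
                    .computeIfAbsent(sku.productId, id -> new LinkedHashSet<>())
                    .add(sku.color);
        }
        // Second pass: attach the product-wide list to every SKU.
        for (Sku sku : skus) {
            sku.productColors.clear();
            sku.productColors.addAll(colorsByProduct.get(sku.productId));
        }
    }
}
```

The trade-off is a larger index (every SKU repeats the product-wide list) in exchange for much smaller query responses.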

Watch out for filtering based on timestamps

Some commerce sites set products to activate during the day (“Starting at 1pm EST, this product should show up, but before that, it shouldn’t”).

One way to do this would be to tag all products with a start date and end date, and then pass along a range filter for the dates with each query to Endeca.

The problem, however, can be that the MDEX Engine does some internal caching based on these values. If the date value you specify is too granular, then the MDEX won’t work as fast as it could. So don’t specify a timestamp down to the second or millisecond. Try to round timestamps to the hour, or at least to chunks of minutes (like 20 or 30 minutes), to ensure that some cache hits occur.

Range filters can be set by using setNavRangeFilters().

In the .reqlog, you can look for &pred. A CRS example might look like: pred=product.endDate%7cGTEQ+1.4163552E12&pred=product.startDate%7cLTEQ+1.4163552E12
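Rounding the timestamp before building the range filter is straightforward. The sketch below is plain Java; the class TimeBucketFilters is hypothetical, the property names are taken from the CRS example above, and the "property|OPERATOR value" string form is inferred from the URL-decoded pred parameters. How these strings are turned into the objects that setNavRangeFilters() expects is left out.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical helper that rounds "now" to a coarse bucket before filtering on it. */
final class TimeBucketFilters {

    /** Rounds an epoch-millisecond timestamp down to the start of its bucket. */
    static long floorToBucket(long epochMillis, Duration bucket) {
        long size = bucket.toMillis();
        return Math.floorDiv(epochMillis, size) * size;
    }

    /** Builds the two range filter expressions for products active at the rounded time. */
    static List<String> activeWindowFilters(long nowMillis, Duration bucket) {
        long rounded = floorToBucket(nowMillis, bucket);
        return List.of(
                "product.endDate|GTEQ " + rounded,
                "product.startDate|LTEQ " + rounded);
    }
}
```

With a 30-minute bucket, every query issued within the same half hour sends identical filter values, which is what gives the cache a chance to hit. The cost is precision: because the time is rounded down, a product can appear or disappear up to one bucket later than its configured time.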

Don’t return key properties

This is a little-used feature, so it’s not something you’d come across very often. Key properties return meta-data about the definitions of properties and dimensions themselves. This can be turned on using ENEQuery.setNavKeyProperties(ENEQuery.KEY_PROPS_ALL).

This can greatly inflate the response size of a query from Endeca.

If you do need this for some reason, you should only need to execute the query once, and then cache the results from it.

This can be found in the .reqlog as &keyprops=all.
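The .reqlog markers called out in this article (&nbins=10, &allgroups=1, &allbins=2, &keyprops=all, and a missing &merchrulefilter) can be checked mechanically. The following is a rough sketch in plain Java; the class ReqlogAuditor is hypothetical, and it assumes each log line contains the query as URL-style parameters separated by "&", without relying on any other detail of the log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical checker that flags suspicious parameters in a single .reqlog line. */
final class ReqlogAuditor {

    /** True if the line carries exactly this name=value pair ("nbins=10" but not "nbins=100"). */
    static boolean hasParam(String line, String nameAndValue) {
        return Pattern.compile("[?&]" + Pattern.quote(nameAndValue) + "(&|\\s|$)")
                .matcher(line).find();
    }

    /** True if the line carries the parameter name with any value. */
    static boolean hasKey(String line, String name) {
        return Pattern.compile("[?&]" + Pattern.quote(name) + "=").matcher(line).find();
    }

    /** Returns one warning per rule from the article that the line appears to break. */
    static List<String> audit(String line) {
        List<String> warnings = new ArrayList<>();
        if (hasParam(line, "nbins=10")) {
            warnings.add("nbins=10: record count was probably left at the default");
        }
        if (hasParam(line, "allgroups=1")) {
            warnings.add("allgroups=1: all dimensions are being opened up");
        }
        if (hasParam(line, "allbins=2")) {
            warnings.add("allbins=2: all records per aggregate record are returned");
        }
        if (hasParam(line, "keyprops=all")) {
            warnings.add("keyprops=all: key properties are returned");
        }
        if (!hasKey(line, "merchrulefilter")) {
            warnings.add("no merchrulefilter: supplemental objects may be returned");
        }
        return warnings;
    }
}
```

Feeding each line of a .reqlog through audit() during performance testing gives a quick list of queries worth a closer look. A warning is only a hint, since some queries legitimately need 10 records or all refinements.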

Things that CRS does that aren’t optimal

Careful readers might notice that CRS breaks some of the rules above. In particular:

  • CRS filters based on timestamps
  • CRS used to do setNavERecsPerAggrERec = 2

What would the worst query in the world look like?

As an interesting point of reference, the world’s worst Endeca query would:

  • call setNavAllRefinements(true)
  • not use .setSelection()
  • not use .setNavMerchRuleFilter()
  • use setNavRollupKey()
  • do a wildcard keyword search
  • have a high number of search terms (in addition to the wildcard)
  • set setNavNumERecs() to a large value
  • call setNavKeyProperties(ENEQuery.KEY_PROPS_ALL)
  • sort on something not frequently sorted on
  • use pagination (.setNavERecsOffset()) to go to a high page number
  • use a geospatial filter
  • use a range filter (.setNavRangeFilters())

What would the world’s fastest query look like?

  • no keyword search
  • setNavAllRefinements(false)
  • setNavNumERecs(0)
  • setNavMerchRuleFilter("lksdkjfd")
  • doesn’t touch setNavKeyProperties()
  • uses a setNavRecordFilter() for a record filter that had been previously used and basically filters everything out