
Understanding the Endeca CAS & EAC APIs


Introduction

I’ve always felt that the best way to understand something is to take it apart and try to put it back together. In this blog we’ll be doing just that: deconstructing the Endeca application scripts and reconstructing them in Java, revealing their inner workings and familiarizing developers with the Endeca CAS, RecordStore, and EAC APIs. Beyond exploring these APIs, the solutions presented here may be useful to Endeca application developers who need greater flexibility and control than the default scripts provide, and to those who prefer working in Java over BeanShell and shell scripts.

Main Article

The Endeca CAS Server is a Jetty-based servlet container that manages record stores, dimensions, and crawling operations. The CAS Server API is an interface for interacting with the CAS Server; by default, the CAS Service runs on port 8500. Similarly, the Endeca EAC Central Server runs on Tomcat and coordinates the command, control, and monitoring of EAC applications; by default, it runs on port 8888. Each of these servers, and their respective APIs, is explained in the following Endeca documents:

Content Acquisition System Developer’s Guide
Content Acquisition System API Guide
Platform Services Application Controller Guide

We will use these APIs to re-write the scripts generated by the deployment template for the Discover Electronics reference application, using Java instead of shell script.

To begin, we need to generate the scripts that we’ll be converting. Detailed instructions for this procedure are provided in the CAS Quick Start Guide, but the basic syntax for deploying the Endeca Discover Electronics CAS application is:

cd \Endeca\ToolsAndFrameworks\11.1.0\deployment_template\bin
deploy --app C:\Endeca\ToolsAndFrameworks\11.1.0\reference\discover-data-cas\deploy.xml

Make sure to answer N when prompted to install a base deployment.

Once the deploy command has finished, you should see the following files included in the C:\Endeca\Apps\Discover\control directory:

initialize_services.bat      
load_baseline_test_data.bat  
baseline_update.bat          
promote_content.bat          

These are the scripts that we will be re-writing in Java. After running our Java application, we should be able to navigate to the following URLs and see the same results as having executed the above scripts:

http://localhost:8006/discover
http://localhost:8006/discover-authoring

initialize_services

The first script we’ll analyze is initialize_services. Opening the file in a text editor, we see that the first thing it does is set some environment variables. Rather than use system variables, it is customary for Java applications to read from property files, so we’ll create a config.properties file to store our configuration and load it using the following syntax:

Properties configProperties = new Properties();
try {
    configProperties.load(ResourceHelper.class.getClassLoader().getResourceAsStream("config.properties"));
} catch (IOException e) {
    log.error("Cannot load configuration properties.", e);
}

Next, the script checks if the --force argument was specified. If it was, the script removes any existing crawl configuration, record stores, dimension value id managers, and lastly the application. The code below shows how to remove the crawl configuration, record stores, and dimval id managers:

public static CasCrawler getCasCrawler() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    CasCrawlerLocator locator = CasCrawlerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static ComponentInstanceManager getComponentInstanceManager() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    ComponentInstanceManagerLocator locator = ComponentInstanceManagerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static boolean deleteCrawl(String id) {
    try {
        getCasCrawler().deleteCrawl(new CrawlId(id));
        return true;
    } catch (ItlException|IOException e) {
        log.error("Unable to delete crawl '"+id+"'", e);
        return false;
    }
}

public static void deleteComponentInstance(String name) {
    try {
        getComponentInstanceManager().deleteComponentInstance(new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to delete component instance '"+name+"'", e);
    }
}
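For the Discover application, the --force cleanup amounts to calls along these lines. The record store and dimension value id manager names appear later in the deployment’s configuration; the crawl id shown here is an assumption for illustration only:

// Hypothetical cleanup calls (the crawl id is an assumption):
deleteCrawl("Discover-crawl");
deleteComponentInstance("Discover-data");
deleteComponentInstance("Discover-dimvals");
deleteComponentInstance("Discover-dimension-value-id-manager");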

However, removing the application is a bit more involved and requires interacting with the EAC, whose configuration is stored in AppConfig.xml:

<app appName="Discover" eacHost="jprantza01" eacPort="8888" 
    dataPrefix="Discover" sslEnabled="false" lockManager="LockManager">
  <working-dir>${ENDECA_PROJECT_DIR}</working-dir>
  <log-dir>./logs</log-dir>
</app>

So we need to load AppConfig.xml, which is a Spring-based ApplicationContext configuration file:

String appConfig = getConfigProperty("app.config");
Resource appConfigResource = new FileSystemResource(appConfig);
if (!appConfigResource.exists()) {
    appConfigResource = new ClassPathResource(appConfig);
}
if (!appConfigResource.exists()) {
    log.error("Cannot load application configuration: "+appConfig);
} else {
    XmlBeanDefinitionReader xmlReader = new XmlBeanDefinitionReader(appContext);
    xmlReader.loadBeanDefinitions(appConfigResource);
    PropertyPlaceholderConfigurer propertySubstituter = new PropertyPlaceholderConfigurer();
    propertySubstituter.setIgnoreResourceNotFound(true);
    propertySubstituter.setIgnoreUnresolvablePlaceholders(true);
    appContext.addBeanFactoryPostProcessor(propertySubstituter);
    appContext.refresh();
}

Note that the propertySubstituter (PropertyPlaceholderConfigurer) is necessary to allow for expansion of properties like ${ENDECA_PROJECT_DIR}. These properties must exist in your environment.
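If ENDECA_PROJECT_DIR is not set in your environment, one workaround (a sketch, relying on PropertyPlaceholderConfigurer’s default fallback to JVM system properties) is to supply the value programmatically before refreshing the context:

// Assumption: the placeholder resolver falls back to JVM system properties.
System.setProperty("ENDECA_PROJECT_DIR", "C:/Endeca/Apps/Discover");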

Once the appContext has been loaded, we can remove an app by retrieving all beans of type Component or CustomComponent and removing their definitions with:

public static void removeApp(String appName) {
    try {
        Collection<Component> components = getAppContext().getBeansOfType(Component.class).values();
        if (components.size() > 0) {
            Application app = toApplication(components.iterator().next());
            if (app.isDefined() && app.getAppName().equals(appName)) {
                Collection<CustomComponent> customComponents = getAppContext().getBeansOfType(CustomComponent.class).values();
                for (CustomComponent customComponent: customComponents) {
                    try {
                        customComponent.removeDefinition();
                    } catch (EacComponentControlException e) {
                        log.error("Unable to remove definition for "+customComponent.getElementId(), e);
                    }
                }
                app.removeDefinition();
            }
            else {
                log.warn("Application '"+appName+"' is not defined.");
            }
        }
    }
    catch (AppConfigurationException|EacCommunicationException|EacProvisioningException e) {
        log.error("Unable to remove application '"+appName+"'", e);
    }
}

Provided that the app state is clean, the script then goes on to create the record stores, create the dimension value id managers, and set the configuration on the data record store, which can be accomplished using the following code:

public static void createComponentInstance(String type, String name) {
    try {
        getComponentInstanceManager().createComponentInstance(new ComponentTypeId(type), new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to create "+type+" instance '"+name+"'", e);
    }
}

public static void setConfiguration(RecordStore recordStore, File configFile) {
    try {
        recordStore.setConfiguration(RecordStoreConfiguration.load(configFile));
    } catch (RecordStoreConfigurationException e) {
        StringBuilder errorText = new StringBuilder();
        for (RecordStoreConfigurationError error: e.getFaultInfo().getErrors()) {
            errorText.append(error.getErrorMessage()).append("\n");
        }
        log.error("Invalid RecordStore configuration:\n"+errorText);
    } catch (RecordStoreException e) {
        log.error("Unable to set RecordStore configuration", e);
    }
}
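Put together, the provisioning step amounts to calls along these lines, using the getRecordStore() helper shown later in this post. The component type identifiers and the configuration file path are assumptions for illustration (check the CAS documentation or component-manager-cmd for the exact type names in your install):

// Hypothetical provisioning calls; type ids and config path are assumptions.
// (IOException handling elided for brevity.)
createComponentInstance("recordstore", "Discover-data");
createComponentInstance("recordstore", "Discover-dimvals");
createComponentInstance("dimension-value-id-manager", "Discover-dimension-value-id-manager");
setConfiguration(getRecordStore("Discover-data"),
        new File("C:/Endeca/Apps/Discover/config/cas/recordstore-configuration.xml"));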

It then calls out to the following BeanShell script, found in InitialSetup.xml:

<script id="InitialSetup">
  <bean-shell-script>
    <![CDATA[ 
  IFCR.provisionSite();
  CAS.importDimensionValueIdMappings("Discover-dimension-value-id-manager", 
      	InitialSetup.getWorkingDir() + "/test_data/initial_dval_id_mappings.csv");
    ]]>
  </bean-shell-script>
</script>

Now, if we wanted to convert these scripts to Java as well, we could do the following:

IFCRComponent ifcr = getAppContext().getBean("IFCR", IFCRComponent.class);
ifcr.provisionSite();
...

But to keep this exercise simple, I chose not to convert the BeanShell scripts, leaving that as an exercise for the reader. All the BeanShell scripts do is bind to Spring beans that are defined elsewhere in the configuration and call their Java methods. For example, the IFCR component is defined in WorkbenchConfig.xml.

Instead, to execute the BeanShell scripts, you can use the convenience method invokeBeanMethod():

try {
    invokeBeanMethod("InitialSetup", "run");
} catch (IllegalAccessException|InvocationTargetException e) {
    log.warn("Failed to configure EAC application. Services not initialized properly.", e);
    releaseManagedLocks();
}
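The invokeBeanMethod() helper itself isn’t shown in the original scripts; a minimal reflection-based sketch (assuming the target bean exposes a no-argument method, and with java.lang.reflect.Method and InvocationTargetException imported) might look like this:

public static void invokeBeanMethod(String beanName, String methodName)
        throws IllegalAccessException, InvocationTargetException {
    // Look up the named Spring bean and invoke the given no-argument method.
    Object bean = getAppContext().getBean(beanName);
    try {
        Method method = bean.getClass().getMethod(methodName);
        method.invoke(bean);
    } catch (NoSuchMethodException e) {
        log.error("Bean '"+beanName+"' has no method '"+methodName+"'", e);
    }
}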

After the initial setup is complete, we can create the crawl configuration using the following code:

public static void createCrawl(CrawlConfig config) {
    try {
        List<ConfigurationMessage> messages = getCasCrawler().createCrawl(config);
        StringBuilder messageText = new StringBuilder();
        for (ConfigurationMessage message: messages) {
            messageText.append(message.getMessage()).append("\n");
        }
        log.info(messageText.toString());
    }
    catch (CrawlAlreadyExistsException e) {
        log.error("Crawl unsuccessful. A crawl with id '"+config.getCrawlId()+"' already exists.");
    }
    catch (InvalidCrawlConfigException|IOException e) {
        log.error("Unable to create crawl "+config.getCrawlId(), e);
    }
}
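As a hedged usage sketch, the crawl id below is an assumption; in the real deployment the full crawl configuration (source, manipulators, output record store) is defined in the application’s CAS configuration rather than built inline:

// Hypothetical: construct a minimal crawl configuration and register it.
CrawlConfig config = new CrawlConfig(new CrawlId("Discover-crawl"));
createCrawl(config);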

Finally, to import the content we can use either invokeBeanMethod() to call methods on the IFCR component, or look up the IFCRComponent using getBean() and call the import methods on it directly.

load_baseline_test_data

The next script, load_baseline_test_data, is responsible for loading the test data into the record stores. The two record stores that need to be populated are Discover-data and Discover-dimvals. They are populated using data from the following files:

Discover-data     C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_data.xml.gz
Discover-dimvals  C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_dimvals.xml.gz

To do this, we’ll first need to create or locate the record stores:

public static RecordStore getRecordStore(final String instanceName) throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    RecordStoreLocator locator = RecordStoreLocator.create(host, port, instanceName);
    locator.ping();
    return locator.getService();
}

Then, the following code can be used to load the data:

public boolean loadData(final String recordStoreName, final String dataFileName, final boolean isBaseline) {
    File dataFile = new File(dataFileName);
    if (!dataFile.exists() || !dataFile.isFile()) { // verify file exists
        log.error("Invalid data file: " + dataFile);
        return false; // failure
    }
    TransactionId txId = null;
    RecordReader reader = null;
    RecordStoreWriter writer = null;
    RecordStore recordStore = null;
    int numRecordsWritten = 0;
    try {
        recordStore = getRecordStore(recordStoreName);
        txId = recordStore.startTransaction(TransactionType.READ_WRITE);
        reader = RecordIOFactory.createRecordReader(dataFile);
        writer = RecordStoreWriter.createWriter(recordStore, txId, 500);
        if (isBaseline) {
            writer.deleteAll();
        }
        for (; reader.hasNext(); numRecordsWritten++) {
            writer.write(reader.next());
        }
        close(writer); // must close before commit
        recordStore.commitTransaction(txId);
        log.info(numRecordsWritten + " records written.");
    }
    catch (IOException|RecordStoreException e) {
        log.error("Unable to update RecordStore '"+recordStoreName+"'", e);
        rollbackTransaction(recordStore, txId);
        return false; // failure
    }
    finally {
        close(reader);
        close(writer);
    }
    return true; // success
}

This code opens the record store for write access, removes all existing records, iterates through the records in the data file, and writes them to the record store. It then either commits or rolls back the transaction and closes any open resources. The method is called once for each record store, and that’s all the load_baseline_test_data script does.
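The close() and rollbackTransaction() helpers referenced above aren’t shown in the scripts; here is a null-safe sketch of the rollback helper (an assumption, not taken from the script), followed by the two calls load_baseline_test_data effectively makes, using the file paths from the table above:

// Sketch of a null-safe rollback helper.
private static void rollbackTransaction(RecordStore recordStore, TransactionId txId) {
    if (recordStore != null && txId != null) {
        try {
            recordStore.rollbackTransaction(txId);
        } catch (RecordStoreException e) {
            log.error("Unable to roll back transaction "+txId, e);
        }
    }
}

loadData("Discover-data", "C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_data.xml.gz", true);
loadData("Discover-dimvals", "C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_dimvals.xml.gz", true);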

baseline_update & promote_content

The last two scripts, baseline_update and promote_content, simply call out to the BeanShell scripts ‘BaselineUpdate’ and ‘PromoteAuthoringToLive’, which reside in DataIngest.xml and WorkbenchConfig.xml, respectively. BaselineUpdate will run the crawl, then update and distribute the indexes. PromoteAuthoringToLive will export the configuration to the LiveDgraphCluster and update the assemblers on the LiveAppServerCluster. Both of these BeanShell scripts can be called using either invokeBeanMethod() or getBean().
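Following the same pattern used for InitialSetup, both scripts can be triggered like so (assuming, as before, that each script bean exposes a run() method):

try {
    invokeBeanMethod("BaselineUpdate", "run");
    invokeBeanMethod("PromoteAuthoringToLive", "run");
} catch (IllegalAccessException|InvocationTargetException e) {
    log.error("Unable to run baseline update or promotion.", e);
}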

Source Code

Attached below is a set of Java files that reproduces the behavior of the application scripts, using the methods outlined above. The class files reflect the scripts they are modeled after:

Script                     Java Class
initialize_services        com.oracle.ateam.endeca.example.itl.Initializer
load_baseline_test_data    com.oracle.ateam.endeca.example.itl.Loader
baseline_update            com.oracle.ateam.endeca.example.itl.Updater
promote_content            com.oracle.ateam.endeca.example.itl.Promoter

 
You can run each Java class individually, or run everything at once using com.oracle.ateam.endeca.example.itl.Driver. Included in the distribution are build scripts, run scripts, and sample configuration files. If you have Endeca installed in a directory other than the default, you may need to modify some files slightly.

Hopefully this exercise has helped eliminate some of the mystery behind what these scripts actually do. Feel free to modify the code as you need, but keep in mind that new product releases may modify the deployment templates, so keep an eye out for changes if you decide to incorporate this code into your solutions.

The attached source code requires Gradle, Maven, and Java 7 SDK to build. Once extracted, edit “scripts/mvn_install.bat” to point to your Endeca installation directory. Then run the script to install the dependent libraries into a local Maven repository. Finally, run “gradlew build” to build “discover_data_cas_java-1.0.jar”, and “gradlew javadoc” to build the javadocs.
