eBay presented a keynote at Hadoop World describing the architecture of Cassini, its completely rebuilt search engine, which is slated to go live in 2012. Cassini indexes all content and user metadata to produce better rankings and refreshes its indexes hourly. It uses Apache Hadoop for the hourly index updates and Apache HBase for random access to item information. Hugh E. Williams, VP Search, Experience & Platforms for eBay Marketplaces, delivered the keynote, outlining the scale, the technologies used, and the experience gained from an 18-month effort by over 100 engineers to completely rebuild eBay's core site search. The new platform, Cassini, will support:
- 97 million active buyers & sellers
- 250 million queries per day
- 200 million items live in over 50,000 categories
eBay already stores 9 PB of data in Hadoop and Teradata clusters for analysis, but this will be their first production application that users use directly. The new system will be more extensive than the current one (Galileo):
Galileo (current) | Cassini (new) |
tens of factors used for ranking | hundreds of factors used for ranking |
title-only match by default | use all data to match by default |
manual intervention for rollout, monitoring, remediation | automated rollout, monitoring, remediation |
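To make the "title-only match" versus "use all data to match" row concrete, here is a minimal Python sketch. It is not eBay's matching logic: the `matches` helper, the item fields (`title`, `description`, `category`), and the whitespace tokenizer are all assumptions chosen for illustration.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of whitespace-separated terms."""
    return set(text.lower().split())


def matches(query, item, fields):
    """Return True if every query term appears in at least one of the given fields.

    `item` is a hypothetical dict of field name -> text; missing fields count as empty.
    """
    terms = tokenize(query)
    searchable = set()
    for field in fields:
        searchable |= tokenize(item.get(field, ""))
    return terms <= searchable


# Hypothetical listing used only for this example.
item = {
    "title": "Vintage camera",
    "description": "Includes leather case and 50mm lens",
    "category": "Photography",
}

# Title-only matching misses a query whose terms appear only in the description.
title_only = matches("leather case", item, fields=["title"])

# Matching against all item data finds the same listing.
all_data = matches("leather case", item, fields=["title", "description", "category"])
```

In this sketch the query "leather case" fails against the title alone but succeeds once the description is searched too, which is the kind of recall difference the table row describes.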
Cassini will keep 90 days of historical data online (currently 1 billion items) and will include user and behavioral data for ranking. Most of the work required to support the search system is done in hourly batch jobs that run in Hadoop. The different kinds of indexes will all be generated in the same cluster, an improvement over Galileo, which had a separate cluster for each kind of indexing. The Hadoop environment allows eBay to restore or reclassify the entire site inventory as improvements are created.
Items are stored in HBase, and are normally scanned during the hourly index updates. When a new item is listed, it will be looked up in HBase and added to the live index within minutes. HBase also allows for bulk and incremental item writes and fast item reads and writes for item annotation.
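The two access patterns described above (a full scan for the hourly rebuild, and a point lookup when a new item is listed) can be sketched in Python. This is an in-memory illustration only, not eBay's code and not the HBase client API: `ItemStore` is a dict standing in for the item table, and `SearchIndex`, `rebuild_index`, and `index_new_listing` are hypothetical names.

```python
from collections import defaultdict


class ItemStore:
    """In-memory stand-in for the item table (HBase in the article)."""

    def __init__(self):
        self._rows = {}

    def put(self, item_id, item):
        self._rows[item_id] = item

    def get(self, item_id):
        # Random access by row key.
        return self._rows[item_id]

    def scan(self):
        # Sequential pass over all rows in key order.
        return iter(sorted(self._rows.items()))


class SearchIndex:
    """Toy inverted index mapping a title term to the set of item ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, item):
        for term in item["title"].lower().split():
            self._postings[term].add(item_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


def rebuild_index(store):
    """Batch path: scan every stored item and build a fresh index."""
    index = SearchIndex()
    for item_id, item in store.scan():
        index.add(item_id, item)
    return index


def index_new_listing(store, index, item_id):
    """Incremental path: look up one new item by key and add it to the live index."""
    index.add(item_id, store.get(item_id))


store = ItemStore()
store.put("item-1", {"title": "Vintage camera"})
store.put("item-2", {"title": "Digital camera"})

index = rebuild_index(store)          # the periodic full rebuild
before = index.search("camera")

store.put("item-3", {"title": "Camera tripod"})
index_new_listing(store, index, "item-3")  # no full rebuild needed
after = index.search("camera")
```

The point of the split is that a new listing becomes searchable through a single keyed read, while the periodic scan rebuilds the whole index from the same store.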
Williams indicated that the team was familiar with running Hadoop and that it had worked reliably with few problems. By contrast, he said the "ride so far with HBase has been bumpy." Williams noted that eBay remains committed to the technology, has been contributing fixes for the issues it found, is learning fast, and that the last two weeks have gone smoothly. The engineering team was new to HBase and ran into some issues when testing at scale, such as:
* production cluster configuration for their workloads
* hardware issues
* stability: unstable region servers, unstable master, regions stuck in transition
* monitoring HBase health: problems often have not been detected until they affect live service, so the team is adding a lot of monitoring
* managing multi-step MapReduce jobs
Overall Williams felt the project was ambitious but had gone quickly and well, and that the team was able to use Hadoop and HBase to build a significantly improved search experience.
come from info