
Handle 1 Billion Events Per Day Using a Memory Grid


Moshe Kaplan of RockeTier shows the life cycle of an affiliate marketing system that starts off as a cub handling one million events per day and ends up a lion handling 200 million to even one billion events per day. The resulting system uses ten commodity servers at a cost of $35,000.

Mr. Kaplan's paper is especially interesting because it documents a system architecture evolution we may see a lot more of in the future: database centric --> cache centric --> memory grid.

As scaling and performance requirements for complicated operations increase, keeping the entire system in memory starts to make a great deal of sense. Why use a cache at all? Why shouldn't your system be all in memory from the start?

 

General Approach to Evolving the System to Scale

 

  • Analyze the system architecture and the main business processes. Detect the main hardware bottlenecks and the related business processes causing them. Focus efforts on the points of greatest return.
  • Rate the bottlenecks by importance and provide immediate, practical recommendations to improve performance.
  • Implement the recommendations to provide immediate relief. Risk is reduced by avoiding both a full rewrite and spending a fortune on more resources.
  • Plan a road map toward next-generation solutions.
  • Scale up and scale out when redesign is necessary.

     

    One Million Event Per Day System

  • The events are common advertising system operations: ad impressions, clicks, and sales.
  • A typical two-tier system: impressions and banner sales are written directly to the database (sketched below).
  • The immediate goal was to process 2.5 million events per day, so something needed to be done.
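
A minimal sketch of this write path, assuming a hypothetical schema and using Python with SQLite standing in for the production database: every event is a synchronous INSERT, which is exactly the pattern that stops scaling.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the production database.
db = sqlite3.connect("ads.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS events (banner_id INTEGER, kind TEXT, ts REAL)"
)

def record_event(banner_id: int, kind: str, ts: float) -> None:
    # One synchronous INSERT and commit per impression/click/sale: simple,
    # but every event costs a database round trip, so the DB saturates first.
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (banner_id, kind, ts))
    db.commit()

record_event(42, "impression", 1.0)
```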

     

    2.5 Million Event Per Day System

  • PerfMon was used to check web server and DB performance counters. CPU usage hit 100% at peak load.
  • Immediate fixes included tuning SQL queries, implementing stored procedures, using a PHP compiler, removing include files, and fixing other programming errors (one such fix is sketched below).
  • The changes successfully doubled the performance of the system within 3 months. The next goal was to handle 20 million events per day.
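
The paper doesn't show the individual fixes; as one hedged illustration of "tuning SQL queries", an index on the hot lookup column turns a per-event full-table scan into a cheap B-tree probe (table and column names continue the hypothetical sketch above):

```python
import sqlite3

db = sqlite3.connect("ads.db")
# Before: per-event validation queries such as
#   SELECT COUNT(*) FROM events WHERE banner_id = ?
# scanned the whole table. The index below makes them an index probe instead.
db.execute("CREATE INDEX IF NOT EXISTS idx_events_banner ON events (banner_id)")
count = db.execute(
    "SELECT COUNT(*) FROM events WHERE banner_id = ?", (42,)
).fetchone()[0]
```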

     

    20 Million Event Per Day System

  • To make this scaling leap, a rethink of how the system worked was in order.
  • The main load on the system was validating inputs in order to prevent forgery.
  • A cache was maintained in the application servers to cut unnecessary database access. The result was a 50% reduction in CPU utilization.
  • An in-memory database was used to accumulate transactions over time (impression counting, clicks, sales recording).
  • A periodic process was used to write the accumulated transactions from the in-memory database to the database server (a sketch of this pattern follows the list).
  • This architecture could handle 20 million events per day using the existing hardware.
  • Business projections, however, required a system that could handle 200 million events per day.
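
A hedged sketch of this write-behind pattern follows. The validation cache is a plain set, a Counter stands in for the unnamed in-memory database, and the flush job upserts one row per (banner, kind) pair instead of one row per event; all names are hypothetical.

```python
import sqlite3
import threading
from collections import Counter

db = sqlite3.connect("ads.db", check_same_thread=False)
db.execute(
    """CREATE TABLE IF NOT EXISTS event_totals
       (banner_id INTEGER, kind TEXT, total INTEGER,
        PRIMARY KEY (banner_id, kind))"""
)

valid_banners = {1, 2, 42}  # app-server cache: validation no longer hits the DB
pending = Counter()         # in-memory accumulator: (banner_id, kind) -> count
lock = threading.Lock()

def record_event(banner_id: int, kind: str) -> None:
    if banner_id not in valid_banners:   # anti-forgery check from the cache
        return
    with lock:
        pending[(banner_id, kind)] += 1  # hot path never touches the database

def flush() -> None:
    # Periodic write-behind: drain the accumulator and batch it into the DB.
    with lock:
        batch = list(pending.items())
        pending.clear()
    db.executemany(
        """INSERT INTO event_totals VALUES (?, ?, ?)
           ON CONFLICT(banner_id, kind)
           DO UPDATE SET total = total + excluded.total""",
        [(b, k, n) for (b, k), n in batch],
    )
    db.commit()

record_event(42, "impression")
record_event(42, "click")
flush()  # in production this would run on a timer, not inline
```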

     

    200 Million Event Per Day System

  • The next architectural evolution was to a scale-out grid product. It's not mentioned in the paper, but I think GigaSpaces was used.
  • A Layer 7 load balancer is used to route requests to sharded application servers, each supporting a different set of banners (a routing sketch follows this list).
  • Data is still stored in the database, as the data is used for statistics, reports, billing, fraud detection and so on.
  • Latency was slashed because business logic was separated out of the HTTP request/response loop into a separate process, and database persistence was moved offline.

    At this point the architecture supports near-linear scaling, and it is projected that it can easily scale to a billion events per day.
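
The paper doesn't give the routing rule, so purely as an illustration, here is a sketch of the Layer 7 idea: the balancer inspects each request and pins every banner to one application server, so that server owns all the in-memory state for its banners. The server list and modulo hashing are assumptions.

```python
APP_SERVERS = ["app1:8080", "app2:8080", "app3:8080"]  # hypothetical shard set

def route(banner_id: int) -> str:
    # Deterministic shard choice: a given banner always lands on the same
    # app server, keeping its counters and validation cache on a single node.
    return APP_SERVERS[banner_id % len(APP_SERVERS)]

assert route(42) == route(42)  # stable routing for a given banner
```

A real balancer would likely use consistent hashing so that adding a server doesn't remap every banner, but the pinning idea is the same, and it is this shard-per-server independence that makes the near-linear scaling plausible.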
