Moshe Kaplan of RockeTier shows the life cycle of an affiliate marketing
system that starts off as a cub handling one million events per day and ends up as a lion handling 200 million to one billion events per day. The resulting
system uses ten commodity servers at a cost of $35,000.
Mr. Kaplan's paper is especially interesting because it documents a system architecture evolution we may see a lot more of in the future: database centric --> cache centric --> memory grid.
As scaling and performance requirements for complicated operations increase,
leaving the entire system in memory starts to make a great deal of sense. Why
use cache at all? Why shouldn't your system be all in memory from the start?
General Approach to Evolving the System to Scale
Analyze the system architecture and the main business processes. Detect the
main hardware bottlenecks and the related business process causing them. Focus
efforts on points of greatest return.
Rate the bottlenecks by importance and provide immediate, practical recommendations to improve performance.
Implement the recommendations to provide immediate relief. Risk is reduced by avoiding both a full rewrite and spending a fortune on more resources.
Plan a road map toward next-generation solutions.
Scale up and scale out when redesign is necessary.
One Million Event Per Day System
The events are common advertising system operations like: ad impressions,
clicks, and sales.
A typical two-tier system: impressions and banner sales are written directly to the database.
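The pattern can be sketched as each event triggering its own synchronous write and commit. A minimal sketch in Python, using SQLite as a stand-in for the production database; the `events` schema is hypothetical:

```python
import sqlite3

# Stand-in for the production database; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (banner_id INTEGER, kind TEXT)")

def record_event(banner_id, kind):
    # Two-tier pattern: every impression, click, and sale is a
    # synchronous write and commit -- one database round trip per event.
    db.execute("INSERT INTO events VALUES (?, ?)", (banner_id, kind))
    db.commit()

record_event(42, "impression")
record_event(42, "click")
count = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2 events, 2 commits
```

With one write and commit per event, the database round trip sits on the critical path of every request, which is what caps this style of architecture at a few million events per day.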
The immediate goal was to process 2.5 million events per day so something
needed to be done.
2.5 Million Event Per Day System
PerfMon was used to check web server and DB performance counters. CPU usage
was at 100% at peak usage.
Immediate fixes included: tuning SQL queries, implementing stored procedures, using a PHP compiler, removing include files, and fixing other programming errors.
The changes successfully doubled the performance of the system within 3 months. The next goal was to handle 20 million events per day.
20 Million Event Per Day System
To make this scaling leap a rethinking of how the system worked was in
order.
The main load of the system was validating inputs in order to prevent
forgery.
A cache was maintained in the application servers to cut unnecessary database access. The result was a 50% reduction in CPU utilization.
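A minimal sketch of such a validation cache, assuming the lookup data changes rarely enough to tolerate a short TTL; the cache layout and the `db_lookup_banner` stub are assumptions, not the paper's code:

```python
import time

DB_CALLS = 0  # instrumentation to count saved round trips

def db_lookup_banner(banner_id):
    # Stand-in for the real validation query against the database.
    global DB_CALLS
    DB_CALLS += 1
    return banner_id % 2 == 0  # hypothetical validity rule

_cache = {}
TTL_SECONDS = 60.0

def is_valid_banner(banner_id):
    # Check the in-process cache first; fall back to the database
    # and remember the answer for TTL_SECONDS.
    hit = _cache.get(banner_id)
    now = time.monotonic()
    if hit is not None and now - hit[1] < TTL_SECONDS:
        return hit[0]
    valid = db_lookup_banner(banner_id)
    _cache[banner_id] = (valid, now)
    return valid

is_valid_banner(7)
is_valid_banner(7)   # served from cache, no database trip
is_valid_banner(8)
print(DB_CALLS)      # 2 database lookups for 3 requests
```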
An in-memory database was used to accumulate transactions over time
(impression counting, clicks, sales recording).
A periodic process was used to write transactions from the in-memory
database to the database server.
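The accumulate-then-flush idea can be sketched as follows; the `Counter` accumulator, the `stats` schema, and the flush cadence are assumptions, with SQLite standing in for the database server:

```python
import sqlite3
from collections import Counter

# In-memory accumulation: one counter bump per event instead of
# one database write per event.
counters = Counter()

def record(banner_id, kind):
    counters[(banner_id, kind)] += 1

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stats (banner_id INTEGER, kind TEXT, n INTEGER)")

def flush():
    # Periodic process: write the accumulated totals to the database
    # in a single transaction, then reset the counters.
    rows = [(b, k, n) for (b, k), n in counters.items()]
    with db:
        db.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    counters.clear()
    return len(rows)

for _ in range(1000):
    record(42, "impression")
record(42, "click")
rows_written = flush()
print(rows_written)  # 1001 events collapsed into 2 rows
```

The win is that a thousand impressions become one row and one transaction at flush time, instead of a thousand round trips during the request path.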
This architecture could handle 20 million events using existing
hardware.
Business projections required a system that could handle 200 million events per day.
200 Million Event Per Day System
The next architectural evolution was to a scale-out grid product. It's not mentioned in the paper, but I think GigaSpaces was used.
A Layer 7 load balancer is used to route requests to sharded application
servers. Each app server supports a different set of banners.
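The paper doesn't give the routing rule, but a deterministic hash of the banner id is one plausible way to keep each banner pinned to one application server; `APP_SERVERS` and the rule itself are hypothetical:

```python
import hashlib

# Hypothetical pool of sharded application servers; each one owns
# a disjoint set of banners, mirroring the Layer 7 routing rule.
APP_SERVERS = ["app-0", "app-1", "app-2", "app-3"]

def route(banner_id):
    # Deterministic hash of the banner id, so every request for the
    # same banner lands on the same application server.
    digest = hashlib.md5(str(banner_id).encode()).digest()
    return APP_SERVERS[digest[0] % len(APP_SERVERS)]

assert route(1234) == route(1234)  # stable routing per banner
servers = {route(b) for b in range(100)}
print(sorted(servers))  # traffic spreads across the pool
```

A real Layer 7 balancer would apply an equivalent rule to the parsed HTTP request; the sketch only shows the partitioning logic.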
Data is still stored in the database as the data is used for statistics,
reports, billing, fraud detection and so on.
Latency was slashed because logic was separated out of the HTTP request/response loop into a separate process, and database persistence was done offline.
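One common way to get persistence out of the request/response loop is a queue drained by a background worker that batches writes. This sketch shows the shape; the names and the batch size are assumptions:

```python
import queue
import threading

events = queue.Queue()
persisted = []  # stand-in for batched database writes

def handle_request(banner_id):
    # Fast path: the HTTP handler only enqueues and returns.
    events.put(banner_id)
    return "204 No Content"

def persistence_worker():
    # Offline path: drain the queue and "write" in batches of 100.
    batch = []
    while True:
        item = events.get()
        if item is None:  # shutdown sentinel
            break
        batch.append(item)
        if len(batch) >= 100:
            persisted.append(list(batch))
            batch.clear()
    if batch:
        persisted.append(list(batch))

worker = threading.Thread(target=persistence_worker)
worker.start()
for b in range(250):
    handle_request(b)
events.put(None)
worker.join()
print([len(b) for b in persisted])  # [100, 100, 50]
```

The request thread never waits on the database, which is where the latency win comes from.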
At this point the architecture supports near-linear scaling, and it's projected that it can easily scale to a billion events per day.