`
fujohnwang
  • 浏览: 156259 次
社区版块
存档分类
最新评论

[转贴]Scalability Best Practices: Lessons from eBay

阅读更多
原文出处http://www.infoq.com/articles/ebay-scalability-best-practices

At eBay, one of the primary architectural forces we contend with every day is scalability. It colors and drives every architectural and design decision we make. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in our systems, this is not a choice - it is a necessity.

In a scalable architecture, resource usage should increase linearly (or better) with load, where load may be measured in user traffic, data volume, etc. Where performance is about the resource usage associated with a single unit of work, scalability is about how resource usage changes as units of work grow in number or size. Said another way, scalability is the shape of the price-performance curve, as opposed to its value at one point in that curve.

There are many facets to scalability - transactional, operational, development effort. In this article, I will outline several of the key best practices we have learned over time to scale the transactional throughput of a web-based system. Most of these best practices will be familiar to you. Some may not. All come from the collective experience of the people who develop and operate the eBay site.

Best Practice #1: Partition by Function

Whether you call it SOA, functional decomposition, or simply good engineering, related pieces of functionality belong together, while unrelated pieces of functionality belong apart. Further, the more decoupled that unrelated functionality can be, the more flexibility you will have to scale them independently of one another.

At the code level, we all do this all the time. JAR files, packages, bundles, etc., are all mechanisms we use to isolate and abstract one set of functionality from another.

At the application tier, eBay segments different functions into separate application pools. Selling functionality is served by one set of application servers, bidding by another, search by yet another. In total, we organize our roughly 16,000 application servers into 220 different pools. This allows us to scale each pool independently of one another, according to the demands and resource consumption of its function. It further allows us to isolate and rationalize resource dependencies - the selling pool only needs to talk to a relatively small subset of backend resources, for example.

At the database tier, we follow much the same approach. There is no single monolithic database at eBay. Instead there is a set of database hosts for user data, a set for item data, a set for purchase data, etc. - 1000 logical databases in all, on 400 physical hosts. Again, this approach allows us to scale the database infrastructure for each type of data independently of the others.

Best Practice #2: Split Horizontally

While functional partitioning gets us part of the way, by itself it is not sufficient for a fully scalable architecture. As decoupled as one function may be from another, the demands of a single functional area can and will outgrow any single system over time. Or, as we like to remind ourselves, "if you can't split it, you can't scale it." Within a particular functional area, then, we need to be able to break the workload down into manageable units, where each individual unit retains good price-performance. Here is where the horizontal split comes in.

At the application tier, where eBay's interactions are by design stateless, splitting horizontally is trivial. Use a standard load-balancer to route incoming traffic. Because all application servers are created equal and none retains any transactional state, any of them will do. If we need more processing power, we simply add more application servers.

The more challenging problem arises at the database tier, since data is stateful by definition. Here we split (or "shard") the data horizontally along its primary access path. User data, for example, is currently divided over 20 hosts, with each host containing 1/20 of the users. As our numbers of users grow, and as the data we store for each user grows, we add more hosts, and subdivide the users further. Again, we use the same approach for items, for purchases, for accounts, etc. Different use cases use different schemes for partitioning the data: some are based on a simple modulo of a key (item ids ending in 1 go to one host, those ending in 2 go to the next, etc.), some on a range of ids (0-1M, 1-2M, etc.), some on a lookup table, some on a combination of these strategies. Regardless of the details of the partitioning scheme, though, the general idea is that an infrastructure which supports partitioning and repartitioning of data will be far more scalable than one which does not.

Best Practice #3: Avoid Distributed Transactions

At this point, you may well be wondering how the practices of partitioning data functionally and horizontally jibe with transactional guarantees. After all, almost any interesting operation updates more than one type of entity - users and items come to mind immediately. The orthodox answer is well-known and well-understood - create a distributed transaction across the various resources, using two-phase commit to guarantee that all updates across all resources either occur or do not. Unfortunately, this pessimistic approach comes with substantial costs. Scaling, performance, and latency are adversely affected by the costs of coordination, which worsens geometrically as you increase the number of dependent resources and incoming clients. Availability is similarly limited by the requirement that all dependent resources are available. The pragmatic answer is to relax your transactional guarantees across unrelated systems.

It turns out that you can't have everything. In particular, guaranteeing immediate consistency across multiple systems or partitions is typically neither required nor possible. The CAP theorem, postulated almost 10 years ago by Inktomi's Eric Brewer, states that of three highly desirable properties of distributed systems - consistency (C), availability (A), and partition-tolerance (P) - you can only choose two at any one time. For a high-traffic web site, we have to choose partition-tolerance, since it is fundamental to scaling. For a 24x7 web site, we typically choose availability. So immediate consistency has to give way.

At eBay, we allow absolutely no client-side or distributed transactions of any kind - no two-phase commit. In certain well-defined situations, we will combine multiple statements on a single database into a single transactional operation. For the most part, however, individual statements are auto-committed. While this intentional relaxation of orthodox ACID properties does not guarantee immediate consistency everywhere, the reality is that most systems are available the vast majority of the time. Of course, we do employ various techniques to help the system reach eventual consistency: careful ordering of database operations, asynchronous recovery events, and reconciliation or settlement batches. We choose the technique according to the consistency demands of the particular use case.

The key takeaway here for architects and system designers is that consistency should not be viewed as an all or nothing proposition. Most real-world use cases simply do not require immediate consistency. Just as availability is not all or nothing, and we regularly trade it off against cost and other forces, similarly our job becomes tailoring the appropriate level of consistency guarantees to the requirements of a particular operation.

Best Practice #4: Decouple Functions Asynchronously

The next key element to scaling is the aggressive use of asynchrony. If component A calls component B synchronously, A and B are tightly coupled, and that coupled system has a single scalability characteristic -- to scale A, you must also scale B. Equally problematic is its effect on availability. Going back to Logic 101, if A implies B, then not-B implies not-A. In other words, if B is down then A is down. By contrast, if A and B integrate asynchronously, whether through a queue, multicast messaging, a batch process, or some other means, each can be scaled independently of the other. Moreover, A and B now have independent availability characteristics - A can continue to move forward even if B is down or distressed.

This principle can and should be applied up and down an infrastructure. Techniques like SEDA (Staged Event-Driven Architecture) can be used for asynchrony inside an individual component while retaining an easy-to-understand programming model. Between components, the principle is the same -- avoid synchronous coupling as much as possible. More often than not, the two components have no business talking directly to one another in any event. At every level, decomposing the processing into stages or phases, and connecting them up asynchronously, is critical to scaling.
Best Practice #5: Move Processing To Asynchronous Flows

Now that you have decoupled asynchronously, move as much processing as possible to the asynchronous side. In a system where replying rapidly to a request is critical, this can substantially reduce the latency experienced by the requestor. In a web site or trading system, it is worth it to trade off data or execution latency (how quickly we get everything done) for user latency (how quickly the user gets a response). Activity tracking, billing, settlement, and reporting are obvious examples of processing that belongs in the background. But often significant steps in processing of the primary use case can themselves be broken out to run asynchronously. Anything that can wait should wait.

Equally as important, but less often appreciated, is the fact that asynchrony can substantially reduce infrastructure cost. Performing operations synchronously forces you to scale your infrastructure for the peak load - it needs to handle the worst second of the worst day at that exact second. Moving expensive processing to asynchronous flows, though, allows you to scale your infrastructure for the average load instead of the peak. Instead of needing to process all requests immediately, the queue spreads the processing over time, and thereby dampens the peaks. The more spiky or variable the load on your system, the greater this advantage becomes.

Best Practice #6: Virtualize At All Levels

Virtualization and abstraction are everywhere, following the old computer science aphorism that the solution to every problem is another level of indirection. The operating system abstracts the hardware. The virtual machine in many modern languages abstracts the operating system. Object-relational mapping layers abstract the database. Load-balancers and virtual IPs abstract network endpoints. As we scale our infrastructure through partitioning by function and data, an additional level of virtualization of those partitions becomes critical.

At eBay, for example, we virtualize the database. Applications interact with a logical representation of a database, which is then mapped onto a particular physical machine and instance through configuration. Applications are similarly abstracted from the split routing logic, which assigns a particular record (say, that of user XYZ) to a particular partition. Both of these abstractions are implemented in our home-grown O/R layer. This allows the operations team to rebalance logical hosts between physical hosts, by separating them, consolidating them, or moving them -- all without touching application code.

We similarly virtualize the search engine. To retrieve search results, an aggregator component parallelizes queries over multiple partitions, and makes a highly partitioned search grid appear to clients as one logical index.

The motivation here is not only programmer convenience, but also operational flexibility. Hardware and software systems fail, and requests need to be re-routed. Components, machines, and partitions are added, moved, and removed. With judicious use of virtualization, higher levels of your infrastructure are blissfully unaware of these changes, and you are therefore free to make them. Virtualization makes scaling the infrastructure possible because it makes scaling manageable.


Best Practice #7: Cache Appropriately

The last component of scaling is the judicious use of caching. The specific recommendations here are less universal, because they tend to be highly dependent on the details of the use case. At the end of the day, the goal of an efficient caching system to maximize your cache hit ratio within your storage constraints, your requirements for availability, and your tolerance for staleness. It turns out that this balance can be surprisingly difficult to strike. Once struck, our experience has shown that it is also quite likely to change over time.

The most obvious opportunities for caching come with slow-changing, read-mostly data - metadata, configuration, and static data, for example. At eBay, we cache this type of data aggressively, and use a combination of pull and push approaches to keep the system reasonably in sync in the face of updates. Reducing repeated requests for the same data can and does make a substantial impact. More challenging is rapidly-changing, read-write data. For the most part, we intentionally sidestep these challenges at eBay. We have traditionally not done any caching of transient session data between requests. We similarly do not cache shared business objects, like item or user data, in the application layer. We are explicitly trading off the potential benefits of caching this data against availability and correctness. It should be noted that other sites do take different approaches, make different tradeoffs, and are also successful.

Not surprisingly, it is quite possible to have too much of a good thing. The more memory you allocate for caching, the less you have available to service individual requests. In an application layer which is often memory-constrained, this is a very real tradeoff. More importantly, though, once you have come to rely on your cache, and have taken the extremely tempting steps of downsizing the primary systems to handle just the cache misses, your infrastructure literally may not be able to survive without it. Once your primary systems can no longer directly handle your load, your site's availability now depends on 100% uptime of the cache - a potentially dangerous situation. Even something as routine as rebalancing, moving, or cold-starting the cache becomes problematic.

Done properly, a good caching system can bend your scaling curve below linear - subsequent requests retrieve data cheaply from cache rather than the relatively more expensive primary store. On the other hand, caching done poorly introduces substantial additional overhead and availability challenges. I have yet to see a system where there are not significant opportunities for caching. The key point, though, is to make sure your caching strategy is appropriate for your situation.

Summary

Scalability is sometimes called a "non-functional requirement," implying that it is unrelated to functionality, and strongly implying that it is less important. Nothing could be further from the truth. Rather, I would say, scalability is a prerequisite to functionality - a "priority-0" requirement, if ever there was one.

I hope that you find the descriptions of these best practices useful, and that they help you to think in a new way about your own systems, whatever their scale.



About the Author

Randy Shoup is a Distinguished Architect at eBay. Since 2004, he has been the primary architect for eBay's search infrastructure. Prior to eBay, he was the Chief Architect at Tumbleweed Communications, and has also held a variety of software development and architecture roles at Oracle and Informatica.

He presents regularly at industry conferences on scalability and architecture patterns.
分享到:
评论
3 楼 thirdson 2008-07-25  
说的对.
2 楼 fujohnwang 2008-05-31  
贱人,别人说英文看不懂我可能信,你要是看不懂才怪
1 楼 bigapple 2008-05-29  
老大,能否翻译成中文?鸟文看不懂!

相关推荐

    Scalability Rules: 50 Principles for Scaling Web Sites

    ### Scalability Rules: 50 Principles for Scaling Web Sites #### 书籍概述 《Scalability Rules: 50 Principles for Scaling Web Sites》是一本由Martin L. Abbott与Michael T. Fisher合著的专业书籍,旨在为...

    Best Practices for SAP BW on DB2 UDB for z/OS V8

    ### SAP BW on DB2 UDB for z/OS V8: Best Practices Overview #### Introduction The IBM Redbook titled "Best Practices for SAP Business Information Warehouse (BW) on DB2 UDB for z/OS V8" provides ...

    RESTful.Java.Patterns.and.Best.Practices.1783287969

    Title: RESTful Java Patterns and Best Practices Author: Bhakti Mehta Length: 131 pages Edition: 1 Language: English Publisher: Packt Publishing Publication Date: 2014-09-17 ISBN-10: 1783287969 ISBN-13...

    The Art of Scalability

    * Culture of Scalability:Scalability需要建立一个具有Scalability文化的组织,鼓励团队成员关注Scalability的设计和实现。 * Continuous Improvement:Scalability需要不断地完善和改进,确保系统的Scalability...

    go-dojo-scalability-protocols:尝试[可伸缩性协议](http

    Go-Dojo可扩展性协议用实验。先决条件go 1.4.2+建造 gb build all跑步对于所有可用的标志,请运行 ./bin/sp --help在本地 ./bin/sp -mode=receiver发件人发送一百万条消息: ./bin/sp -mode=sender -num_messages=...

    HDFS scalability:the limits to growth

    ### HDFS可扩展性:增长的极限 #### HDFS与Hadoop Hadoop Distributed File System (HDFS)作为Hadoop项目中的一个核心组件,是一种开放源代码系统,它被广泛应用于处理大规模数据集的场景中。...

    RESTful Java Patterns and Best Practices(PACKT,2015)

    easing scalability, visibility, and reliability; and being platform and language agnostic. This book is a practical, hands-on guide that provides you with clear and pragmatic information to take ...

    eBay Architecture-Scalability with Agility

    标题“eBay Architecture - Scalability with Agility”及描述同名,揭示了eBay如何通过灵活且可扩展的架构来应对日益增长的数据量、用户需求以及业务挑战。下面,我们将深入探讨这一主题,理解eBay是如何实现其架构...

    论文研究-Modeling the Internet Routing Scalability: From Qualitative Description to Quantitative Evaluation.pdf

    因特网路由可扩展性建模:从定性描述到定量评价,张威,,互联网网在过去的几十年增长迅速。随着网络规模的增长,其路由系统正面临可扩展性方面的问题。近年来,IRTF特别提出了这一问题,��

    Addison.Wesley.C.Plus.Plus.Coding.Standards.101.Rules.Guidelines.and.Best.Practices.Oct.2004.eBoo.chm

    From type definition to error handling, this book presents C++ best practices, including some that have only recently been identified and standardized-techniques you may not know even if you've used ...

    Java Performance and Scalability

    Each optimization discusses techniques to improve the performance and scalability of your code. Every claim is substantiated with hard numbers and an experience-based evaluation. Java(TM) Performance...

    Pro SQL Server 2012 Practices

    Pro SQL Server 2012 Practices is an anthology of high-end wisdom from a group of accomplished database administrators who are quietly but relentlessly pushing the performance and feature envelope of ...

    Packt.QlikView Server and Publisher

    ### QlikView Server and Publisher: Key Concepts and Best Practices #### Introduction The book "QlikView Server and Publisher" by Stephen Redmond provides comprehensive guidance on deploying and ...

    PHP Microservices

    You will also understand how to migrate a monolithic application to the microservice architecture while keeping scalability and best practices in mind. Furthermore you will get into a few important ...

    Scalability patterns

    标题与描述:“Scalability patterns”(可扩展性模式) 在IT领域,系统设计的核心目标之一就是确保其可扩展性,即系统能够处理不断增加的工作负载而不牺牲性能或稳定性。这通常涉及采用特定的设计模式和策略,以...

    Pro-Java-Clustering-and-Scalability.pdf

    enough for me professionally (although doing this while following best practices is not an easy task). I wanted to understand the entire process of creating software and delivering it to a production ...

    Production-Ready Microservices(O'Reilly,2016)

    Scalability and Performance: learn essential components for achieving greater microservice efficiency Fault Tolerance and Catastrophe Preparedness: ensure availability by actively pushing ...

    Everything I Learned About Scaling Online Games I Learned at Google and eBay

    Scalability and Performance Small Details Matter eBay Search Index Compression TOME Combat Server Measurement and Distributions How to Scale - Scaling DevOps Automate Everything Autoscaling App Engine...

Global site tag (gtag.js) - Google Analytics