`
DivineDm
  • 浏览: 3516 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类

Numbers U Should Know

阅读更多
Writes are expensive!

Datastore is transactional: writes require disk access
Disk access means disk seeks
Rule of thumb: 10ms for a disk seek
Simple math: 1s / 10ms = 100 seeks/sec maximum
Depends on:
* The size and shape of your data
* Doing work in batches (batch puts and gets)

Reads are cheap!
Reads do not need to be transactional, just consistent
Data is read from disk once, then it's easily cached
All subsequent reads come straight from memory
Rule of thumb: 250usec for 1MB of data from memory
Simple math: 1s / 250usec = 4GB/sec maximum
* For a 1MB entity, that's 4000 fetches/sec

Numbers Miscellaneous
This group of numbers is from a presentation Jeff Dean gave at a Engineering All-Hands Meeting at Google.

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 100 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 10,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from network 10,000,000 ns
Read 1 MB sequentially from disk 30,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns

The Lessons
Writes are 40 times more expensive than reads.
Global shared data is expensive. This is a fundamental limitation of distributed systems. The lock contention in shared heavily written objects kills performance as transactions become serialized and slow.
Architect for scaling writes.
Optimize for low write contention.
Optimize wide. Make writes as parallel as you can.

The Techniques
Keep in mind these are from a Google AppEngine perspective, but the ideas are generally applicable.

Sharded Counters
We always seem to want to keep count of things. But BigTable doesn't keep a count of entities because it's a key-value store. It's very good at getting data by keys, it's not interested in how many you have. So the job of keeping counts is shifted to you.

The naive counter implementation is to lock-read-increment-write. This is fine if there a low number of writes. But if there are frequent updates there's high contention. Given the the number of writes that can be made per second is so limited, a high write load serializes and slows down the whole process.

The solution is to shard counters. This means:
Create N counters in parallel.
Pick a shard to increment transactionally at random for each item counted.
To get the real current count sum up all the sharded counters.
Contention is reduced by 1/N. Writes have been optimized because they have been spread over the different shards. A bottleneck around shared state has been removed.

This approach seems counter-intuitive because we are used to a counter being a single incrementable variable. Reads are cheap so we replace having a single easily read counter with having to make multiple reads to recover the actual count. Frequently updated shared variables are expensive so we shard and parallelize those writes.

With a centralized database letting the database be the source of sequence numbers is doable. But to scale writes you need to partition and once you partition it becomes difficult to keep any shared state like counters. You might argue that so common a feature should be provided by GAE and I would agree 100 percent, but it's the ideas that count (pun intended).

Paging Through Comments

How can comments be stored such that they can be paged through
in roughly the order they were entered?

Under a high write load situation this is a surprisingly hard question to answer. Obviously what you want is just a counter. As a comment is made you get a sequence number and that's the order comments are displayed. But as we saw in the last section shared state like a single counter won't scale in high write environments.

A sharded counter won't work in this situation either because summing the shared counters isn't transactional. There's no way to guarantee each comment will get back the sequence number it allocated so we could have duplicates.

Searches in BigTable return data in alphabetical order. So what is needed for a key is something unique and alphabetical so when searching through comments you can go forward and backward using only keys.

A lot of paging algorithms use counts. Give me records 1-20, 21-30, etc. SQL makes this easy, but it doesn't work for BigTable. BigTable knows how to get things by keys so you must make keys that return data in the proper order.

In the grand old tradition of making unique keys we just keep appending stuff until it becomes unique. The suggested key for GAE is: time stamp + user ID + user comment ID.

Ordering by date is obvious. The good thing is getting a time stamp is a local decision, it doesn't rely on writes and is scalable. The problem is timestamps are not unique, especially with a lot of users.

So we can add the user name to the key to distinguish it from all other comments made at the same time. We already have the user name so this too is a cheap call.

Theoretically even time stamps for a single user aren't sufficient. What we need then is a sequence number for each user's comments.

And this is where the GAE solution turns into something totally unexpected. Our goal is to remove write contention so we want to parallelize writes. And we have a lot available storage so we don't have to worry about that.

With these forces in mind, the idea is to create a counter per user. When a user adds a comment it's added to a user's comment list and a sequence number is allocated. Comments are added in a transactional context on a per user basis using Entity Groups. So each comment add is guaranteed to be unique because updates in an Entity Group are serialized.

The resulting key is guaranteed unique and sorts properly in alphabetical order. When paging a query is made across entity groups using the ID index. The results will be in the correct order. Paging is a matter of getting the previous and next keys in the query for the current page. These keys can then be used to move through index.

I certainly would have never thought of this approach. The idea of keeping per user comment indexes is out there. But it cleverly follows the rules of scaling in a distributed system. Writes and reads are done in parallel and that's the goal. Write contention is removed
分享到:
评论

相关推荐

    Tricks every ClickHouse designer should know.pdf

    加载大量数据时,可以利用`INSERT INTO`语句结合`FROM system.numbers`生成测试数据,这里通过`LIMIT 100000000`来加载1亿条记录,同时设置`max_block_size=1000000`以优化批量写入性能。加载完成后,可以查询`...

    Advanced Topics in Java

    advanced programming concepts that any aspiring programmer should know. The major topics covered include elementary sorting methods (selection, insertion), searching (sequential, binary), merging, ...

    华南理工大学计算机全英班算法设计实验

    In order to solve a problem by using dynamic programming, we should know how to: Find out the recurrence relations. Represent the problem by a multistage graph. 2. Experimental Purpose (1)...

    Questions and answers

    We don't know why, but all the data is coded by the natural numbers from 1 up to 5000. The size of the main base (we'll denote it be N) is rather big — it may contain up to 100 000 those numbers. ...

    [C语言] C语言 进阶开发 (数据结构核心概念) (英文版)

    Advanced Topics In C teaches concepts that any budding programmer should know. You'll delve into topics such as sorting, searching, merging, recursion, random numbers and simulation, among others. You...

    Advanced Topics in C.pdf

    Advanced Topics In C teaches concepts that any budding programmer should know. You'll delve into topics such as sorting, searching, merging, recursion, random numbers and simulation, among others. You...

    Compiler Building Tutorial

    who enjoy computing and have always wanted to know how compilers work. A lot of compiler the- ory has been left out, but the practical issues are covered. By the time you have completed the series, ...

    Google C++ Style Guide(Google C++编程规范)高清PDF

    Note: If you use a symbol Foo in your source file, you should bring in a definition for Foo yourself, either via an #include or via a forward declaration. Do not depend on the symbol being brought in ...

    poj 1005 I Think I Need a Houseboat

    Since Fred is hoping to live in this house the rest of his life, he needs to know if his land is going to be lost to erosion. After doing more research, Fred has learned that the land that is being...

    Visualize This - Nathan Yau

    Find out all you can about your data, because the more you know what's behind the numbers, the better story you can tell. 3) Visualize Your Data (Click Graphic to See Larger Version) Once you know ...

    google repo工具

    The <BRANCH_NAME> argument should provide a short description of the change you are trying to make to the projects.If you don't know, consider using the name default. The <PROJECT_LIST> specifies ...

    XML Processing with Perl, Python, and PHP (2002).pdf

    extracted compound addresses and telephone numbers, and tidied up the results to put into a new version of the database. Access to the database was through a Sun-based Unix system, and the PCs and ...

    apktool documentation

    These values should range from 1 to 9. Any APK that installs itself as 127 is 0x7F which is an internal pkgId. Internal Frameworks Apktool comes with an internal framework like mentioned above. This ...

    JSP Simple Examples

    In this program we are going to know how the server determines whether the password entered by the user is correct or not. This whole process is controlled on the server side. Multiple forms in jsp ...

    英语四级真题大家看看哦

     Do“lucky numbers ”really bring good luck?Different people have different views on it.  注:一个段落有时很适宜以问句开始,考生应掌握这一写作方法。  11.表示结论  1)In short,it can be said ...

    3D Math Primer for graphics and game development

    (because, after all, you already know all this stuff). Our book is definitely the book you should read first, before buying that “Write a 3DVideoGame in 21Days” book. This book is not only an ...

    Principles of Protocol Design

    VDM or Z, is not essential, but to get the most out of the book you should know about the sort of discrete mathematics which is used in computer science, and be aware of the basic concepts of ...

    数位板压力测试

    Devices may have multiple cursor types that have different physical configurations, or that have differ¬ent numbers of buttons, or return auxiliary information, such as pressure information....

Global site tag (gtag.js) - Google Analytics