Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.
In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me, however, was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post.
Impressive Stats
Scaling Strategies
IO. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter. Don't optimize PHP by using single quotes instead of double quotes when the database is pegged.
- This approach pushes chunks of processing to another service and lets that service schedule the processing on a grid of processors.
- It's faster and more responsive than cron and only slightly less responsive than real-time.
- For example, issuing 5 synchronous database requests slows you down. Do them in parallel.
- Digg uses Gearman. An example use is to get a permalink. Three operations are done in parallel: get the currently logged-in user, get the permalink, and grab the comments. All three are then combined to return a single answer to the client. It's also used for site crawling and logging. It's a different way of thinking.
- See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
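The fan-out-and-combine pattern described for the permalink page can be sketched in Python. This is only an illustration of the idea: it uses the standard library's thread pool rather than Gearman, and the three lookup functions and their return values are hypothetical stand-ins, not Digg's code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three independent lookups.
def get_current_user(session_id):
    return {"user": "user-for-" + session_id}

def get_permalink(story_id):
    return {"permalink": "/story/" + str(story_id)}

def get_comments(story_id):
    return {"comments": ["first", "second"]}

def build_permalink_page(session_id, story_id):
    """Run the three lookups in parallel and merge them into one answer."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(get_current_user, session_id),
            pool.submit(get_permalink, story_id),
            pool.submit(get_comments, story_id),
        ]
        combined = {}
        for future in futures:
            # result() blocks until that particular lookup has finished.
            combined.update(future.result())
    return combined

page = build_permalink_page("abc", 42)
```

With real IO-bound lookups, the total latency approaches that of the slowest call instead of the sum of all three, which is the point of doing them in parallel.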
- denormalize
- avoid joins
- avoid large scans across databases by partitioning
- cache
- add read slaves
- don't use NFS
Miscellaneous
decisions. Trust that people have it handled and they'll take care of it. Cuts down on meetings because you know people will do the job right.
/1.0/service/id/xml. Version both internal and external services.
MemcacheDB: Evolutionary Step for Code, Revolutionary Step for Performance
Imagine Kevin Rose, the founder of Digg, who at the time of this presentation had 40,000 followers. If Kevin diggs just once a day that's 40,000 writes. As the most active diggers are the most followed it becomes a huge performance bottleneck. Two problems appear.
You can't update 40,000 follower accounts at once. Fortunately the queuing system we talked about earlier takes care of that.
The second problem is the huge number of writes that happen. Digg has a write problem. If the average user has 100 followers, that's 300 million diggs a day. That's 3,000 writes per second, 7GB of storage per day, and 5TB of data spread across 50 to 60 servers.
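A quick back-of-the-envelope check of these figures, as a Python sketch. The 300 million writes, 100 followers, and 7GB per day come from the talk; the base number of diggs and the bytes per write are inferred here, and 7GB is assumed to mean 7 × 10^9 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

fanout_writes_per_day = 300_000_000  # figure quoted in the talk
avg_followers = 100                  # figure quoted in the talk

# Inferred: number of original diggs per day before the follower fan-out.
diggs_per_day = fanout_writes_per_day // avg_followers

# Average write rate if the load were spread evenly over the day.
writes_per_second = fanout_writes_per_day / SECONDS_PER_DAY

# Inferred: bytes stored per write, assuming 7GB = 7 * 10**9 bytes per day.
bytes_per_write = 7 * 10**9 / fanout_writes_per_day

print(diggs_per_day, round(writes_per_second), round(bytes_per_write, 1))
```

The even-spread average comes out at roughly 3,500 writes per second, in line with the rounded 3,000 quoted above; peak rates would presumably be higher.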
With such a heavy write load MySQL wasn’t going to work for Digg. That’s where MemcacheDB comes in. In initial tests on a laptop MemcacheDB was able to handle 15,000 writes a second. MemcacheDB's own benchmark shows it capable of 23,000 writes/second and 64,000 reads/second. At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge.
What is MemcacheDB? It's a distributed key-value storage system designed for persistence. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval. It conforms to the memcache protocol (though not completely), so any memcached client can connect to it. MemcacheDB uses Berkeley DB as its storage backend, so many features, including transactions and replication, are supported.
Before you get too excited keep in mind this is a key-value store. You read and write records by a single key. There aren't multiple indexes and there's no SQL. That's why it can be so fast.
Digg uses MemcacheDB to scale out the huge number of writes that happen when data is denormalized. Remember it's a key-value store. The value is usually a complete application level object merged together from a possibly large number of normalized tables. Denormalizing introduces redundancies because you are keeping copies of data in multiple records instead of just one copy in a nicely normalized table. So denormalization means a lot more writes as data must be copied to all the records that contain a copy. To keep up they needed a database capable of handling their write load. MemcacheDB has the performance, especially when you layer memcached's normal partitioning scheme on top.
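To make the write amplification concrete, here is a minimal Python sketch of a denormalized write on top of a partitioned key-value store. Everything in it is an assumption for illustration: plain dictionaries stand in for MemcacheDB servers, the `friend_diggs:` key layout is hypothetical, and the modulo hashing is the simplest client-side partitioning scheme (real memcached clients often use consistent hashing instead).

```python
import hashlib

class PartitionedStore:
    """Toy key-value store: plain dicts stand in for MemcacheDB servers."""

    def __init__(self, server_count):
        self.servers = [{} for _ in range(server_count)]

    def _server_for(self, key):
        # Hash the key and take it modulo the server count to pick a server.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def set(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key):
        return self._server_for(key).get(key)

def record_digg(store, digger, story_id, followers):
    """Denormalized write: copy the digg into every follower's record."""
    writes = 0
    for follower in followers:
        key = "friend_diggs:" + follower  # hypothetical key layout
        diggs = store.get(key) or []
        diggs.append({"by": digger, "story": story_id})
        store.set(key, diggs)
        writes += 1
    return writes

store = PartitionedStore(server_count=4)
writes = record_digg(store, "kevinrose", 42, ["alice", "bob", "carol"])
```

One digg turns into one write per follower, and each write lands on whichever server its key hashes to. Reads stay cheap because a follower's whole record comes back with a single key lookup.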
I asked Joe why he didn't turn to one of the in-memory data grid solutions. Some of the reasons were:
So it's an evolutionary step for code and a revolutionary step for performance. Digg is looking at using MemcacheDB across the board.