Digg Architecture

Update 2: How Digg Works and How Digg Really Works (wear ear plugs), brought to you straight from Digg's blog. They give a very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
Update: Digg now receives 230 million plus page views per month and 26 million unique visitors, traffic that necessitated major internal upgrades.

Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?

Site: http://digg.com

Information Sources

  • How Digg Works by Digg
  • How Digg.com uses the LAMP stack to scale upward
  • Digg PHP's Scalability and Performance

    Platform

  • MySQL
  • Linux
  • PHP
  • Lucene
  • Python
  • APC PHP Accelerator
  • MCache
  • Gearman - job scheduling system
  • MogileFS - open source distributed filesystem
  • Apache
  • Memcached

    The Stats

  • Started in late 2004 with a single Linux server running Apache 1.3, PHP 4, and MySQL 4.0 using the default MyISAM storage engine.
  • Over 22 million users.
  • 230 million plus page views per month
  • 26 million unique visitors per month
  • Several billion page views per month
  • None of the scaling challenges faced had anything to do with PHP. The biggest issues faced were database related.
  • Dozens of web servers.
  • Dozens of DB servers.
  • Six specialized graph database servers to run the Recommendation Engine.
  • Six to ten machines that serve files from MogileFS.

    What's Inside

  • Specialized load balancer appliances monitor the application servers, handle failover, constantly adjust the cluster according to health, balance incoming requests, and cache JavaScript, CSS, and images. If you don't have fancy load balancers, take a look at Linux Virtual Server and Squid as replacements.
  • Requests are passed to the Application Server cluster. Application servers consist of Apache+PHP, Memcached, Gearman, and other daemons. They are responsible for coordinating access to the different services (DB, MogileFS, etc.) and creating the response sent to the browser.
  • Uses a MySQL master-slave setup.
    - Four master databases are partitioned by functionality: promotion, profiles, comments, main. Many slave databases hang off each master.
    - Writes go to the masters and reads go to the slaves.
    - Transaction-heavy servers use the InnoDB storage engine.
    - OLAP-heavy servers use the MyISAM storage engine.
    - They did not notice a performance degradation moving from MySQL 4.1 to version 5.
    - The schema is denormalized more than "your average database design."
    - Sharding is used to break the database into several smaller ones.
  • Digg's usage pattern makes it easier for them to scale. Most people just view the front page and leave. Thus 98% of Digg's database accesses are reads. With this balance of operations they don't have to worry about the complex work of architecting for writes, which makes it a lot easier for them to scale.
  • They had problems with their storage system telling them writes were on disk when they really weren't. Controllers do this to improve the appearance of their performance, but it leaves a giant data integrity hole in failure scenarios. This is a pretty common problem and can be hard to fix, depending on your hardware setup.
  • To lighten their database load they used the APC PHP accelerator and MCache.
  • Memcached is used for caching, and the memcached servers seem to be spread across their database and application servers. A specialized daemon monitors connections and kills those that have been open too long.
  • You can configure PHP not to parse and compile on each load by using a combination of Apache 2's worker threads, FastCGI, and a PHP accelerator. The PHP code is compiled on a page's first load, so subsequent page loads are very fast.
  • MogileFS, a distributed file system, serves story icons and user icons, and stores copies of each story's source. A distributed file system spreads and replicates files across a lot of disks, which supports fast and scalable file access.
  • A specialized Recommendation Engine service was built to act as their distributed graph database. Relational databases are not well structured for generating recommendations so a separate service was created. LinkedIn did something similar for their graph.
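
The master-slave layout described above (four masters partitioned by functionality, writes to masters, reads to slaves) can be illustrated with a small routing sketch. This is only an illustration under stated assumptions, not Digg's code: the partition names come from the list above, but the class `FunctionalRouter`, the host names, and the round-robin choice of slave are all hypothetical.

```python
import itertools


class FunctionalRouter:
    """Pick a database host per functional partition (hypothetical sketch).

    Writes always go to the partition's master; reads rotate across
    that master's slaves in round-robin order.
    """

    def __init__(self, topology):
        # topology: {partition: {"master": host, "slaves": [host, ...]}}
        self._masters = {}
        self._slaves = {}
        for partition, hosts in topology.items():
            self._masters[partition] = hosts["master"]
            # If a partition has no slaves, reads fall back to the master.
            pool = hosts["slaves"] or [hosts["master"]]
            self._slaves[partition] = itertools.cycle(pool)

    def host_for(self, partition, is_write):
        if partition not in self._masters:
            raise KeyError(f"unknown partition: {partition}")
        if is_write:
            return self._masters[partition]
        return next(self._slaves[partition])


# Host names below are made up for the example.
TOPOLOGY = {
    "promotion": {"master": "promotion-master", "slaves": ["promotion-s1"]},
    "profiles": {"master": "profiles-master", "slaves": ["profiles-s1"]},
    "comments": {"master": "comments-master",
                 "slaves": ["comments-s1", "comments-s2"]},
    "main": {"master": "main-master", "slaves": []},
}
```

With a read-heavy workload like the 98% reads mentioned above, nearly every call takes the `is_write=False` path, so read capacity can grow by adding slaves to a partition without touching its master.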

    Lessons Learned

  • The number of machines isn't as important as what the pieces are and how they fit together.
  • Don't treat the database as a hammer. Recommendations didn't fit well with the relational model so they made a specialized service.
  • Tune MySQL through your database engine selection. Use InnoDB when you need transactions and MyISAM when you don't. For example, tables that are transactional on the master can use MyISAM on read-only slaves.
  • At some point in their growth curve they were unable to grow by adding RAM so had to grow through architecture.
  • People often complain Digg is slow. This is perhaps due to their large JavaScript libraries rather than their backend architecture.
  • One way they scale is by being careful about which applications they deploy on their system. They are careful not to release applications that use too much CPU. Clearly Digg has a pretty standard LAMP architecture, but I thought this was an interesting point. Engineers often have a bunch of cool features they want to release, but those features can kill an infrastructure if that infrastructure doesn't grow along with the features. So push back until your system can handle the new features. This goes to capacity planning, something Flickr emphasizes in their scaling process.
  • You have to wonder whether, by limiting new features to match their infrastructure, Digg might lose ground to other faster-moving social bookmarking services. Perhaps if the infrastructure were more easily scaled they could add features faster, which would help them compete better. On the other hand, just adding features because you can doesn't make a lot of sense either.
  • The data layer is where most scaling and performance problems are to be found, and these are not language specific. You'll hit them using Java, PHP, Ruby, or insert your favorite language here.
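
Because the pressure lands on the data layer and most of Digg's accesses are reads, the caching mentioned earlier is a natural lever. Below is a minimal cache-aside sketch, assuming a read that can tolerate being slightly stale. It does not use a real Memcached client: `TTLCache` is a hypothetical in-process stand-in with a get/set-with-expiry shape, and `cached_read` and the loader callback are illustrative names, not anything from Digg.

```python
import time


class TTLCache:
    """In-process stand-in for a cache server (hypothetical, not Memcached)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be exercised in tests

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cached_read(cache, key, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database on a miss.

    A None result is treated as a miss, so None values are never cached.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db()  # the expensive query runs only on a miss
        cache.set(key, value, ttl_seconds)
    return value
```

A caller would wrap an expensive query, for example `cached_read(cache, "front_page", load_front_page)`, where `load_front_page` is whatever function runs the real query. The time-to-live bounds how stale a page can get, which is the trade-off that makes this pattern a good fit for read-heavy traffic.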

    Related Articles

  • LinkedIn Architecture
  • Live Journal Architecture
  • Flickr Architecture
  • An Unorthodox Approach to Database Design: The Coming of the Shard
  • eBay Architecture