When you’re storing every transaction for 800 million users and handling more than 60 million queries per second, your database environment had better be something special. Many readers might see these numbers and think NoSQL, but Facebook held a Tech Talk on Monday night explaining how it built a MySQL environment capable of handling everything the company needs in terms of scale, performance and availability.
Over the summer, I reported on Michael Stonebraker’s stance that Facebook is trapped in a MySQL “fate worse than death” because of its reliance on an outdated database paired with a complex sharding and caching strategy (read the comments and this follow-up post for a bevy of opinions on the validity of Stonebraker’s stance on SQL). Facebook declined an official comment at the time, but last night’s talk proved to me that Stonebraker (and I) might have been wrong.
Keeping up with performance
Kicking off the event, Facebook’s Domas Mituzas shared some stats that illustrate the importance of its MySQL user database:
- MySQL handles pretty much every user interaction: likes, shares, status updates, alerts, requests, etc.
- Facebook has 800 million users; 500 million of them visit the site daily.
- 350 million mobile users are constantly pushing and pulling status updates.
- 7 million applications and web sites are integrated into the Facebook platform.
- User data sets are made even larger by taking into account both scope and time.
And, as Mituzas pointed out, everything on Facebook is social, so every action has a ripple effect that spreads beyond that specific user. “It’s not just about me accessing some object,” he said. “It’s also about analyzing and ranking through that include all my friends’ activities.” The result (although Mituzas noted these numbers are somewhat outdated) is 60 million queries per second, and nearly 4 million row changes per second.
Facebook shards, or splits its database into numerous distinct sections, because of the sheer volume of the data it stores (a number it doesn’t share), but it caches extensively in order to handle all these transactions in a hurry. In fact, most queries (more than 90 percent) never hit the database at all but only touch the cache layer. Facebook relies heavily on the open-source memcached caching tool, as well as its custom-built Flashcache module for caching data on solid-state drives.
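To make the sharding-plus-caching idea concrete, here is a minimal sketch in Python. It is not Facebook’s code: the shard count, the `shard_for` function and the `ShardedStore` class are hypothetical names invented for illustration, and plain dictionaries stand in for both the MySQL shards and the memcached layer. It only shows the general pattern of routing a user’s data to one shard by a stable hash and answering repeat reads from a cache.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_shards


class ShardedStore:
    """Toy cache-aside store; dicts stand in for MySQL shards and memcached."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards = [dict() for _ in range(num_shards)]  # stand-in databases
        self.cache = {}       # stand-in cache layer
        self.cache_hits = 0   # reads answered from the cache
        self.db_reads = 0     # reads that had to touch a shard

    def write(self, user_id: int, value: str) -> None:
        # Write to the shard that owns this user, then invalidate the cache
        # entry so the next read does not return stale data.
        self.shards[shard_for(user_id, len(self.shards))][user_id] = value
        self.cache.pop(user_id, None)

    def read(self, user_id: int):
        # Cache-aside read: try the cache first, fall back to the shard.
        if user_id in self.cache:
            self.cache_hits += 1
            return self.cache[user_id]
        self.db_reads += 1
        value = self.shards[shard_for(user_id, len(self.shards))].get(user_id)
        if value is not None:
            self.cache[user_id] = value  # populate the cache for later reads
        return value


store = ShardedStore()
store.write(42, "status update")
first = store.read(42)   # cache miss, served by the shard
second = store.read(42)  # cache hit, the shard is not touched
```

After the two reads, `store.db_reads` is 1 and `store.cache_hits` is 1, which is the effect the cache layer is meant to have: repeated reads of the same data stop reaching the database.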
Keeping up with scale
Speaking of drives, and hardware generally, Facebook’s Mark Konetchy took the stage after Mituzas to share some data points on the growth of Facebook’s MySQL infrastructure. Although he made sure to point out that the “buzzkills at legal” won’t let him share actual numbers, he was able to point to 3x server growth across all data centers over the past two years, 7x growth in raw user data, and 20x growth in all user data (which includes replicated data). The median data-set size per physical host has increased almost 5x since Jan. 2010, and maximum data-set size per host has increased 10x.
Konetchy credits the ability to store so much more data per host to software-performance improvements made by Facebook’s MySQL team, as well as to better server technology. Facebook’s MySQL user database is composed of approximately 60 percent hard disk drives, 20 percent SSDs and 10 percent hybrid HDD-plus-SSD servers running Flashcache.
However, Facebook wants to buy fewer servers while still improving MySQL performance. Looking forward, Konetchy said some primary objectives are to automate the splitting of large data sets onto underutilized hardware, to improve MySQL compression and to move more data to the Hadoop-based HBase data store when appropriate. NoSQL databases such as HBase (which powers Facebook Messages) weren’t really around when Facebook built its MySQL environment, so there likely are unstructured or semistructured data currently in MySQL that are better suited for HBase.
With all this growth, why MySQL?
The logical question when one sees rampant growth and performance requirements like this is “Why stick with MySQL?” As Stonebraker pointed out over the summer, both NoSQL and NewSQL are arguably better suited to large-scale web applications than is MySQL. Perhaps, but Facebook begs to differ.
Facebook’s Mark Callaghan, who spent eight years as a “principal member of the technical staff” at Oracle, explained that using open-source software lets Facebook operate with “orders of magnitude” more machines than people, which means lots of money saved on software licenses and lots of time put into working on new features (many of which, including the rather-cool Online Schema Change, are discussed in the talk).
Additionally, he said, the patch and update cycles at companies like Oracle are far slower than what Facebook can get by working on issues internally and with an open-source community. The same holds true for general support issues, which Facebook can resolve itself in hours instead of waiting days for commercial support.
On the performance front, Callaghan noted, Facebook might find some interesting things if large vendors allowed it to benchmark their products. But they won’t, or they won’t let Facebook publish the results, so MySQL it is. Plus, he said, you actually can tune MySQL to perform very fast per node if you know what you’re doing, and Facebook has the best MySQL team around. That also helps keep costs down because it requires fewer servers.
Callaghan was more open to using NoSQL databases, but said they’re still not quite ready for primetime, especially for mission-critical workloads such as Facebook’s user database. The implementations just aren’t as mature, he said, and there are no published cases of NoSQL databases operating at the scale of Facebook’s MySQL database. And, Callaghan noted, the HBase engineering team at Facebook is quite a bit larger than the MySQL engineering team, suggesting that tuning HBase to meet Facebook’s needs is a more resource-intensive process than is tuning MySQL at this point.
The whole debate about Facebook and MySQL was never really about whether it should be using it, but rather about how much work it has put into MySQL to make it work at Facebook scale. The answer, clearly, is a lot, but Facebook seems to have it down to an art at this point, and everyone appears pretty content with what they have in place and how they plan to improve it. It doesn’t seem like a fate worse than death, and if it had to start from scratch, I don’t get the impression Facebook would do too much differently, even with the new database offerings available today.