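YouTube also relies on open-source components such as Apache, MySQL, Python, Linux, and lighttpd.
It can support 100 million video views per day.
Founded: 2/2005
3/2006: 30 million views per day
7/2006: 100 million views per day
The original article follows: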
Update: YouTube: The Platform. YouTube adds a new rich set of APIs in order to become your video platform leader--all for free. Upload, edit, watch, search, and comment on video from your own site without visiting YouTube. Compose your site internally from APIs because you'll need to expose them later anyway.
YouTube grew incredibly fast, to over 100 million video views per day, with only a handful of people responsible for scaling the site. How did they manage to deliver all that video to all those users? And how have they evolved since being acquired by Google?
Information Sources
Platform
What's Inside?
The Stats
Recipe for handling rapid growth
while (true)
{
identify_and_fix_bottlenecks();
drink();
sleep();
notice_new_bottleneck();
}
This loop runs many times a day.
Web Servers
Video Serving
- More disks serving content which means more speed.
- Headroom. If a machine goes down others can take over.
- There are online backups.
- Apache had too much overhead.
- Uses epoll to wait on multiple fds.
- Switched from single process to multiple process configuration to handle more connections.
- CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
- CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.
- There's a long tail effect. Any single video may get only a few plays, but a huge number of different videos are being played, so random disk blocks are being accessed.
- Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product caching won't always be your performance savior.
- Tune RAID controller and pay attention to other lower level issues to help.
- Tune memory on each machine so there's not too much and not too little.
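The contrast between popular content (served mostly from memory) and long-tail content (where caching helps little) can be illustrated with a small simulation. This is only a sketch under assumed numbers: the catalog size, cache capacity, request counts, and the uniform access pattern are all hypothetical and are not figures from YouTube.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request stream against an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests)


# All numbers below are assumptions chosen for illustration only.
rng = random.Random(0)
CATALOG_SIZE = 100_000   # hypothetical number of distinct videos
CACHE_CAPACITY = 1_000   # hypothetical number of videos that fit in memory
NUM_REQUESTS = 50_000

# Popular traffic: requests concentrate on a small hot set that fits in cache.
popular = [rng.randrange(500) for _ in range(NUM_REQUESTS)]
# Long-tail traffic: requests spread evenly over the whole catalog.
long_tail = [rng.randrange(CATALOG_SIZE) for _ in range(NUM_REQUESTS)]

popular_rate = lru_hit_rate(popular, CACHE_CAPACITY)
long_tail_rate = lru_hit_rate(long_tail, CACHE_CAPACITY)
print(f"popular hit rate:   {popular_rate:.3f}")
print(f"long-tail hit rate: {long_tail_rate:.3f}")
```

With the same cache size, the hot-set stream is served almost entirely from cache while the long-tail stream almost always misses, which is why adding cache for long-tail content may not pay off.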
Serving Video Key Points
Serving Thumbnails
- Lots of disk seeks and problems with inode caches and page caches at OS level.
- Ran into per directory file limit. Ext3 in particular. Moved to a more hierarchical structure. Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
- A high number of requests/sec, because a single web page can display 60 thumbnails.
- Under such high loads Apache performed badly.
- Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
- Tried using lighttpd, but in single-threaded mode it stalled. Ran into problems with multiprocess mode because each process would keep a separate cache.
- With so many images setting up a new machine took over 24 hours.
- After rebooting a machine, it took 6-10 hours for the cache to warm up enough to avoid going to disk.
- Avoids small file problem because it clumps files together.
- Fast, fault tolerant. Assumes it's working on an unreliable network.
- Lower latency because it uses a distributed multilevel cache. This cache works across different collocation sites.
- For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.
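The move to a more hierarchical directory structure mentioned in the thumbnail list can be sketched as follows. This is a minimal illustration, not YouTube's actual layout: the function name `thumbnail_path`, the `/thumbs` root, the `.jpg` suffix, and the choice of SHA-256 with two levels of two hex characters are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(video_id: str, root: str = "/thumbs",
                   levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a video id to a nested path so no single directory grows too large.

    Hypothetical scheme: hash the id, then use successive slices of the
    hex digest as directory names.
    """
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, f"{video_id}.jpg")


# Usage: the same id always maps to the same nested location.
example = thumbnail_path("abc123")
print(example)
```

With two levels of two hex characters, files are spread over 256 * 256 = 65,536 leaf directories, which keeps each directory small even with a very large number of thumbnails.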
Databases
- Use MySQL to store meta data like users, tags, and descriptions.
- Served data off a monolithic RAID 10 Volume with 10 disks.
- Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
- They went through a common evolution: started with a single server, moved to a single master with multiple read slaves, then partitioned the database, and finally settled on a sharding approach.
- Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
- Updates cause cache misses, which go to disk, where slow I/O causes slow replication.
- Using a replicating architecture you need to spend a lot of money for incremental bits of write performance.
- One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video, so that function should get the most resources. The social networking features of YouTube are less important, so they can be routed to a less capable cluster.
- Went to database partitioning.
- Split into shards with users assigned to different shards.
- Spreads writes and reads.
- Much better cache locality which means less IO.
- Resulted in a 30% hardware reduction.
- Reduced replica lag to 0.
- Can now scale database almost arbitrarily.
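Assigning users to shards, as described in the list above, can be sketched as a small routing function. The article does not say how users were mapped to shards, so the modulo rule, the shard count, and the host names below are all hypothetical.

```python
# Hypothetical shard hosts; the names and the count are assumptions.
SHARD_HOSTS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]


def shard_for_user(user_id: int, num_shards: int = len(SHARD_HOSTS)) -> int:
    """Pick a shard index for a user (assumed rule: user id modulo shard count)."""
    return user_id % num_shards


def host_for_user(user_id: int) -> str:
    """Return the database host that holds all rows for this user."""
    return SHARD_HOSTS[shard_for_user(user_id)]


# Usage: every read and write for a user goes to the same shard,
# which spreads load and keeps that user's data hot in one machine's cache.
for uid in (10, 11, 14):
    print(uid, host_for_user(uid))
```

A plain modulo rule is easy to reason about but forces data movement when the shard count changes; a lookup table or consistent hashing are common alternatives when shards must be added over time.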
Data Center Strategy
looks at different metrics to know who is closest.
Lessons Learned
- Software: DB, caching
- OS: disk I/O
- Hardware: memory, RAID