yinwufeng
Google's Jeff Dean on Design Patterns for Distributed Systems

In a recent keynote at SOCC, Jeff Dean of Google listed a number of design patterns for system design and a number of challenges for the future. I wrote them down, and thought I might as well share them.

You can find the presentation here (it begins with something else; just skip forward a few slides. Linux users: install Moonlight).

He starts off by noting a shift that has happened over the last 5 to 10 years: (small) devices interact with services that are provided by large data centers. This allows clients to use large bursts of computational power, as in the case of a single Google search, which runs across thousands of servers.

Then he goes on with the typical introductions to MapReduce (with a map tile generation example) and BigTable (mentioning what's new since the paper). He also mentions Colossus (next-gen GFS) and talks about Spanner, a cross data center storage and computing system.

And then we get to the system design experiences and design patterns. Some of these are very generic, others are specific to distributed systems. What follows is just a tight summary; the actual slides and talk go into more detail on each of these.

1. Break large complex systems down into many services, with few dependencies.
Easy to test and deploy, allows lots of experiments, allows reimplementation without affecting clients, and lets small teams work independently.
A single google.com search touches over 100 services.
Personal note: this is important in any design, distributed or not. It is the purpose of the module system in our Daisy/Kauri Runtime system.

2. A protocol description language is a must.
See Protocol Buffers.
Servers ignore tags they don't understand, but pass the information through.
Personal note: in the XML world, this is also known as the "must ignore" pattern.

3. Be able to estimate the performance of a system design without actually having to build it: do 'back of the envelope' calculations. See the slide on "numbers everyone should know". Know your basic building blocks (understand their implementation at a high level).

4. Designing and building infrastructure: it is important not to try to be all things to all people. Don't build infrastructure just for its own sake: identify common needs and address them.

5. Design for growth, but don't design to scale infinitely: 5 to 50 times growth is good to consider, 1000 times probably requires a rethink and rewrite.

6. Single master, thousands of workers. The master orchestrates global operation of the system, but client interaction with the master is fairly minimal. Often there is a hot standby of the master. Simpler to reason about, but scales less (thousands of workers, not hundreds of thousands).

7. Canary requests: odd requests sometimes crash a server process. When the same request is sent to many servers, all of those servers might crash. Therefore, first send the request to one server, and send it to the others only if that one handles it successfully.

8. Tree distribution of requests, to avoid many outgoing RPC requests from one server.

9. Use backup requests to minimize latency. This avoids waiting on a few slow machines when a request is sent to thousands of machines.

10. Use multiple smaller units per machine, to minimize recovery time when a machine crashes, and to have fine-grained load balancing. See the many tablets per tablet server in BigTable.
Personal note: I found this a key insight in understanding how scalable stores or indexes work, in contrast to, say, a more traditional partitioned RDBMS setup (see earlier blog). Besides BigTable/HBase, this idea is also applied in Elastic Search and Katta.

11. Range distribution of data, not hash. This allows users to reason about, and control, locality across keys.

12. Elastic systems. Avoid overcapacity and undercapacity. Design to shrink and grow capacity. Do something reasonable in case of overload, e.g. disable certain features (reduce the size of the index searched, disable the spelling correction tip, ...).

13. One interface, multiple implementations. E.g. in search, combining freshness with massive size in one implementation is rather impossible, therefore partition the problem into subproblems.

14. Add sufficient monitoring/status/debugging hooks.
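
To make pattern 3 concrete, here is a minimal back-of-the-envelope sketch in Python. It compares two hypothetical designs for serving a page of 30 thumbnails of 256 KB each: reading them one after another from a single disk, or issuing all reads in parallel across different disks. The 10 ms seek time and the 30 MB/s read rate are rough assumed figures of the kind such a numbers slide lists, and the function names are made up for this illustration.

```python
# Rough figures of the kind listed on a "numbers everyone should know" slide.
# Both values are order-of-magnitude assumptions, not measurements.
DISK_SEEK_S = 0.010                     # ~10 ms per disk seek (assumed)
DISK_READ_BYTES_PER_S = 30 * 1024 ** 2  # ~30 MB/s sequential read (assumed)


def read_time_s(num_bytes: int) -> float:
    """Time to seek once and then read num_bytes sequentially from one disk."""
    return DISK_SEEK_S + num_bytes / DISK_READ_BYTES_PER_S


def serial_design_s(num_images: int, image_bytes: int) -> float:
    """Design A: a single disk reads every image, one after the other."""
    return num_images * read_time_s(image_bytes)


def parallel_design_s(num_images: int, image_bytes: int) -> float:
    """Design B: every image sits on a different disk, all reads start at once.

    The latency is that of the slowest read; with equally sized images this
    is the time of one read. Network and queueing delays are ignored.
    """
    return read_time_s(image_bytes) if num_images > 0 else 0.0


if __name__ == "__main__":
    n, size = 30, 256 * 1024
    print(f"serial:   {serial_design_s(n, size) * 1000:.0f} ms")
    print(f"parallel: {parallel_design_s(n, size) * 1000:.0f} ms")
```

Under these assumptions the serial design comes out at roughly 550 ms and the parallel one at roughly 18 ms, which is the kind of difference you want to see before building either.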

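As a small illustration of pattern 11, the sketch below contrasts range and hash placement of keys. The split points, the shard count and the function names are hypothetical, and real systems such as BigTable choose and move split points dynamically. The sketch only shows why range placement keeps related keys together.

```python
import bisect
import hashlib

# Hypothetical split points: keys below "g" go to shard 0, keys from "g" up to
# (not including) "n" go to shard 1, and so on.
SPLITS = ["g", "n", "t"]
NUM_SHARDS = len(SPLITS) + 1


def range_shard(key: str) -> int:
    """Range distribution: the shard depends on where the key sorts."""
    return bisect.bisect_right(SPLITS, key)


def hash_shard(key: str) -> int:
    """Hash distribution: the shard depends on a hash of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


if __name__ == "__main__":
    # Keys sharing a prefix, e.g. all pages of one site.
    keys = ["com.example/about", "com.example/index", "com.example/news"]
    for key in keys:
        print(key, "range ->", range_shard(key), "hash ->", hash_shard(key))
```

With range placement, all keys with the prefix `com.example/` sort next to each other and land on the same shard here, so a scan over one site touches one shard. With hash placement, the shard of each key is effectively arbitrary, so the same scan may have to contact every shard.
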
He ends with some challenges for the future:

1. Adaptivity in world-wide systems. Challenge: automatic, dynamic world-wide placement of data and computation to minimize latency and/or cost.

2. Building applications on top of weakly consistent storage systems. Challenge: a general model of consistency choices, explained and codified. Challenge: easy-to-use abstractions for resolving conflicting updates to multiple versions of a piece of state.

3. Distributed system abstractions. Cf. MapReduce: are there unifying abstractions for other kinds of distributed systems problems?


