MySpace Architecture

Information Sources

  • Presentation: Behind the Scenes at MySpace.com
  • Inside MySpace.com

Platform

  • ASP.NET 2.0
  • Windows
  • IIS
  • SQL Server

What's Inside?

  • 300 million users.
  • Pushes 100 gigabits/second to the internet, of which 10 Gb/sec is HTML content.
  • 4,500+ web servers running Windows 2003/IIS 6.0/ASP.NET.
  • 1,200+ cache servers running 64-bit Windows 2003. 16GB of objects cached in RAM.
  • 500+ database servers running 64-bit Windows and SQL Server 2005.
  • MySpace processes 1.5 billion page views per day and handles 2.3 million concurrent users during the day.
  • Membership Milestones:
    - 500,000 Users: A Simple Architecture Stumbles
    - 1 Million Users: Vertical Partitioning Solves Scalability Woes
    - 3 Million Users: Scale-Out Wins Over Scale-Up
    - 9 Million Users: Site Migrates to ASP.NET, Adds Virtual Storage
    - 26 Million Users: MySpace Embraces 64-Bit Technology
  • 500,000 accounts was too much load for two web servers and a single database.
  • At 1-2 Million Accounts
    - They used a database architecture built around the concept of vertical partitioning, with separate databases for parts of the website that served different functions such as the log-in screen, user profiles and blogs.
    - The vertical partitioning scheme helped divide up the workload for database reads and writes alike, and when users demanded a new feature, MySpace would put a new database online to support it.
    - MySpace switched from storage devices directly attached to its database servers to a storage area network (SAN), in which a pool of disk storage devices is tied together by a high-speed, specialized network and the databases connect to the SAN. The change to a SAN boosted performance, uptime and reliability.
  • At 3 Million Accounts
    - The vertical partitioning solution didn't last, because some horizontal information, such as user accounts, was replicated across all vertical slices. With so many replications, one would fail and slow down the system.
    - Individual applications, such as blogs, on sub-sections of the website would grow too large for a single database server.
    - Reorganized all the core data so that it was logically organized as one database.
    - Split the user base into chunks of 1 million accounts and put all the data keyed to those accounts in a separate instance of SQL Server.
  • At 9-17 Million Accounts
    - Moved to ASP.NET, which used fewer resources than the previous architecture. 150 servers running the new code were able to do the same work that had previously required 246.
    - Saw storage bottlenecks again. Implementing a SAN had solved some early performance problems, but now the website's demands were starting to periodically overwhelm the SAN's I/O capacity, that is, the speed with which it could read and write data to and from disk storage.
    - Ran into the limits of the 1-million-accounts-per-database division approach.
    - Moved to a virtualized storage architecture where the entire SAN is treated as one big pool of storage capacity, without requiring that specific disks be dedicated to serving specific applications. MySpace standardized on equipment from a relatively new SAN vendor, 3PARdata.
    - Added a caching tier: a layer of servers placed between the web servers and the database servers whose sole job was to hold copies of frequently accessed data objects in memory and serve them to the web application without the need for a database lookup.
  • At 26 Million Accounts
    - Moved to 64-bit SQL Server to work around memory bottlenecks. Their standard database server configuration uses 64 GB of RAM.
  • Horizontally federated database. Databases are partitioned by purpose (profile, email, and so on), and each purpose is then partitioned by user range, with 1 million users living in each database. So there are Profile1, Profile2, all the way up to Profile300, as they have 300 million users.
  • Doesn't use the ASP.NET cache because the hit rate on the front end isn't high enough. The middle-tier cache does have a high hit rate.
  • Failure isolation. Requests on each web server are segmented by database, and only 7 threads are allowed per database. If a database is slow, only those threads slow down and the traffic in the other threads keeps flowing.
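
The last two ideas, range partitioning and per-database thread limits, can be sketched together. This is only an illustration, not MySpace's code: it assumes sequential 1-based user IDs, and the names profile_db_for and DatabaseGate are made up for the example. Rejecting a request after a short wait is also an assumption; the source only says that the threads bound to a slow database slow down.

```python
import threading
from contextlib import contextmanager

ACCOUNTS_PER_DB = 1_000_000   # from the article: 1 million users per database
MAX_THREADS_PER_DB = 7        # from the article: 7 threads per database


def profile_db_for(user_id):
    """Map a 1-based user ID to its range partition (assumed ID scheme)."""
    if user_id < 1:
        raise ValueError("user_id must be positive")
    return f"Profile{(user_id - 1) // ACCOUNTS_PER_DB + 1}"


class DatabaseGate:
    """Caps concurrent requests per database so that one slow database
    cannot tie up every worker thread on a web server."""

    def __init__(self, limit=MAX_THREADS_PER_DB):
        self._limit = limit
        self._lock = threading.Lock()
        self._slots = {}  # database name -> BoundedSemaphore

    def _slot(self, db_name):
        # Create the semaphore for a database the first time it is seen.
        with self._lock:
            if db_name not in self._slots:
                self._slots[db_name] = threading.BoundedSemaphore(self._limit)
            return self._slots[db_name]

    @contextmanager
    def enter(self, db_name, timeout=0.1):
        slot = self._slot(db_name)
        # Hypothetical policy: give up quickly instead of queueing forever.
        if not slot.acquire(timeout=timeout):
            raise TimeoutError(f"{db_name} is saturated")
        try:
            yield
        finally:
            slot.release()


# Usage: pick the partition, then run the query inside the gate.
gate = DatabaseGate()
db = profile_db_for(2_500_000)   # falls in the third 1-million range
with gate.enter(db):
    pass  # the query against `db` would run here
```

With a gate like this, a stalled Profile3 can hold at most 7 workers, while requests for users in other ranges keep their own slots.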

Operations

  • PerfCollector. Centralized collection of performance data via UDP. More reliable than Windows and allows any client to connect and see stats.
  • Web Based Stack Dump Tool. Right-click on a problem server and get a stack dump of the .NET managed threads. Previously they had to RDC into the system, attach a debugger, and 1/2 later get an answer: slow, nonscalable, and tedious. The tool gives more than a stack dump; it provides a lot of context about what each thread is doing. Troubleshooting is easier because you can see, for example, that 90 threads are blocked on a database, so the database may be down.
  • Web Based Heap Dump Tool. Dumps all memory allocations. Very useful for developers. Saves hours of doing it by hand.
  • Profiler. Traces a request from start to finish and produces a report. Shows the URL, methods, status, and everything else that helps identify a slow request. Looks at lock contention, whether a lot of exceptions are being thrown, and anything else that might be interesting. Very lightweight. It runs on one box in every VIP (group of 100 servers) in production and samples 1 thread every 10 seconds. Always tracing in the background.
  • PowerShell. Microsoft's new shell, which runs in process and passes objects between commands instead of parsing text output. MySpace develops a lot of cmdlets to support operations.
  • Developed their own asynchronous communication technology to get around Windows networking problems and treat servers as a group. It can ship a .cs file, compile it, run it, and ship the response back.
  • Codespew. Pushes code updates over their communication technology. They used to do 5 code pushes a day, now down to 1 a week.
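
The source gives no detail on how PerfCollector works beyond "performance data via UDP", so the following is only a guess at the general shape: servers fire small datagrams at a central collector, which keeps the latest value of each counter. The JSON wire format, the host and counter names, and the functions encode_sample, send_sample and Collector are all assumptions made for this sketch.

```python
import json
import socket
from collections import defaultdict


def encode_sample(host, counter, value):
    """Pack one performance sample into a small JSON datagram (assumed format)."""
    return json.dumps({"host": host, "counter": counter, "value": value}).encode("utf-8")


def send_sample(sock, collector_addr, host, counter, value):
    """Fire-and-forget send: UDP needs no connection to the collector."""
    sock.sendto(encode_sample(host, counter, value), collector_addr)


class Collector:
    """Keeps the latest value of every counter, keyed by host."""

    def __init__(self):
        self.latest = defaultdict(dict)

    def ingest(self, datagram):
        try:
            sample = json.loads(datagram.decode("utf-8"))
            host = sample["host"]
            counter = sample["counter"]
            value = float(sample["value"])
        except (ValueError, KeyError, TypeError):
            return  # drop malformed datagrams; UDP gives no delivery guarantee anyway
        self.latest[host][counter] = value

    def serve(self, sock, max_datagrams):
        """Read a fixed number of datagrams from a bound UDP socket."""
        for _ in range(max_datagrams):
            data, _addr = sock.recvfrom(4096)
            self.ingest(data)


if __name__ == "__main__":
    # Loopback demo: one sender, one collector, one sample.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))   # port 0 lets the OS choose a free port
    rx.settimeout(1.0)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_sample(tx, rx.getsockname(), "web-0042", "requests_per_sec", 118.0)
    collector = Collector()
    collector.serve(rx, max_datagrams=1)
    print(dict(collector.latest))
    tx.close()
    rx.close()
```

The appeal of UDP here is that a monitored server never blocks on a slow or missing collector; the cost is that samples can be lost, which is usually acceptable for periodic counters.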

Lessons Learned

  • You can build big websites using Microsoft tech.
  • A cache should have been used from the beginning.
  • The cache is a better place to store transitory data that doesn't need to be recorded in a database, such as temporary files created to track a particular user's session on the Web site.
  • Built-in OS features to detect denial-of-service attacks can cause inexplicable failures.
  • Distribute your data to geographically diverse data centers to handle power failures.
  • Consider using virtualized storage/clustered file systems from the start. They allow you to massively parallelize I/O access while adding disk as needed without any reorganization.
  • Develop tools that work in a production environment. You can't simulate everything in a test environment: the scale and the variety of uses that APIs are put to can't be reproduced in QA. Legitimate users and hackers will run into corner cases that weren't hit in testing, though QA will find most of the problems.
  • Throw hardware at problems. It's easier than changing the backend software to a new way of doing things. For example, they add a new database server for every million users. It might be more efficient to change the approach to make better use of the database hardware, but it's easier just to add servers. For now.
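
The caching lessons above describe a middle tier that answers from memory and only falls back to the database on a miss. A minimal cache-aside sketch of that read path follows. It is not MySpace's implementation: CacheTier is a single-process stand-in for a cache server, and the key format, the 60-second expiry and the query_database callback are assumptions chosen for the example.

```python
import time


class CacheTier:
    """In-memory object cache with per-entry expiry (stand-in for a cache server)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def load_profile(user_id, cache, query_database):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = query_database(user_id)  # only reached on a miss
        cache.put(key, profile)
    return profile
```

Transitory data such as session state could be written with put alone and never touch a database, at the cost of losing it when the entry expires or the cache server restarts.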