Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.
•Skype's goal is an architecture that can handle 1 billion plus users. This level of scale isn't practically solvable with one really big computer, so our masked superhero horizontal scaling comes to the rescue.
•Hardware is dual or quad Opterons with SCSI RAID.
•Followed common database progression: Start with one DB. Add new databases partitioned by functionality. Replicate read-mostly data for better read access. Then horizontally partition data across multiple nodes..
•In a first for this blog anyway, Skype uses a traditional database architecture where all database access is encapsulated in stored procedures. This allows them to make behind the scenes performance tweaks without impacting frontend servers. And it fits in cleanly with their partitioning strategy using PL/Proxy.
•PL/Proxy is used to scale the OLTP portion of their system by creating a horizontally partitioned cluster:
- Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key.
- For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash.
- Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:
First, code to insert a user in a database:CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$BEGIN PERFORM 1 FROM users WHERE username = i_username; IF NOT FOUND THEN INSERT INTO users (username) VALUES (i_username); RETURN 'user created'; ELSE RETURN 'user already exists'; END IF;END;$$ LANGUAGE plpgsql SECURITY DEFINER;Heres the proxy code to distribute the user insert to the correct partition:queries=#CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$ CLUSTER 'queries'; RUN ON hashtext(i_username);$$ LANGUAGE plproxy;Your SQL query looks normal:SELECT insert_user("username");
- The result of a query is exactly that same as if was executed on the remote database.
- Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 parition cluster.
•They like PL/Proxy approach for OLTP because:
- PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them.
- More partitions can be added to improve performance.
- Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
•PgBouncer is used as a connection pooler for PostgreSQL. PL/Proxy "somewhat wastes connections as it opens connection to each partition from each backend process" so the pooler helps reduce the number of connections.
•Hot-standby servers are created using WAL (Write Ahead Log) shipping. It doesn't appear that these servers can be used for read-only operations.
•More sophisticated organizations often uses an OLTP database system to handle high performance transaction needs and then create seperate systems for more non-transactional needs. For example, an OLAP (Online analytical processing) system is often used for handling complicated analysis and reporting problems. These differ in schema, indexing, etc from the OLTP system. Skype also uses seperate systems for the presentation layer of web applications, sending email, and prining invoices. This requires data be moved from the OLTP to the other systems.
- Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains."
- To solve this problem Skype developed their on lighter weight queueing and replication toolkit called SkyTools.
The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the make another level of indirection school of problem solving, which has advantages:
•Applications are independent of the structure of the database servers. That's encapsulated in the proxy servers.
•Applications do not need to change in response to partition, mapping, or other changes.
•Load balancing, failover, and read/write splitting are invisible to applications.
The downsides are:
•Reduced performance. Another hop is added and queries must be parsed to perform all the transparent magic.
•Inability to perform joins and other database operations across partitions.
•Added administration complexity of dealing with proxy configuration and HA for the proxy servers.
It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.
Related Articles
•An Unorthodox Approach to Database Design : The Coming of the Shard
•PostgreSQProducts - Scaling infinitely with PL/Proxy
•PL/Proxy
•Heroku also uses PostgreSQL.
•MySQL Proxy
•PostgreSQL cluster: partitioning with plproxy (part I) by Kristo Kaiv'.
•PostgreSQL cluster: partitioning with plproxy (part II) by Kristo Kaiv'.
•PostgreSQL at Skype.
•Skytools database scripting framework & PgQ by Kristo Kaiv'.
•PostgreSQL High Availability.
原文摘自: http://highscalability.com/skype-plans-postgresql-scale-1-billion-users
分享到:
相关推荐
### PostgreSQL 数据库集群与 PL/Proxy 的配置安装详解 #### 一、PostgreSQL 数据库集群概念 PostgreSQL 是一种开源的关系型数据库系统,以其强大的功能、稳定性和扩展性而受到广泛认可。对于需要处理大量数据或高...
Navicat for PostgreSQL是一套专为PostgreSQL设计的强大数据库管理及开发工具。它可以用于任何版本 7.5 或以上的 PostgreSQL 数据库服务器,并支持大部份 PostgreSQL最新版本的功能,包括触发器、函数、管理用户等。...
《Navicat for PostgreSQL 10 绿色版:高效管理PostgreSQL数据库的得力工具》 Navicat是一款强大的数据库管理和开发工具,其PostgreSQL版本专为PostgreSQL数据库服务器设计,提供了直观且用户友好的界面,使得...
navical For PostgreSQL 11 完美破解版navical For PostgreSQL 11 完美破解版navical For PostgreSQL 11 完美破解版navical For PostgreSQL 11 完美破解版 亲测有效
dbExpress is a database-independent layer that defines common interface to provide fast access to PostgreSQL from Delphi and C++Builder on Windows and Mac OS X for both 32-bit and 64-bit platforms....
navicat for postgresql navicat for postgresql 12中文免费版版下载 64位/32位
dbForge Studio 2020 for PostgreSQL 2.3.285 是一个用于数据库开发和管理的 GUI 工具。PostgreSQL 的 IDE 允许用户在方便且用户友好的界面中创建、开发和执行查询、编辑和调整代码以满足他们的要求。该工具还提供 ...
PostgreSQL for Data Architects will teach you everything you need to learn in order to get a scalable and optimized PostgreSQL server up and running. The book starts with basic concepts like ...
《Navicat 15 for PostgreSQL:连接与管理数据库的高效工具》 Navicat 15 for PostgreSQL是一款专为PostgreSQL数据库设计的强大管理和开发工具,由PremiumSoft公司开发。这款软件以其直观的用户界面、丰富的功能集...
**dbForge Studio for PostgreSQL 2.3.237** 是一款专为PostgreSQL数据库设计的强大集成开发环境(IDE)。这款工具集成了多种功能,旨在提高数据库管理、开发和优化的效率。它由Devart公司开发,是PostgreSQL开发者...
基于nacos 2.1.0 修改支持 postgresql 12.x 的容器化运行版本, 含postgresql 库初始化sql。
2017-9最新版Windows ODBC For Postgresql,已经安装并测试过,是可以正常使用的,有需要的朋友可以下载使用,特此分享。
PostgreSQL for Data Architects will teach you everything you need to learn in order to get a scalable and optimized PostgreSQL server up and running. The book starts with basic concepts (such as ...
Navicat for PostgreSQL Enterprise Edition v9.0.8是一款强大的数据库管理工具,专门设计用于连接和管理PostgreSQL数据库。此版本是英文官方安装版,包含了关键生成器(keygen),这意味着用户可以使用该软件的全部...
新版的Navicat for postgreSQL工具,以及算号器。
《Navicat for PostgreSQL 11.2:高效管理PostgreSQL数据库的利器》 Navicat for PostgreSQL 11.2是一款专为PostgreSQL数据库设计的强大管理工具,它集成了多种功能,使得数据库管理和开发变得更加便捷高效。在安装...
1. **连接性**:该组件提供了强大的连接功能,允许开发者通过ADO.NET、Entity Framework、Linq to SQL等技术与PostgreSQL数据库进行无缝通信。 2. **高性能**:采用Native ADO.NET提供高速数据访问,减少了中间层的...
【Navicat for PostgreSQL】是一款强大的数据库管理工具,专为PostgreSQL数据库设计,提供了一站式的解决方案,使得用户能够高效地进行数据管理和开发工作。它以其直观的界面、丰富的功能和出色的性能,深受广大...
**EMS SQL Manager 2007 for PostgreSQL 4.4.0.5** 是一个专为PostgreSQL数据库设计的强大管理工具,旨在提供高效且全面的数据库管理解决方案。这款软件不仅适用于数据库管理员,也适合开发人员,它提供了丰富的功能...