Boost socket performance on Linux


Four ways to speed up your network applications

Level: Intermediate

M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex

17 Jan 2006
Updated 03 Feb 2006

The Sockets API lets you develop client and server applications that can communicate across a local network or across the world via the Internet. Like any API, you can use the Sockets API in ways that promote high performance -- or inhibit it. This article explores four ways to use the Sockets API to squeeze the greatest performance out of your application and to tune the GNU/Linux® environment to achieve the best results. Editor's note: we updated Tip 3 to correct an error in the calculation for Bandwidth Delay Product (BDP), spotted by an alert reader.

When developing a sockets application, job number one is usually establishing reliability and meeting the necessary requirements. With the four tips in this article, you can design and develop your sockets application for best performance, right from the beginning. This article covers use of the Sockets API, a couple of socket options that provide enhanced performance, and GNU/Linux tuning.

To develop applications with lively performance capabilities, follow these tips:

  • Minimize packet transmit latency.
  • Minimize system call overhead.
  • Adjust TCP windows for the Bandwidth Delay Product.
  • Dynamically tune the GNU/Linux TCP/IP stack.

Tip 1. Minimize packet transmit latency

When you communicate through a TCP socket, the data are chopped into blocks so that they fit within the TCP payload for the given connection. The size of the TCP payload depends on several factors (such as the maximum packet size along the path), but these factors are known at connection initiation time. To achieve the best performance, the goal is to fill each packet as much as possible with the available data. When insufficient data exist to fill a payload (otherwise known as the maximum segment size, or MSS), TCP employs the Nagle algorithm to automatically concatenate small buffers into a single segment. Doing so increases the efficiency of the application and reduces overall network congestion by minimizing the number of small packets that are sent.

John Nagle's algorithm works well to minimize small packets by concatenating them into larger ones, but sometimes you simply want the ability to send small packets. A simple example is the telnet application, which allows a user to interact with a remote system, typically through a shell. If the user were required to fill a segment with typed characters before the packet was sent, the experience would be less than desirable.

Another example is the HTTP protocol. Commonly, a client browser makes a small request (an HTTP request message), resulting in a much larger response by the Web server (the Web page).

The solution

The first thing you should consider is that the Nagle algorithm fulfills a need. Because the algorithm coalesces data to try to fill a complete TCP packet segment, it does introduce some latency. But it does this with the benefit of minimizing the number of packets sent on the wire, and so it minimizes congestion on the network.

But in cases where you need to minimize that transmit latency, the Sockets API provides a solution. To disable the Nagle algorithm, you can set the TCP_NODELAY socket option, as shown in Listing 1.


Listing 1. Disabling the Nagle algorithm for a TCP socket
/* Requires <stdio.h>, <stdlib.h>, <sys/socket.h>,
   <netinet/in.h>, and <netinet/tcp.h> */
int sock, flag, ret;

/* Create new stream socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Disable the Nagle (TCP No Delay) algorithm */
flag = 1;
ret = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag) );
if (ret == -1) {
    printf("Couldn't setsockopt(TCP_NODELAY)\n");
    exit( EXIT_FAILURE );
}
                        

Bonus tip: Experimentation with Samba demonstrates that disabling the Nagle algorithm results in almost doubling the read performance when reading from a Samba drive on a Microsoft® Windows® server.




Tip 2. Minimize system call overhead

Whenever you read or write data to a socket, you're using a system call. This call (such as read or write) crosses the boundary of the user space application to the kernel. Additionally, prior to getting to the kernel, your call goes through the C library to a common function in the kernel (system_call()). From system_call(), your call gets to the filesystem layer, where the kernel determines what type of device you're dealing with. Eventually, your call gets to the sockets layer, where data are read or queued for transmission on the socket (involving a data copy).

This process illustrates that the system call operates not just in the application and kernel domains but through many levels within each domain. The process is expensive, so the more calls you make, the more time you spend working through this call chain, and the less performance you get from your application.

Because you can't avoid making these system calls, your only option is to minimize the number of times you do it. Fortunately, you have control over this process.

The solution

When writing data to a socket, write all the data that you have available in one call instead of performing multiple writes. For reads, pass in the largest buffer that you can support, since the kernel will try to fill the entire buffer if enough data exist (in addition to keeping TCP's advertised window open). In this way, you can minimize the number of calls you make and achieve better overall performance. The sendfile system call is also useful for large data transfers, but the TCP_CORK socket option should be set in this case. The writev system call can also be used for bulk transfer, as can the asynchronous I/O API (aio_read, aio_write, etc.).
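
As a minimal sketch of the "one call instead of several" idea, assume a message that consists of a separate header buffer and body buffer. The helper name send_header_and_body is hypothetical; writev and struct iovec are the standard calls from <sys/uio.h>. Both buffers are handed to the kernel in a single system call rather than two write calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send a header and a body with one system call
 * instead of two separate write() calls.
 * Returns the number of bytes written, or -1 on error (errno is set). */
ssize_t send_header_and_body( int sock,
                              const void *hdr, size_t hdr_len,
                              const void *body, size_t body_len )
{
    struct iovec iov[2];

    assert( hdr != NULL && body != NULL );

    /* iov_base is not const-qualified, so the casts are required */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = body_len;

    /* The kernel gathers both buffers in a single call. As with
     * write(), the return value may be less than the total, in which
     * case the caller must send the remainder. */
    return writev( sock, iov, 2 );
}
```

A caller would typically loop until hdr_len + body_len bytes have been accepted, because a short write is possible on a socket whose send buffer is nearly full.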




Tip 3. Adjust TCP windows for the Bandwidth Delay Product

TCP depends on several factors for performance. Two of the most important are the link bandwidth (the rate at which packets can be transmitted on the network) and the round-trip time, or RTT (the delay between a segment being sent and its acknowledgment from the peer). These two values determine what is called the Bandwidth Delay Product (BDP).

Given the link bandwidth rate and the RTT, you can calculate the BDP, but what does this do for you? It turns out that the BDP gives you an easy way to calculate the theoretical optimal TCP socket buffer sizes (which hold both the queued data awaiting transmission and queued data awaiting receipt by the application). If the buffer is too small, the TCP window cannot fully open, and this limits performance. If it's too large, precious memory resources can be wasted. If you set the buffer just right, you can fully utilize the available bandwidth. Let's look at an example:

BDP = link_bandwidth * RTT

If your application communicates over a 100Mbps local area network with a 50 ms RTT, the BDP is:

100Mbps * 0.050 sec / 8 = 0.625MB = 625KB

Note: I divide by 8 to convert from bits to bytes communicated.

So, set your TCP window to the BDP, or 625KB. But the default window for TCP on Linux 2.6 is 110KB, which limits your bandwidth for the connection to 2.2MBps, as I've calculated here:

throughput = window_size / RTT

110KB / 0.050 = 2.2MBps

If instead you use the window size calculated above, you get a whopping 12.5MBps, as shown here:

625KB / 0.050 = 12.5MBps

That's quite a difference and will provide greater throughput for your socket. So you now know how to calculate the optimal socket buffer size for your socket. But how do you make this change?
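
The arithmetic above can be reproduced with a short sketch. The function names bdp_bytes and max_throughput are hypothetical, and the code follows the article's decimal units (1KB = 1,000 bytes); it is meant only to make the two formulas concrete.

```c
#include <assert.h>

/* Bandwidth Delay Product in bytes.
 * link_bps is the link rate in bits per second and rtt_sec is the
 * round-trip time in seconds; dividing by 8 converts bits to bytes. */
double bdp_bytes( double link_bps, double rtt_sec )
{
    assert( link_bps >= 0.0 && rtt_sec >= 0.0 );
    return link_bps * rtt_sec / 8.0;
}

/* Upper bound on throughput (bytes per second) when at most one
 * window of data can be in flight per round trip. */
double max_throughput( double window_bytes, double rtt_sec )
{
    assert( rtt_sec > 0.0 );
    return window_bytes / rtt_sec;
}
```

For the example link, bdp_bytes( 100e6, 0.050 ) gives about 625,000 bytes, max_throughput( 110e3, 0.050 ) gives about 2.2 million bytes per second, and max_throughput( 625e3, 0.050 ) gives about 12.5 million bytes per second.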

The solution

The Sockets API provides several socket options, two of which exist to change the socket send and receive buffer sizes. Listing 2 shows how to adjust the size of the socket send and receive buffers with the SO_SNDBUF and SO_RCVBUF options.

Note: Although the socket buffer size determines the size of the advertised TCP window, TCP also maintains a congestion window within the advertised window. Therefore, because of congestion, a given socket may never utilize the maximum advertised window.


Listing 2. Manually setting the send and receive socket buffer sizes
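/* Requires <sys/socket.h>; BDP is the buffer size in bytes computed
   above (for example, 625000) */
int ret, sock, sock_buf_size;

sock = socket( AF_INET, SOCK_STREAM, 0 );

sock_buf_size = BDP;

/* Set the send buffer size */
ret = setsockopt( sock, SOL_SOCKET, SO_SNDBUF,
                  (char *)&sock_buf_size, sizeof(sock_buf_size) );

/* Set the receive buffer size */
ret = setsockopt( sock, SOL_SOCKET, SO_RCVBUF,
                  (char *)&sock_buf_size, sizeof(sock_buf_size) );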
                        

The kernel does not necessarily use the values exactly as you pass them. According to the socket(7) manual page, Linux doubles the requested size (to leave room for bookkeeping overhead) and limits requests to the maximums in wmem_max and rmem_max (see Table 1). You can verify the size of each buffer using the getsockopt call.
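
A minimal sketch of that verification step follows. The helper name set_and_get_rcvbuf is hypothetical; setsockopt, getsockopt, and SO_RCVBUF are the standard Sockets API. It requests a receive buffer size and then asks the kernel what it actually granted, which may differ from the request.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer size and return the
 * size the kernel actually granted, or -1 on error. */
int set_and_get_rcvbuf( int sock, int requested )
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert( requested > 0 );

    if (setsockopt( sock, SOL_SOCKET, SO_RCVBUF,
                    &requested, sizeof(requested) ) == -1)
        return -1;

    /* Read back the effective size; the kernel may have adjusted
     * or capped the requested value. */
    if (getsockopt( sock, SOL_SOCKET, SO_RCVBUF, &granted, &len ) == -1)
        return -1;

    return granted;
}
```

The same pattern applies to SO_SNDBUF. Comparing the returned value with the request is a quick way to notice that a system-wide maximum is limiting the buffer.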

Jumbo frames

Also consider increasing the packet size from 1,500 to 9,000 bytes (known as a jumbo frame). This can be done in local network situations by setting the Maximum Transmission Unit (or MTU) and can really boost performance. While great for LANs, it can sometimes be problematic in WANs because intermediary equipment such as switches may not support it. The MTU can be modified using the ifconfig utility.

As for window scaling, TCP originally supported a maximum 64KB window (16 bits were used to define the window size). With the inclusion of window scaling (per RFC 1323), the 16-bit window value is multiplied by a negotiated scale factor (a left shift of up to 14 bits), which permits windows of up to about 1GB. The TCP/IP stack provided in GNU/Linux supports this option (and many others).

Bonus tip: The Linux kernel also includes the ability to auto-tune these socket buffers (see tcp_rmem and tcp_wmem in Table 1 below), but these options affect the entire stack. If you need to adjust the window for only one connection or type of connection, the per-socket SO_SNDBUF and SO_RCVBUF options shown in Listing 2 do what you need.

Tip 4. Dynamically tune the GNU/Linux TCP/IP stack

A standard GNU/Linux distribution tries to optimize for a wide range of deployments. This means that the standard distribution might not be optimal for your environment.

The solution

GNU/Linux provides a wide range of tunable kernel parameters that you can use to dynamically tailor the operating system for your specific use. Let's look at some of the more important options that affect sockets performance.

The tunable kernel parameters exist within the /proc virtual filesystem. Each file in this filesystem represents one or more parameters that can be read through the cat utility or modified with the echo command. Listing 3 shows how to query and enable a tunable parameter (in this case, enabling IP forwarding within the TCP/IP stack).


Listing 3. Tuning: Enable IP forwarding within the TCP/IP stack
[root@camus]# cat /proc/sys/net/ipv4/ip_forward
0
[root@camus]# echo "1" > /proc/sys/net/ipv4/ip_forward
[root@camus]# cat /proc/sys/net/ipv4/ip_forward
1
[root@camus]#
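
The same files can be read and written from a program. The following is a minimal sketch, assuming a tunable that holds a single integer; the helper names read_tunable and write_tunable are hypothetical, and writing under /proc/sys normally requires root privileges.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the first integer from a tunable file such
 * as /proc/sys/net/ipv4/ip_forward. Returns 0 on success, -1 on error. */
int read_tunable( const char *path, long *value )
{
    FILE *fp;
    int ok;

    assert( path != NULL && value != NULL );

    fp = fopen( path, "r" );
    if (fp == NULL)
        return -1;

    ok = (fscanf( fp, "%ld", value ) == 1);
    fclose( fp );
    return ok ? 0 : -1;
}

/* Hypothetical helper: write an integer to a tunable file.
 * Returns 0 on success, -1 on error. */
int write_tunable( const char *path, long value )
{
    FILE *fp;
    int ok;

    assert( path != NULL );

    fp = fopen( path, "w" );
    if (fp == NULL)
        return -1;

    ok = (fprintf( fp, "%ld\n", value ) > 0);
    /* fclose() flushes the buffer, so its result must be checked too */
    if (fclose( fp ) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Tunables that hold several values (such as tcp_mem) would need a different parser; this sketch covers only the single-integer case shown in Listing 3.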
                        

Table 1 is a list of several tunable parameters that can help you increase the performance of the Linux TCP/IP stack.

Table 1. Kernel tunable parameters for TCP/IP stack performance

| Tunable parameter | Default value | Option description |
| --- | --- | --- |
| /proc/sys/net/core/rmem_default | 110592 | Defines the default receive window size; for a large BDP, the size should be larger. |
| /proc/sys/net/core/rmem_max | 110592 | Defines the maximum receive window size; for a large BDP, the size should be larger. |
| /proc/sys/net/core/wmem_default | 110592 | Defines the default send window size; for a large BDP, the size should be larger. |
| /proc/sys/net/core/wmem_max | 110592 | Defines the maximum send window size; for a large BDP, the size should be larger. |
| /proc/sys/net/ipv4/tcp_window_scaling | 1 | Enables window scaling as defined by RFC 1323; must be enabled to support windows larger than 64KB. |
| /proc/sys/net/ipv4/tcp_sack | 1 | Enables selective acknowledgment, which improves performance by selectively acknowledging packets received out of order (causing the sender to retransmit only the missing segments); should be enabled (for wide area network communication), but it can increase CPU utilization. |
| /proc/sys/net/ipv4/tcp_fack | 1 | Enables Forward Acknowledgment, which operates with Selective Acknowledgment (SACK) to reduce congestion; should be enabled. |
| /proc/sys/net/ipv4/tcp_timestamps | 1 | Enables calculation of RTT in a more accurate way (see RFC 1323) than the retransmission timeout; should be enabled for performance. |
| /proc/sys/net/ipv4/tcp_mem | 24576 32768 49152 | Determines how the TCP stack should behave for memory usage; each count is in memory pages (typically 4KB). The first value is the low threshold for memory usage. The second value is the threshold for a memory pressure mode to begin to apply pressure to buffer usage. The third value is the maximum threshold. At this level, packets can be dropped to reduce memory usage. Increase the count for large BDP (but remember, it's memory pages, not bytes). |
| /proc/sys/net/ipv4/tcp_wmem | 4096 16384 131072 | Defines per-socket memory usage for auto-tuning. The first value is the minimum number of bytes allocated for the socket's send buffer. The second value is the default (overridden by wmem_default) to which the buffer can grow under non-heavy system loads. The third value is the maximum send buffer space (overridden by wmem_max). |
| /proc/sys/net/ipv4/tcp_rmem | 4096 87380 174760 | Same as tcp_wmem except that it refers to receive buffers for auto-tuning. |
| /proc/sys/net/ipv4/tcp_low_latency | 0 | Allows the TCP/IP stack to give preference to low latency over higher throughput; should be disabled. |
| /proc/sys/net/ipv4/tcp_westwood | 0 | Enables a sender-side congestion control algorithm that maintains estimates of throughput and tries to optimize the overall utilization of bandwidth; should be enabled for WAN communication. This option is also useful for wireless interfaces, as packet loss may not be caused by congestion. |
| /proc/sys/net/ipv4/tcp_bic | 1 | Enables Binary Increase Congestion for fast long-distance networks; permits better utilization of links operating at gigabit speeds; should be enabled for WAN communication. |
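
Because the tcp_mem thresholds are counted in pages rather than bytes, a small conversion helps when comparing them with the byte-based buffer sizes. The sketch below uses hypothetical function names; sysconf( _SC_PAGESIZE ) is the standard way to ask for the page size of the running system.

```c
#include <assert.h>
#include <unistd.h>

/* Convert a tcp_mem threshold, expressed in pages, to bytes. */
long long pages_to_bytes( long pages, long page_size )
{
    assert( pages >= 0 && page_size > 0 );
    return (long long)pages * page_size;
}

/* Same conversion using the page size of the running system. */
long long tcp_mem_pages_to_bytes( long pages )
{
    return pages_to_bytes( pages, sysconf( _SC_PAGESIZE ) );
}
```

With 4KB pages, the maximum threshold of 49152 pages listed in the tcp_mem row corresponds to 201,326,592 bytes, or 192MB.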

As with any tuning effort, the best approach is experimental in nature. Your application behavior, processor speed, and availability of memory all affect how these parameters will alter performance. In some cases, what you think should be beneficial can be detrimental (and vice versa). So, try an option and then check the result. In other words, trust but verify.

Bonus tip: A word about persistent configuration. If you reboot a GNU/Linux system, any tunable kernel parameters that you changed revert to their defaults. To make your settings persistent, add them to the file /etc/sysctl.conf, which is applied at boot time.




GNU/Linux tools

GNU/Linux is attractive to me because of the number of tools that are available. The vast majority are command-line tools, but they are amazingly useful and intuitive. GNU/Linux provides several tools -- either natively or available as open source -- to debug networking applications, measure bandwidth/throughput, and check link utilization.

Table 2 lists some of the most useful GNU/Linux tools along with their intended use. Table 3 lists useful tools that are not typically part of GNU/Linux distributions.

Table 2. Native tools commonly found in any GNU/Linux distribution

| GNU/Linux utility | Purpose |
| --- | --- |
| ping | Most commonly used to check accessibility to a host but can also be used to identify the RTT for the bandwidth-delay-product calculation. |
| traceroute | Prints the path (route) for a connection to a network host through a series of routers and gateways, identifying the latency between each hop. |
| netstat | Identifies various statistics about the networking subsystem, protocols, and connections. |
| tcpdump | Shows the protocol-level packet trace for one or more connections; also includes timing information, which you can use to explore the packet timing of the various protocol services. |

Table 3. Useful performance tools not typically available in a GNU/Linux distribution

| GNU/Linux utility | Purpose |
| --- | --- |
| netlog | Provides application instrumentation for network performance. |
| nettimer | Generates a metric for bottleneck link bandwidth; can be used for protocol auto-tuning. |
| Ethereal | Provides the features of tcpdump (packet trace) in an easy-to-use graphical interface. |
| iperf | Measures network performance for both TCP and UDP; measures maximum bandwidth, and also reports delay jitter and datagram loss. |
| trafshow | Provides full-screen visualization of network traffic. |




Conclusion

Experiment with these tips and techniques to increase the performance of your sockets applications, including reducing transmit latency by disabling the Nagle algorithm, increasing bandwidth utilization of a socket through buffer sizing, reducing system call overhead by minimizing the number of system calls, and tuning the Linux TCP/IP stack with tunable kernel parameters.

Always consider the nature of your application when tuning. For example, is your application LAN-based or will it communicate over the Internet? If your application operates only within a LAN, increasing socket-buffer sizes may not yield much benefit, but enabling jumbo frames certainly will!

Finally, always check the results of your tuning with a tool like tcpdump or Ethereal. The changes you see at the packet level will help indicate the success of your tuning with these techniques.
