http://www.xasun.com/article/2a/1991$3.html
1. With the Xeon 5500 series processors,
Intel has diverged from its traditional Symmetric Multiprocessing (SMP) architecture
to a Non-Uniform Memory Access (NUMA) architecture. In a two-processor scenario, the Xeon
5500 series processors are connected through a serial coherency link called QuickPath
Interconnect (QPI). The QPI is capable of 6.4, 5.6 or 4.8 GT/s (gigatransfers per second),
depending on the processor model. The Xeon 5500 series integrates the memory controller within
the processor, resulting in two memory controllers in a two-socket system. Each memory
controller has three memory channels and supports DDR-3 memory. Depending on processor
model, the type of memory used, and the population of memory, memory may be clocked at
1333MHz, 1066MHz or 800MHz. Each memory channel supports up to 3 DIMMs per channel
(DPC), for a theoretical maximum of 9 DIMMs per processor or 18 per 2-socket server. (See
Figure 1 for illustration.) However, the actual maximum number of DIMMs per system is
dependent upon the system design.
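Summary: the new Xeon 5500 series moves from the earlier SMP design to NUMA. The two CPUs no longer manage a shared pool of memory;
instead, the memory controller is integrated into each CPU, and each CPU manages 3 channels holding up to 9 DIMMs in total. The CPUs are linked by QPI (which can be thought of as an internal bus). The highest memory frequency that can be reached depends on both the CPU and the DIMMs.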
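The DIMM counts quoted above follow directly from the topology. A minimal sketch of that arithmetic, using only the figures given in the text (the constant and function names are illustrative, not part of any tool):

```python
# Topology figures quoted in the text for a Xeon 5500 based server.
CHANNELS_PER_SOCKET = 3        # memory channels per integrated controller
MAX_DIMMS_PER_CHANNEL = 3      # theoretical maximum; a real board may offer fewer slots


def max_dimms(sockets: int) -> int:
    """Theoretical maximum DIMM count for the given number of sockets."""
    return sockets * CHANNELS_PER_SOCKET * MAX_DIMMS_PER_CHANNEL


print(max_dimms(1))  # DIMMs behind one processor
print(max_dimms(2))  # DIMMs in a two-socket server
```

As the text notes, the actual maximum depends on the system design, so treat these values as upper bounds.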
2.Memory Performance
With the varied number of configurations possible in the Xeon 5500 series processor-based
systems, a number of variables emerge that influence processor/memory performance. The main
variables are memory speed, memory interleaving, memory ranks and memory population across
various memory channels and processors. Depending on the processor model and number of
DIMMs, the performance of the Xeon 5500 platform will see large memory performance
variances. We will look at each of these factors more closely in the next sections.
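Summary: the factors most relevant to memory performance include the CPU model, the number of DIMMs installed per channel, the speed of the memory itself, how the memory is interleaved, and the number of ranks.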
2.1 Memory Speed
As mentioned earlier, the memory speed is determined by the combination of the processor
model, DIMM speed, and DIMMs per channel.
2.1.1 Processor model
The initial Xeon 5500 series processor-based offerings will be categorized into 3 bins called
Performance, Volume and Value. The 3 bins have the ability to clock memory at different
maximum speeds:
• 1333MHz (X55xx processor models)
• 1066MHz (E552x or L552x and up)
• 800MHz (E550x)
So, the processor model will limit the maximum frequency of the memory. Note: Because of the
integrated memory controllers the former front-side bus (FSB) no longer exists.
Note: with the memory controller integrated into the CPU, the FSB no longer exists (there is no front-side bus, as on AMD processors).
2.1.2 DDR3 DIMM Speed
DDR-3 memory will be available in various sizes at speeds of 1333MHz and 1066MHz. 1333MHz
represents the maximum capability at which memory can be clocked. However, the memory will
not be clocked faster than the capability of the processor model and will be clocked appropriately
by the BIOS.
2.1.3 DIMMs per Channel (DPC)
The number and type of DIMMs and the channels in which they reside will also determine the
speed at which memory will be clocked. Table 1 describes the behavior of the platform. The table
below assumes a 1333MHz-capable processor model (X55xx). If a slower processor model is
used, then the memory speed will be the lower of the memory speed and the processor model
memory speed capability. If the DPC is not uniform across all the channels, then the system will
clock to the frequency of the slowest channel.
Memory runs at a different frequency depending on how many DIMMs each channel holds; see the table below.
Table 1
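The clocking rule described in sections 2.1.1 to 2.1.3 can be sketched as taking the minimum of three limits. This is a simplified model, not a reproduction of Table 1: the per-channel limits below are an assumption taken from the population guidance elsewhere in this article (1 DIMM per channel at 1333MHz, 2 at 1066MHz, 3 at 800MHz), the processor keys are illustrative labels for the three bins, and the effect of DIMM type (for example quad-rank DIMMs) is ignored.

```python
from typing import Sequence

# Assumed per-channel limit (MHz) by DIMMs per channel; a simplification.
DPC_LIMIT_MHZ = {1: 1333, 2: 1066, 3: 800}

# Maximum memory clock (MHz) per processor bin, from section 2.1.1.
PROCESSOR_LIMIT_MHZ = {"X55xx": 1333, "E552x": 1066, "L552x": 1066, "E550x": 800}


def memory_clock(processor: str, dimm_mhz: int, dimms_per_channel: Sequence[int]) -> int:
    """Return the memory clock for the whole system under this simplified model."""
    populated = [dpc for dpc in dimms_per_channel if dpc > 0]
    if not populated:
        raise ValueError("no DIMMs installed")
    # The system clocks to the frequency of its slowest channel.
    slowest_channel = min(DPC_LIMIT_MHZ[dpc] for dpc in populated)
    return min(PROCESSOR_LIMIT_MHZ[processor], dimm_mhz, slowest_channel)


print(memory_clock("X55xx", 1333, [1, 1, 1, 1, 1, 1]))  # uniform, 1 DIMM per channel
print(memory_clock("X55xx", 1333, [2, 2, 2, 2, 2, 1]))  # non-uniform population
print(memory_clock("E550x", 1333, [1, 1, 1, 1, 1, 1]))  # limited by the processor bin
```

The second call shows why uniform population matters: a single channel with two DIMMs is enough to pull every channel down to the lower speed.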
2.1.4 Low-level Performance Specifics
It is important to understand the impact of the performance of the Xeon 5500 series platform,
depending on the memory speed. We will use both low-level memory tools and application
benchmarks to quantify the impact of memory speed.
Parameters that determine memory performance: latency and throughput
Two of the key low-level metrics that are used to measure memory performance are memory
latency and memory throughput. We use a base Xeon 5500 2.93GHz, 1333MHz-capable 2-
socket system for this analysis. The memory configurations for the three memory speeds in the
following benchmarks are as follows:
• 1333MHz – 6 x 4GB dual-rank 1333MHz DIMMs
• 1066MHz – 12 x 2GB dual-rank DIMMs for 1066MHz
• 800MHz – 12 x 2GB dual-rank DIMMs clocked down to 800MHz in BIOS
Note: Memory ranks are explained in detail in section 2.3.
Table 2 below shows the unloaded latency to local memory. The unloaded
latency is measured at the application level and is designed to defeat processor prefetch
mechanisms. As shown in Table 2, the difference between the fastest and slowest speeds is
about 10%. This represents the high watermark for latency-sensitive workloads. Another
important thing to note is that this is almost a 50% decrease in memory latency when compared
to the previous generation Xeon 5400 series processor on 5000P chipset platforms.
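Memory latency: the fastest and slowest memory speeds differ by about 10%, while the Xeon 5500 series has almost 50% lower latency than the Xeon 5400 series.
Table 2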
A better indicator of application performance is memory throughput. We use the triad component
of the STREAM benchmark to compare the performance at different memory speeds. The memory
throughput assumes all local memory allocation and all 8 cores utilizing main memory. As shown
in Table 3, the performance gain from running memory at 1066MHz versus 800MHz is 28%, and
the performance gain from running at 1333MHz versus 1066MHz is 9%. So, the performance
penalty of clocking memory down to 800MHz is far greater than clocking it down to 1066MHz.
This new processor design comes with some trade-offs in memory capacity, performance, and
cost: For example, more lower-cost/lower-capacity DIMMs mean lower memory speed.
Alternatively, fewer higher-capacity DIMMs cost more but offer higher performance.
Note: dropping the memory frequency from 1333MHz to 1066MHz costs less than dropping from 1066MHz to 800MHz.
Table 3
Regardless of memory speed, the Xeon 5500 platform represents a significant improvement in
memory bandwidth over the previous Xeon 5400 platform. At 1333MHz, the improvement is
almost 500% over the previous generation. This huge improvement is mainly due to dual
integrated memory controllers and faster DDR-3 1333MHz memory. This improvement translates
into improved application performance and scalability.
Summary: the Xeon 5500 series offers almost 500% more memory bandwidth than the earlier Xeon 5400 series.
2.1.5 Application Performance
In this section, we will discuss the impact of memory speed on the performance of three
commonly used benchmarks: SPECint®2006_rate, SPECfp®2006_rate and SPECjbb®2005. In
each case, the benchmark scores are relative to the score at 800MHz as shown in Figure 8.
SPECint2006_rate is typically used as an indicator of performance for commercial applications. It
tends to be more sensitive to processor frequency and less to memory bandwidth. There are very
few components in SPECint2006_rate that are memory bandwidth intensive and so the
performance gain with memory speed improvements is the least for this workload. In fact, most of
the difference observed is due to one of the sub-benchmarks that shows a high sensitivity to
memory frequency. There is an 8% improvement going from 800MHz to 1333MHz while the
improvement in memory bandwidth is almost 40%.
SPECfp_rate is used as an indicator for HPC (high-performance computing) workloads. It tends
to be memory bandwidth intensive and should reveal significant improvements for this workload
as memory frequency increases. As expected, a number of sub-benchmarks demonstrate
improvements as high as the difference in memory bandwidth. As shown in Figure 8, there is a
13% gain going from 800MHz to 1066MHz and another 6% improvement with 1333MHz.
SPECfp_rate captures almost 50% of the memory bandwidth improvement.
SPECjbb2005 is a workload that does not stress memory but keeps the data bus moderately
utilized. This workload provides a middle ground and the performance gains reflect that trend. As
shown in Table 4, there is an 8% gain from 800MHz to 1066MHz and another 2% upside with
1333MHz.
Table 4
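The percentages in sections 2.1.4 and 2.1.5 can be cross-checked by compounding the step-wise gains and comparing each benchmark with the memory bandwidth gain. A minimal sketch using only figures quoted in the text (the dictionary layout and function name are illustrative):

```python
# Step-wise gains quoted in the text, as fractions (0.28 means +28%).
# Two entries: 800 -> 1066MHz, then 1066 -> 1333MHz.
# SPECint2006_rate is quoted only as a single 800 -> 1333MHz figure.
GAINS = {
    "memory bandwidth": (0.28, 0.09),
    "SPECint2006_rate": (0.08,),
    "SPECfp2006_rate": (0.13, 0.06),
    "SPECjbb2005": (0.08, 0.02),
}


def compound(steps):
    """Combine successive relative gains into one overall gain."""
    total = 1.0
    for gain in steps:
        total *= 1.0 + gain
    return total - 1.0


bandwidth_gain = compound(GAINS["memory bandwidth"])
captured = {
    name: compound(steps) / bandwidth_gain
    for name, steps in GAINS.items()
    if name != "memory bandwidth"
}

print(f"bandwidth gain 800 -> 1333MHz: {bandwidth_gain:.1%}")
for name, share in captured.items():
    print(f"{name}: captures {share:.0%} of the bandwidth gain")
```

The compounded bandwidth gain comes out just under 40%, and SPECfp captures about half of it, which matches the "almost 40%" and "almost 50%" statements in the text.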
2.2 Memory Interleaving
Memory interleaving refers to how physical memory is interleaved across the physical DIMMs. A
balanced system provides the best interleaving. A Xeon 5500 series processor-based system is
balanced when all memory channels on a socket have the same amount of memory. The
simplest way to enforce optimal interleaving is by populating 6 identical DIMMs at 1333MHz, 12
identical DIMMs at 1066MHz and 18 identical DIMMs (where supported by platform) at 800MHz.
When channels hold different amounts of memory, interleaving is reduced and performance drops. Figure 9 shows the impact of reduced interleaving. The first
configuration is a balanced baseline configuration where the memory is down-clocked to 800MHz
in BIOS. The second configuration populates four channels with 50% more memory than the two
other channels, causing an unbalanced configuration. The third configuration balances the
memory on all channels by populating the channels with fewer DIMM slots with a DIMM that is
double the capacity of others. (For example, two channels with 3 x 4GB DIMMs and one channel
with 1 x 4GB and 1 x 8GB DIMMs.) This ensures that all channels have the same capacity. As
Table 6 shows, the first and third balanced configurations significantly outperform the
unbalanced configuration. Depending on the memory footprint of the application and memory
access pattern, the impact could be higher or lower than the two applications cited in the figure.
Note: the more DIMMs installed, the lower the memory frequency: 12 DIMMs run at 1066MHz and 18 DIMMs at 800MHz. See Table 7 for details.
Table 6, Table 7
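The balance rule from section 2.2 (every channel on a socket holds the same amount of memory) is easy to check mechanically. A minimal sketch, where each channel is a list of DIMM sizes in GB; the helper names are illustrative and the two layouts mirror the unbalanced and rebalanced examples in the text:

```python
def channel_capacities(channels):
    """Total capacity in GB of each channel, given lists of DIMM sizes."""
    return [sum(dimms) for dimms in channels]


def is_balanced(channels):
    """True when every channel holds the same non-zero amount of memory."""
    capacities = channel_capacities(channels)
    return len(set(capacities)) == 1 and capacities[0] > 0


# One socket with two 3-slot channels and one 2-slot channel.
unbalanced = [[4, 4, 4], [4, 4, 4], [4, 4]]   # 12GB, 12GB, 8GB
rebalanced = [[4, 4, 4], [4, 4, 4], [4, 8]]   # 12GB on every channel

print(channel_capacities(unbalanced), is_balanced(unbalanced))
print(channel_capacities(rebalanced), is_balanced(rebalanced))
```

In the unbalanced layout the full channels hold 50% more memory than the short one; swapping one 4GB DIMM for an 8GB DIMM restores equal capacity per channel.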
2.3 Memory Ranks
A memory rank is simply a segment of memory that is addressed by a specific address bit.
DIMMs typically have 1, 2 or 4 memory ranks, as indicated by their size designation.
• A typical memory DIMM description: 2GB 4R x8 DIMM
• The 4R designator is the rank count for this particular DIMM (R for rank = 4)
• The x8 designator is the data width of the rank
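A DIMM description in the form shown above can be split into its three fields with a short parser. This is a sketch that assumes the exact "capacity, rank count, data width" layout of the example; the `DimmSpec` type and `parse_dimm` function are hypothetical names, and real vendor part descriptions may vary.

```python
import re
from typing import NamedTuple


class DimmSpec(NamedTuple):
    capacity_gb: int   # e.g. 2 for "2GB"
    ranks: int         # e.g. 4 for "4R"
    data_width: int    # e.g. 8 for "x8"


# Matches descriptions such as "2GB 4R x8 DIMM".
_PATTERN = re.compile(r"(\d+)GB\s+(\d+)R\s+x(\d+)", re.IGNORECASE)


def parse_dimm(description: str) -> DimmSpec:
    """Extract capacity, rank count and data width from a DIMM description."""
    match = _PATTERN.search(description)
    if match is None:
        raise ValueError(f"unrecognized DIMM description: {description!r}")
    capacity, ranks, width = (int(group) for group in match.groups())
    return DimmSpec(capacity, ranks, width)


print(parse_dimm("2GB 4R x8 DIMM"))
```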
It is important to ensure that DIMMs with the appropriate number of ranks are populated in each
channel for optimal performance. Whenever possible, it is recommended to use dual-rank DIMMs
in the system. Dual-rank DIMMs offer better interleaving and hence better performance than
single-rank DIMMs. For instance, a system populated with 6 x 2GB dual-rank DIMMs outperforms
a system populated with 6 x 2GB single-rank DIMMs by 7% for SPECjbb2005. Dual-rank DIMMs
are also better than quad-rank DIMMs because quad-rank DIMMs will cause the memory speed
to be down-clocked.
Another important guideline is to populate equivalent ranks per channel. For instance, mixing
single-rank and dual-rank DIMMs in a channel should be avoided.
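Note: the rank count is a property of how a DIMM is built, and the total number of ranks each channel can support is limited. In practice, balance memory capacity against memory frequency. Dual-rank DIMMs are usually the recommended choice.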
2.4 Memory Population across Memory Channels
It is important to ensure that all three memory channels in each processor are populated. The
relative memory bandwidth is shown in Figure 10, which illustrates the loss of memory bandwidth
as the number of channels populated decreases. This is because the bandwidth of all the
memory channels is utilized to support the capability of the processor. So, as the channels are
decreased, the burden to support the requisite bandwidth is increased on the remaining channels,
causing them to become a bottleneck.
Table 8
2.5 Memory Population Across Processor Sockets
Because the Xeon 5500 series uses NUMA architecture, it is important to ensure that both
memory controllers in the system are utilized, by providing both processors with memory. If only
one processor is installed, only the associated DIMM slots can be used. Adding a second
processor not only doubles the amount of memory available for use, but also doubles the number
of memory controllers, thus doubling the system memory bandwidth. It is also optimal to populate
memory for both processors in an identical fashion to provide a balanced system. Using Figure
11 as an example, Processor 0 has DIMMs populated but no DIMMs are populated for Processor
1. In this case, Processor 0 will have access to low latency local memory and high memory
bandwidth. However, Processor 1 has access only to remote or “far” memory. So, threads
executing on Processor 1 will have a long latency to access memory as compared to threads on
Processor 0.
This is due to the latency penalty incurred to traverse the QPI links to access the data on the
remote memory controller. The latency to access remote memory is almost 75% higher than local
memory access. The bandwidth to remote memory is also limited by the capability of the QPI
links. So, the goal should be to always populate both processors with memory.
Table 9
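The cost of remote access can be put in relative terms. A minimal sketch, assuming local latency is normalized to 1.0 and using the "almost 75% higher" figure from the text for remote accesses; the linear mix is a simplification that ignores QPI contention and bandwidth limits:

```python
REMOTE_PENALTY = 0.75  # remote latency is almost 75% higher than local, per the text


def relative_latency(remote_fraction: float) -> float:
    """Average memory latency relative to all-local access (1.0)."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return 1.0 + REMOTE_PENALTY * remote_fraction


print(relative_latency(0.0))  # threads on the processor that owns the memory
print(relative_latency(0.5))  # half of the accesses cross the QPI link
print(relative_latency(1.0))  # threads on a processor with no local memory
```

The last case corresponds to Processor 1 in the example above, where every access has to traverse the QPI link.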
3.0 Best Practices
In this section, we recap the various rules to be followed for optimal memory configuration on
the Xeon 5500 based platforms.
3.1 Maximum Performance
Follow these rules for peak performance:
• Always populate both processors with equal amounts of memory to ensure a balanced
NUMA system.
• Always populate all 3 memory channels on each processor with equal memory capacity.
• Ensure an even number of ranks are populated per channel.
• Use dual-rank DIMMs whenever appropriate.
• For optimal 1333MHz performance, populate 6 dual-rank DIMMs (3 per processor).
• For optimal 1066MHz performance, populate 12 dual-rank DIMMs (6 per processor).
• For optimal 800MHz performance with high DIMM counts:
– On 12 DIMM platforms, populate 12 dual-rank or quad-rank DIMMs (6 per processor).
– On 16 DIMM platforms, use one of the following:
  Populate 12 dual-rank or quad-rank DIMMs (6 per processor).
  Populate 14 dual-rank DIMMs of one size and 2 dual-rank DIMMs of double the size,
  as described in the interleaving section.
• With the above rules, it is not possible to have a performance-optimized system with 4GB,
8GB, 16GB, or 128GB. With 3 memory channels and interleaving rules, customers need to
configure systems with 6GB, 12GB, 18GB, 24GB, 48GB, 72GB, 96GB, etc., for optimized
performance.
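The capacity list above follows from filling all 6 channels of a two-socket system equally. A minimal sketch that enumerates the totals reachable with identical DIMMs in every populated slot; the DIMM sizes (1, 2, 4 and 8GB) are an assumption for illustration, and mixed-size balanced layouts such as the 14 + 2 option are not modeled:

```python
CHANNELS = 6                    # 3 channels per processor, 2 processors
DIMM_SIZES_GB = (1, 2, 4, 8)    # assumed DIMM sizes, for illustration only
DIMMS_PER_CHANNEL = (1, 2, 3)   # identical population on every channel


def balanced_capacities():
    """Total capacities (GB) reachable with identical DIMMs on all channels."""
    totals = {
        CHANNELS * size * dpc
        for size in DIMM_SIZES_GB
        for dpc in DIMMS_PER_CHANNEL
    }
    return sorted(totals)


capacities = balanced_capacities()
print(capacities)
print([c for c in (4, 8, 16, 128) if c in capacities])  # power-of-two sizes that fit
```

Every total is a multiple of 6, which is why familiar sizes such as 4GB, 8GB, 16GB and 128GB cannot be built as balanced configurations.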
3.2 Other Considerations
3.2.1 Plugging Order
Take care to populate empty DIMM sockets in the specific order for each platform when adding
DIMMs to Xeon 5500 series platforms. The DIMM socket farthest away from its associated
processor, per memory channel, is always plugged first. Consult the documentation with your
specific system for details.
3.2.2 Power Guidelines
This document is focused on maximum performance configuration for Xeon 5500 series
processor-based systems. Here are a few power guidelines for consideration:
• Fewer, larger DIMMs (for example 6 x 4GB DIMMs vs. 12 x 2GB DIMMs) will generally have
lower power requirements
• x8 DIMMs (x8 data width of rank, see section 2.3) will generally draw less power than
equivalently sized x4 DIMMs
• Consider BIOS configuration settings (see section 3.2.4)
3.2.3 Reliability
Here are two reliability guidelines for consideration:
• Using fewer, larger DIMMs (for example 6 x 4GB DIMMs vs. 12 x 2GB DIMMs) is generally
more reliable
• Xeon 5500 series memory controllers support IBM Chipkill™ memory protection technology
with x4 DIMMs (x4 data width of rank; see section 2.3), but not with x8 DIMMs
3.2.4 BIOS Configuration Settings
There are a number of BIOS configuration settings on servers using the Xeon 5500 series
processors that can also affect memory performance or benchmark results. For example, most
platforms allow the option of decreasing the memory clock speed below the supported maximum.
This may be useful for power savings but, obviously, decreases memory performance.
Meanwhile, options like Hyper-Threading Technology (formerly known as Simultaneous Multi-
Threading) and Turbo Boost Technology can also significantly affect benchmark results. Specific
memory configuration settings important to performance include:
Table 10
Original authors:
Ganesh Balakrishnan
IBM System x and BladeCenter Performance
Ralph M. Begun
IBM System x Development