buliedian

[Repost] Understanding Linux CPU Load: When to Worry

Understanding Linux CPU Load - when should you be worried?

You might be familiar with Linux load averages already. Load averages are the three numbers shown with the uptime and top commands - they look like this:

load average: 0.09, 0.05, 0.01

Most people have an inkling of what the load averages mean: the three numbers represent averages over progressively longer periods of time (one, five, and fifteen minutes), and lower numbers are better. Higher numbers represent a problem or an overloaded machine. But what's the threshold? What constitutes "good" and "bad" load average values? When should you be concerned about a load average value, and when should you scramble to fix it ASAP?

First, a little background on what the load average values mean. We'll start out with the simplest case: a machine with one single-core processor.

The traffic analogy

A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.

So, Bridge Operator, what numbering system are you going to use? How about:

  • 0.00 means there's no traffic on the bridge at all. In fact, anything between 0.00 and 1.00 means there's no backup, and an arriving car will just go right on.
  • 1.00 means the bridge is exactly at capacity. All is still good, but if traffic gets a little heavier, things are going to slow down.
  • over 1.00 means there's backup. How much? Well, 2.00 means that there are two lanes' worth of cars total -- one lane's worth on the bridge, and one lane's worth waiting. 3.00 means there are three lanes' worth total -- one lane's worth on the bridge, and two lanes' worth waiting. Etc.

[Figures: bridge traffic at a load of 1.00, a load of 0.50, and a load of 1.70]

This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run. (On Linux, the figure also counts processes in uninterruptible sleep, which are typically waiting on disk I/O.)

Like the bridge operator, you'd like your cars/processes to never be waiting. So, your CPU load should ideally stay below 1.00. Also like the bridge operator, you are still OK if you get some temporary spikes above 1.00 ... but when you're consistently above 1.00, you need to worry.
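To make the numbers concrete, here is a minimal Python sketch that pulls the three averages out of a line in the /proc/loadavg format. The helper name parse_loadavg and the sample line are assumptions made for illustration; the fourth field of that file is the count of currently runnable scheduling entities over the total.

```python
def parse_loadavg(text):
    """Parse one line in the /proc/loadavg format into a small dict."""
    fields = text.split()
    # First three fields: 1-, 5-, and 15-minute load averages.
    one, five, fifteen = (float(x) for x in fields[:3])
    # Fourth field: "runnable/total" scheduling entities right now.
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable": runnable,
        "total": total,
    }


# Hypothetical sample line, not taken from a real machine.
sample = "0.09 0.05 0.01 1/123 4567"
print(parse_loadavg(sample))
```

On a Linux box you would pass in the contents of /proc/loadavg instead of the sample string. Python's os.getloadavg() returns the same three averages directly on Unix-like systems.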

So you're saying the ideal load is 1.00?

Well, not exactly. The problem with a load of 1.00 is that you have no headroom. In practice, many sysadmins will draw a line at 0.70:

  • The "Need to Look into it" Rule of Thumb: 0.70. If your load average is staying above 0.70, it's time to investigate before things get worse.

  • The "Fix this now" Rule of Thumb: 1.00. If your load average stays above 1.00, find the problem and fix it now. Otherwise, you're going to get woken up in the middle of the night, and it's not going to be fun.

  • The "Arrgh, it's 3AM WTF?" Rule of Thumb: 5.00. If your load average is above 5.00, you could be in serious trouble: your box is either hanging or slowing way down, and this will (inexplicably) happen at the worst possible time, like in the middle of the night or when you're presenting at a conference. Don't let it get there.
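The three rules of thumb can be written down as a small lookup. This is only a sketch for a single-core machine; the function name single_core_status and the returned labels are made up for illustration, while the cut-offs 0.70, 1.00, and 5.00 come from the list above.

```python
def single_core_status(load_avg):
    """Map a sustained load average on ONE core to a rule-of-thumb label."""
    if load_avg > 5.00:
        return "serious trouble"   # the "3AM" rule
    if load_avg > 1.00:
        return "fix this now"
    if load_avg > 0.70:
        return "look into it"
    return "ok"


for value in (0.50, 0.85, 1.70, 6.00):
    print(value, single_core_status(value))
```

The checks run from the highest threshold down so that each load value gets the most severe label that applies.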

What about Multi-processors? My load says 3.00, but things are running fine!

Got a quad-processor system? It's still healthy with a load of 3.00.

On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.

If we go back to the bridge analogy, the "1.00" really means "one lane's worth of traffic". On a one-lane bridge, that means it's filled up. On a two-lane bridge, a load of 1.00 means it's at 50% capacity -- only one lane is full, so there's another whole lane that can be filled.

[Figure: load of 2.00 on a two-lane road]

Same with CPUs: a load of 1.00 is 100% CPU utilization on a single-core box. On a dual-core box, a load of 2.00 is 100% CPU utilization.
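The arithmetic behind these statements is a single division. The sketch below uses a hypothetical helper, load_as_utilization, that expresses a load average as a percentage of total core capacity in the same loose sense of "utilization" the article uses.

```python
def load_as_utilization(load_avg, cores):
    """Express a load average as a percentage of total core capacity."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 100.0 * load_avg / cores


print(load_as_utilization(1.00, 1))  # 100.0: one-lane bridge, full
print(load_as_utilization(1.00, 2))  # 50.0: two-lane bridge, one lane full
print(load_as_utilization(2.00, 2))  # 100.0: two-lane bridge, full
print(load_as_utilization(3.00, 4))  # 75.0: quad-core with room to spare
```

A result above 100 means some processes are waiting their turn.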

Multicore vs. multiprocessor

While we're on the topic, let's talk about multicore vs. multiprocessor. For performance purposes, is a machine with a single dual-core processor basically equivalent to a machine with two processors with one core each? Yes. Roughly. There are lots of subtleties here concerning amount of cache, frequency of process hand-offs between processors, etc. Despite those finer points, for the purposes of sizing up the CPU load value, the total number of cores is what matters, regardless of how many physical processors those cores are spread across.

Which leads us to two new Rules of Thumb:

  • The "number of cores = max load" Rule of Thumb: on a multicore system, your load should not exceed the number of cores available.

  • The "cores is cores" Rule of Thumb: How the cores are spread out over CPUs doesn't matter. Two quad-cores == four dual-cores == eight single-cores. It's all eight cores for these purposes.
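Both rules fit in a few lines of Python. The names total_cores and over_max_load are hypothetical helpers for illustration, not part of any tool mentioned here.

```python
def total_cores(sockets, cores_per_socket):
    """'Cores is cores': only the product matters, not the layout."""
    return sockets * cores_per_socket


def over_max_load(load_avg, sockets, cores_per_socket):
    """'Number of cores = max load': True when load exceeds the core count."""
    return load_avg > total_cores(sockets, cores_per_socket)


# Two quad-cores, four dual-cores, and eight single-cores all give 8.
print(total_cores(2, 4), total_cores(4, 2), total_cores(8, 1))

# A load of 3.00 on one quad-core processor is under the limit.
print(over_max_load(3.00, sockets=1, cores_per_socket=4))
```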

Bringing It Home

Let's take a look at the load averages output from uptime:

~ $ uptime
23:05 up 14 days, 6:08, 7 users, load averages: 0.65 0.42 0.36

This is on a dual-core CPU, so we've got lots of headroom. I won't even think about it until load gets and stays above 1.7 or so.

Now, what about those three numbers? 0.65 is the average over the last minute, 0.42 is the average over the last five minutes, and 0.36 is the average over the last 15 minutes. Which brings us to the question:

Which average should I be observing? One, five, or 15 minute?

For the numbers we've talked about (1.00 = fix it now, etc.), you should be looking at the five- or 15-minute averages. Frankly, if your box spikes above 1.0 on the one-minute average, you're still fine. It's when the 15-minute average goes north of 1.0 and stays there that you need to snap to. (Obviously, as we've learned, adjust these numbers to the number of processor cores your system has.)
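One way to encode this advice is to check the longest window first and treat a high one-minute value on its own as a spike. This is a rough sketch: the function name assess and its labels are assumptions, and the limit is simply 1.00 multiplied by the core count.

```python
def assess(one_min, five_min, fifteen_min, cores=1):
    """Judge load by the longer windows first; a 1-minute spike alone is fine."""
    limit = 1.00 * cores  # "fix it now" mark, scaled to the core count
    if fifteen_min > limit:
        return "sustained overload"
    if five_min > limit:
        return "building up"
    if one_min > limit:
        return "short spike"
    return "fine"


# The uptime sample from this article, on a dual-core machine.
print(assess(0.65, 0.42, 0.36, cores=2))
# A single-core box with only the one-minute average above 1.0.
print(assess(1.40, 0.60, 0.50))
```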

So # of cores is important to interpreting load averages ... how do I know how many cores my system has?

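Run cat /proc/cpuinfo to get info on each processor in your system. Note: /proc/cpuinfo is not available on OS X; sysctl -n hw.ncpu reports the count there. To get just a count on Linux, run it through grep and word count: grep 'model name' /proc/cpuinfo | wc -l. Keep in mind that this counts logical processors, so a core with hyper-threading enabled shows up more than once.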
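The same count can be taken from Python. The sketch below mirrors the grep pipeline by counting "model name" lines. The helper name count_processors and the abbreviated sample text are assumptions for illustration, since a real /proc/cpuinfo has many more fields per processor.

```python
def count_processors(cpuinfo_text):
    """Count 'model name' lines, like: grep 'model name' /proc/cpuinfo | wc -l"""
    return sum(1 for line in cpuinfo_text.splitlines() if "model name" in line)


# Abbreviated, made-up excerpt in the /proc/cpuinfo layout.
sample = """processor\t: 0
model name\t: Example CPU
processor\t: 1
model name\t: Example CPU
"""
print(count_processors(sample))
```

Where /proc/cpuinfo is not available, os.cpu_count() from the standard library reports the number of logical CPUs without reading any file.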

Monitoring Linux CPU Load with Scout

Scout provides two ways to monitor the CPU load. Our original server load plugin and Jesse Newland's Load-Per-Processor plugin both report the CPU load and alert you when the load peaks and/or is trending in the wrong direction:

[Figure: load alert]
