- 浏览: 619957 次
- 性别:
- 来自: 北京
博客专栏
-
白手起家学习使用Flex
浏览量:18706
最新评论
-
softor:
palytoxin 写道如果cherry-pick某分支a某一 ...
git cherry-pick. 如何把已经提交的commit, 从一个分支放到另一个分支 -
softor:
bit6211 写道请问有办法在对两个分支,同时都拥有的文件, ...
git cherry-pick. 如何把已经提交的commit, 从一个分支放到另一个分支 -
bit6211:
请问有办法在对两个分支,同时都拥有的文件,进行修改后,再同时c ...
git cherry-pick. 如何把已经提交的commit, 从一个分支放到另一个分支 -
palytoxin:
如果cherry-pick某分支a某一次提交之后过一段时间对分 ...
git cherry-pick. 如何把已经提交的commit, 从一个分支放到另一个分支 -
tbdp:
jkbjxy 写道楼主了解Lily吗?相比SolrCloud和 ...
solr 与 elasticsearch的比较 (翻译)
近期服务器性能极不稳定,打开首页速度时快时慢(目前只有一个用户,属于企业内部应用,重点是数据量非常大, 1K~ 100K 之间)
下面是从我的邮件摘录出一部分内容:
1. 访问速度不稳定。时快时慢。
2. 仔细看了服务器的日志,没有发现任何出错的问题
没有看到任何 FATAL, WARN 这样级别的日志。
看起来都很正常。
就是服务器的响应时间时快时慢。
我查询了其他日志(delayed_job ,rake )也没有看到相关的。
我准备看一下 /var/log里面的文件,看能不能找到线索
3. 初步找到问题所在
经过对/var/log 目录的排查,我发现了3个相关的程序:
1. tomcat 的crontab。 它每分钟都运行一次。
May 24 21:56:55 configfun crond[30679]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:56:55 configfun crond[30658]: (tomcat) error: Job execution of per-minute job scheduled for 19:18 de
May 24 21:56:55 configfun crond[30658]: CRON (tomcat) ERROR: cannot set security context
May 24 21:57:01 configfun crond[30724]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:58:01 configfun crond[30815]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:59:03 configfun crond[30875]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null
具体命令:
(tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/null)
大概看了一下,不会PERL, 但是发现应该是用来备份TOMCAT的日志。。。
每分钟运行一次。。。
1. 不过系统中没有tomcat 或者jetty在运行啊 ( ps -ef | grep tomcat ,没结果)
2. 估计它对性能影响不大。
附件 ( move_tomcat_log.pl ) 就是这个脚本。 同时,相关的日志是: /var/log/cron (见附件:cron, 877k)
2. /usr/lib64/nagios/plugins/contrib/check_logfiles
通过查看 /var/log/secure 发现的,如下:
May 24 18:47:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:49:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:52:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:54:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:57:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:59:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:02:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:04:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:07:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:09:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:13:03 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:15:00 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:56:46 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
这个程序的运行时间不固定,但是也非常频繁。我对它不了解,它的官方站点可能是: http://labs.consol.de/lang/en/nagios/check_logfiles/
相关的附件: secure 692k
3. 由于 系统内存太少, apache 进程被关掉了。
这个我觉得可能是 9k 秒才打开首页的原因。见附件: messages 216k
May 24 21:57:47 configfun kernel: DMA: 15*4kB 11*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048k
May 24 21:57:47 configfun kernel: DMA32: empty
May 24 21:57:47 configfun kernel: Normal: empty
May 24 21:57:47 configfun kernel: HighMem: empty
May 24 21:57:47 configfun kernel: 1340 pagecache pages
May 24 21:57:47 configfun kernel: Swap cache: add 18501044, delete 18500782, find 19025765/21574350, race 2+74
May 24 21:57:47 configfun kernel: Free swap = 0kB
May 24 21:57:47 configfun kernel: Total swap = 1052248kB
May 24 21:57:47 configfun kernel: Free swap: 0kB
May 24 21:57:47 configfun kernel: 264192 pages of RAM
May 24 21:57:47 configfun kernel: 8353 reserved pages
May 24 21:57:47 configfun kernel: 12936 pages shared
May 24 21:57:47 configfun kernel: 262 pages swap cached
May 24 21:57:47 configfun kernel: Out of memory: Killed process 4553, UID 48, (httpd).
可以看出,进程被删掉的时间,跟 passenger的日志很吻合( 21:57:xx)
另外,今天下午,我写了一个小脚本,每隔1分20秒,就抓取一次babble首页, 响应的速度很好,服务器端的日志都在1秒以内。
07:23:39 INFO: Completed 200 OK in 878ms (Views: 596.6ms | ActiveRecord: 107.6ms)
07:25:03 INFO: Completed 200 OK in 837ms (Views: 595.4ms | ActiveRecord: 90.2ms)
07:26:27 INFO: Completed 200 OK in 844ms (Views: 671.8ms | ActiveRecord: 105.1ms)
07:27:51 INFO: Completed 200 OK in 823ms (Views: 547.6ms | ActiveRecord: 104.4ms)
07:29:15 INFO: Completed 200 OK in 802ms (Views: 539.5ms | ActiveRecord: 94.2ms)
07:30:39 INFO: Completed 200 OK in 872ms (Views: 596.1ms | ActiveRecord: 101.2ms)
07:32:04 INFO: Completed 200 OK in 933ms (Views: 664.1ms | ActiveRecord: 99.6ms)
07:33:28 INFO: Completed 200 OK in 854ms (Views: 679.0ms | ActiveRecord: 103.8ms)
07:34:52 INFO: Completed 200 OK in 799ms (Views: 639.7ms | ActiveRecord: 89.4ms)
07:36:16 INFO: Completed 200 OK in 817ms (Views: 552.5ms | ActiveRecord: 88.8ms)
07:37:40 INFO: Completed 200 OK in 857ms (Views: 577.7ms | ActiveRecord: 101.2ms)
07:39:04 INFO: Completed 200 OK in 878ms (Views: 595.0ms | ActiveRecord: 100.6ms)
07:40:29 INFO: Completed 200 OK in 937ms (Views: 663.3ms | ActiveRecord: 97.4ms)
07:41:53 INFO: Completed 200 OK in 788ms (Views: 626.4ms | ActiveRecord: 93.4ms)
07:43:17 INFO: Completed 200 OK in 844ms (Views: 564.9ms | ActiveRecord: 94.4ms)
07:44:28 INFO: Completed 200 OK in 860ms (Views: 583.1ms | ActiveRecord: 109.1ms)
07:44:42 INFO: Completed 200 OK in 789ms (Views: 534.1ms | ActiveRecord: 93.6ms)
07:46:06 INFO: Completed 200 OK in 969ms (Views: 696.8ms | ActiveRecord: 95.7ms)
07:47:31 INFO: Completed 200 OK in 839ms (Views: 674.5ms | ActiveRecord: 95.3ms)
07:48:56 INFO: Completed 200 OK in 858ms (Views: 691.3ms | ActiveRecord: 98.6ms)
07:50:20 INFO: Completed 200 OK in 847ms (Views: 581.7ms | ActiveRecord: 94.6ms)
07:51:45 INFO: Completed 200 OK in 835ms (Views: 561.5ms | ActiveRecord: 96.5ms)
07:53:09 INFO: Completed 200 OK in 865ms (Views: 603.2ms | ActiveRecord: 105.3ms)
07:54:33 INFO: Completed 200 OK in 953ms (Views: 687.8ms | ActiveRecord: 101.7ms)
07:55:58 INFO: Completed 200 OK in 892ms (Views: 720.2ms | ActiveRecord: 103.2ms)
07:57:22 INFO: Completed 200 OK in 781ms (Views: 523.7ms | ActiveRecord: 87.9ms)
07:58:46 INFO: Completed 200 OK in 857ms (Views: 596.3ms | ActiveRecord: 102.2ms)
08:00:11 INFO: Completed 200 OK in 890ms (Views: 596.2ms | ActiveRecord: 113.1ms)
08:01:36 INFO: Completed 200 OK in 859ms (Views: 589.9ms | ActiveRecord: 98.9ms)
所以,我觉得是不是configfun的内存太小 ( 1 G 内存) ,运行的程序太多,导致的它性能不稳定?
能否申请给它增加 内存?
4. 不过,在增加内存(从硬件着手解决问题)之前,还是要分析一下,看看这1G的现有内存是如何被程序占用的。
根据 http://stackoverflow.com/questions/131303/linux-how-to-measure-actual-memory-usage-of-an-application-or-process ,先使用 $ ps aux
重点查看 VSZ 和 RSS
RSS: Resident set size, the non-swapped physical memory that a task has used (in kiloBytes). 物理内存
VSZ: virtual memory usage of entire process. vm_lib + vm_exe + vm_data + vm_stack 使用的总内存吧?
$ ps aux --sort -rss,-vsz (针对 RSS, VSZ 倒序排序, 参考: http://www.devdaily.com/linux/unix-linux-process-memory-sort-ps-command-cpu)
结果:
5. 开始优化
5.1 15152 这个PID 排在第一位,不知道它是干啥的。总之看起来跟我的RAILS启动方式无关(passenger + apache ), 路径确实是目前部署的应用的所在路径,所以直接把它停止。
$ kill -9 15152
之后, 人肉测试一下,没有影响到当前的应用。
top:
top - 02:24:07 up 40 days, 9:58, 3 users, load average: 0.07, 0.05, 0.01
Tasks: 121 total, 2 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 979032k used, 69544k free, 8008k buffers
Swap: 1052248k total, 363172k used, 689076k free, 134872k cached
不过。。。发现它又上来了。。。
ps aux --sort -rss,-vsz
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 1.0 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 19034 1.3 6.4 152392 67504 ? S 02:23 0:04 Passenger ApplicationSpawner: /rails_apps/babble_
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38072 8868 ? S May27 0:00 Passenger spawn server
看来它是正常工作的进程。嘿嘿。。。
继续。。。
5.2 9 个delayed_job 进程
我每次部署的时候,都忘记停止当前应用的进程,而是直接把当前应用删掉,然后在开启一个新进程,会不会现在的问题跟它有关?
我先看看目前的DELAYED——JOB的正常的 PID:
(15152, 10880) 一共是两个进程,(其中一个15152 已经被我提前删掉了)
把它们都kill 之后,
非常好,系统物理内存变成了: 859788k used, 188788k free,
再看看 ps aux ,不错,变成了8个delayed_job了。它们都是没用的。
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.5 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 31266 0.0 6.0 148288 63124 ? S May11 6:59 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server
5.3 把剩下8个没用的 delayed_job 进程停止掉。
现在 top 一下,帅呆了。。。释放了 680 MB内存。。。
看看 ps 的结果:
很好,现在运行的进程, RSS基本都在 10MB以下。 先这样吧 :)
总结:
本以为会是一个很麻烦的过程(卸载不必要的程序啊,调试啊,等等)现在看起来,原因就在于没有及时的把原来跑着的delayed_job停止。 把它们删掉之后就问题解决了。。:-)。如此简单。
下面是从我的邮件摘录出一部分内容:
1. 访问速度不稳定。时快时慢。
引用
下面是 今天的 (2012-05-25) 的日志: 可以看到, 前两个操作,耗费的时间非常久。。。
00:50:33 INFO: Completed 200 OK in 270450ms (Views: 1512.9ms | ActiveRecord: 13358.6ms)
01:42:15 INFO: Completed 200 OK in 246410ms (Views: 1429.3ms | ActiveRecord: 10685.1ms)
01:43:35 INFO: Completed 200 OK in 72ms (Views: 71.3ms | ActiveRecord: 0.0ms)
01:43:42 INFO: Completed 200 OK in 1852ms (Views: 1326.3ms | ActiveRecord: 165.6ms)
01:53:41 INFO: Completed 200 OK in 3749ms (Views: 1334.6ms | ActiveRecord: 2063.1ms)
01:54:23 INFO: Completed 200 OK in 13ms (Views: 7.4ms | ActiveRecord: 0.0ms)
01:54:25 INFO: Completed 200 OK in 284ms (Views: 43.3ms | ActiveRecord: 225.3ms)
01:54:31 INFO: Completed 200 OK in 246ms (Views: 42.4ms | ActiveRecord: 200.2ms)
01:54:31 INFO: Completed 200 OK in 477ms (Views: 313.1ms | ActiveRecord: 160.1ms)
另外从昨天的日志(05-24)进行一下分析,可以看到 ip : 172.21.100.1 打开首页,最慢的用了 9,704秒。(2个多小时,详细见附件: log_0524 ) (使用命令:cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET|POST' > log_0524 )
867 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
868 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
869 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:12:25 +0000 2012
870 19:13:45 INFO: Completed 200 OK in 93696ms (Views: 38701.7ms | ActiveRecord: 40828.7ms)
871 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:14:42 +0000 2012
872 21:57:41 INFO: Completed 200 OK in 9704339ms (Views: 1176.9ms | ActiveRecord: 76436.0ms)
873 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
874 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
875 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
876 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)
877 21:58:45 INFO: Completed 200 OK in 23981ms (Views: 2407.9ms | ActiveRecord: 20904.1ms)
而对于 打开首页的速度,我也做了一个分析:( 见附件: landing_page_log, 使用的命令是: cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET.*portal\/"' | grep GET -A 1 > landing_page_log ) 可以看出,绿色的部分表示程序运行正常,而红色的部分表示程序运行极慢。。。
49 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:46:38 +0000 2012
50 18:46:39 INFO: Completed 200 OK in 975ms (Views: 677.1ms | ActiveRecord: 105.7ms)
51 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:47:04 +0000 2012
52 18:47:05 INFO: Completed 200 OK in 981ms (Views: 685.0ms | ActiveRecord: 98.0ms)
53 --
54 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:45 +0000 2012
55 18:48:47 INFO: Completed 200 OK in 1109ms (Views: 670.0ms | ActiveRecord: 89.7ms)
56 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:51 +0000 2012
57 18:48:51 INFO: Completed 200 OK in 944ms (Views: 656.4ms | ActiveRecord: 99.7ms)
58 --
59 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:54:40 +0000 2012
60 18:54:42 INFO: Completed 200 OK in 1847ms (Views: 1647.4ms | ActiveRecord: 116.9ms)
61 --
62 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:55:10 +0000 2012
63 18:55:12 INFO: Completed 200 OK in 1330ms (Views: 1142.7ms | ActiveRecord: 102.4ms)
64 --
65 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:56:34 +0000 2012
66 18:56:35 INFO: Completed 200 OK in 1095ms (Views: 779.4ms | ActiveRecord: 100.7ms)
67 --
68 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
69 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
70 --
71 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
72 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
73 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
74 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)
所以我现在考虑的方向是,是不是configfun 服务器当时 ( 21:58分, 19:12分)正在忙于运行其他程序?
00:50:33 INFO: Completed 200 OK in 270450ms (Views: 1512.9ms | ActiveRecord: 13358.6ms)
01:42:15 INFO: Completed 200 OK in 246410ms (Views: 1429.3ms | ActiveRecord: 10685.1ms)
01:43:35 INFO: Completed 200 OK in 72ms (Views: 71.3ms | ActiveRecord: 0.0ms)
01:43:42 INFO: Completed 200 OK in 1852ms (Views: 1326.3ms | ActiveRecord: 165.6ms)
01:53:41 INFO: Completed 200 OK in 3749ms (Views: 1334.6ms | ActiveRecord: 2063.1ms)
01:54:23 INFO: Completed 200 OK in 13ms (Views: 7.4ms | ActiveRecord: 0.0ms)
01:54:25 INFO: Completed 200 OK in 284ms (Views: 43.3ms | ActiveRecord: 225.3ms)
01:54:31 INFO: Completed 200 OK in 246ms (Views: 42.4ms | ActiveRecord: 200.2ms)
01:54:31 INFO: Completed 200 OK in 477ms (Views: 313.1ms | ActiveRecord: 160.1ms)
另外从昨天的日志(05-24)进行一下分析,可以看到 ip : 172.21.100.1 打开首页,最慢的用了 9,704秒。(2个多小时,详细见附件: log_0524 ) (使用命令:cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET|POST' > log_0524 )
867 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
868 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
869 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:12:25 +0000 2012
870 19:13:45 INFO: Completed 200 OK in 93696ms (Views: 38701.7ms | ActiveRecord: 40828.7ms)
871 Started GET "/babble_portal/device_groups" for 172.21.100.1 at Thu May 24 19:14:42 +0000 2012
872 21:57:41 INFO: Completed 200 OK in 9704339ms (Views: 1176.9ms | ActiveRecord: 76436.0ms)
873 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
874 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
875 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
876 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)
877 21:58:45 INFO: Completed 200 OK in 23981ms (Views: 2407.9ms | ActiveRecord: 20904.1ms)
而对于 打开首页的速度,我也做了一个分析:( 见附件: landing_page_log, 使用的命令是: cat log_from_configfun/manually_2012-05-24.log | grep -E 'Completed|GET.*portal\/"' | grep GET -A 1 > landing_page_log ) 可以看出,绿色的部分表示程序运行正常,而红色的部分表示程序运行极慢。。。
49 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:46:38 +0000 2012
50 18:46:39 INFO: Completed 200 OK in 975ms (Views: 677.1ms | ActiveRecord: 105.7ms)
51 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:47:04 +0000 2012
52 18:47:05 INFO: Completed 200 OK in 981ms (Views: 685.0ms | ActiveRecord: 98.0ms)
53 --
54 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:45 +0000 2012
55 18:48:47 INFO: Completed 200 OK in 1109ms (Views: 670.0ms | ActiveRecord: 89.7ms)
56 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:48:51 +0000 2012
57 18:48:51 INFO: Completed 200 OK in 944ms (Views: 656.4ms | ActiveRecord: 99.7ms)
58 --
59 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:54:40 +0000 2012
60 18:54:42 INFO: Completed 200 OK in 1847ms (Views: 1647.4ms | ActiveRecord: 116.9ms)
61 --
62 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:55:10 +0000 2012
63 18:55:12 INFO: Completed 200 OK in 1330ms (Views: 1142.7ms | ActiveRecord: 102.4ms)
64 --
65 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 18:56:34 +0000 2012
66 18:56:35 INFO: Completed 200 OK in 1095ms (Views: 779.4ms | ActiveRecord: 100.7ms)
67 --
68 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 19:11:54 +0000 2012
69 19:12:13 INFO: Completed 200 OK in 39137ms (Views: 5839.8ms | ActiveRecord: 4412.1ms)
70 --
71 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:57:43 +0000 2012
72 21:58:20 INFO: Completed 200 OK in 9954427ms (Views: 174.3ms | ActiveRecord: 68178.3ms)
73 Started GET "/babble_portal/" for 172.21.100.1 at Thu May 24 21:58:21 +0000 2012
74 21:58:21 INFO: Completed 200 OK in 38122ms (Views: 1435.4ms | ActiveRecord: 35950.5ms)
所以我现在考虑的方向是,是不是configfun 服务器当时 ( 21:58分, 19:12分)正在忙于运行其他程序?
2. 仔细看了服务器的日志,没有发现任何出错的问题
引用
没有看到任何 FATAL, WARN 这样级别的日志。
看起来都很正常。
就是服务器的响应时间时快时慢。
我查询了其他日志(delayed_job ,rake )也没有看到相关的。
我准备看一下 /var/log里面的文件,看能不能找到线索
3. 初步找到问题所在
引用
经过对/var/log 目录的排查,我发现了3个相关的程序:
1. tomcat 的crontab。 它每分钟都运行一次。
May 24 21:56:55 configfun crond[30679]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:56:55 configfun crond[30658]: (tomcat) error: Job execution of per-minute job scheduled for 19:18 de
May 24 21:56:55 configfun crond[30658]: CRON (tomcat) ERROR: cannot set security context
May 24 21:57:01 configfun crond[30724]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:58:01 configfun crond[30815]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/nul
May 24 21:59:03 configfun crond[30875]: (tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null
具体命令:
(tomcat) CMD (/var/blur/bin/move_tomcat_log.pl > /dev/null 2>>/dev/null)
大概看了一下,不会PERL, 但是发现应该是用来备份TOMCAT的日志。。。
每分钟运行一次。。。
1. 不过系统中没有tomcat 或者jetty在运行啊 ( ps -ef | grep tomcat ,没结果)
2. 估计它对性能影响不大。
附件 ( move_tomcat_log.pl ) 就是这个脚本。 同时,相关的日志是: /var/log/cron (见附件:cron, 877k)
2. /usr/lib64/nagios/plugins/contrib/check_logfiles
通过查看 /var/log/secure 发现的,如下:
May 24 18:47:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:49:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:52:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:54:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:57:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 18:59:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:02:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:04:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:07:53 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:09:55 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:13:03 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 19:15:00 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:56:46 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
May 24 21:57:54 configfun sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/lib64/nagios/plugins/con
这个程序的运行时间不固定,但是也非常频繁。我对它不了解,它的官方站点可能是: http://labs.consol.de/lang/en/nagios/check_logfiles/
相关的附件: secure 692k
3. 由于 系统内存太少, apache 进程被关掉了。
这个我觉得可能是 9k 秒才打开首页的原因。见附件: messages 216k
May 24 21:57:47 configfun kernel: DMA: 15*4kB 11*8kB 3*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048k
May 24 21:57:47 configfun kernel: DMA32: empty
May 24 21:57:47 configfun kernel: Normal: empty
May 24 21:57:47 configfun kernel: HighMem: empty
May 24 21:57:47 configfun kernel: 1340 pagecache pages
May 24 21:57:47 configfun kernel: Swap cache: add 18501044, delete 18500782, find 19025765/21574350, race 2+74
May 24 21:57:47 configfun kernel: Free swap = 0kB
May 24 21:57:47 configfun kernel: Total swap = 1052248kB
May 24 21:57:47 configfun kernel: Free swap: 0kB
May 24 21:57:47 configfun kernel: 264192 pages of RAM
May 24 21:57:47 configfun kernel: 8353 reserved pages
May 24 21:57:47 configfun kernel: 12936 pages shared
May 24 21:57:47 configfun kernel: 262 pages swap cached
May 24 21:57:47 configfun kernel: Out of memory: Killed process 4553, UID 48, (httpd).
可以看出,进程被删掉的时间,跟 passenger的日志很吻合( 21:57:xx)
另外,今天下午,我写了一个小脚本,每隔1分20秒,就抓取一次babble首页, 响应的速度很好,服务器端的日志都在1秒以内。
07:23:39 INFO: Completed 200 OK in 878ms (Views: 596.6ms | ActiveRecord: 107.6ms)
07:25:03 INFO: Completed 200 OK in 837ms (Views: 595.4ms | ActiveRecord: 90.2ms)
07:26:27 INFO: Completed 200 OK in 844ms (Views: 671.8ms | ActiveRecord: 105.1ms)
07:27:51 INFO: Completed 200 OK in 823ms (Views: 547.6ms | ActiveRecord: 104.4ms)
07:29:15 INFO: Completed 200 OK in 802ms (Views: 539.5ms | ActiveRecord: 94.2ms)
07:30:39 INFO: Completed 200 OK in 872ms (Views: 596.1ms | ActiveRecord: 101.2ms)
07:32:04 INFO: Completed 200 OK in 933ms (Views: 664.1ms | ActiveRecord: 99.6ms)
07:33:28 INFO: Completed 200 OK in 854ms (Views: 679.0ms | ActiveRecord: 103.8ms)
07:34:52 INFO: Completed 200 OK in 799ms (Views: 639.7ms | ActiveRecord: 89.4ms)
07:36:16 INFO: Completed 200 OK in 817ms (Views: 552.5ms | ActiveRecord: 88.8ms)
07:37:40 INFO: Completed 200 OK in 857ms (Views: 577.7ms | ActiveRecord: 101.2ms)
07:39:04 INFO: Completed 200 OK in 878ms (Views: 595.0ms | ActiveRecord: 100.6ms)
07:40:29 INFO: Completed 200 OK in 937ms (Views: 663.3ms | ActiveRecord: 97.4ms)
07:41:53 INFO: Completed 200 OK in 788ms (Views: 626.4ms | ActiveRecord: 93.4ms)
07:43:17 INFO: Completed 200 OK in 844ms (Views: 564.9ms | ActiveRecord: 94.4ms)
07:44:28 INFO: Completed 200 OK in 860ms (Views: 583.1ms | ActiveRecord: 109.1ms)
07:44:42 INFO: Completed 200 OK in 789ms (Views: 534.1ms | ActiveRecord: 93.6ms)
07:46:06 INFO: Completed 200 OK in 969ms (Views: 696.8ms | ActiveRecord: 95.7ms)
07:47:31 INFO: Completed 200 OK in 839ms (Views: 674.5ms | ActiveRecord: 95.3ms)
07:48:56 INFO: Completed 200 OK in 858ms (Views: 691.3ms | ActiveRecord: 98.6ms)
07:50:20 INFO: Completed 200 OK in 847ms (Views: 581.7ms | ActiveRecord: 94.6ms)
07:51:45 INFO: Completed 200 OK in 835ms (Views: 561.5ms | ActiveRecord: 96.5ms)
07:53:09 INFO: Completed 200 OK in 865ms (Views: 603.2ms | ActiveRecord: 105.3ms)
07:54:33 INFO: Completed 200 OK in 953ms (Views: 687.8ms | ActiveRecord: 101.7ms)
07:55:58 INFO: Completed 200 OK in 892ms (Views: 720.2ms | ActiveRecord: 103.2ms)
07:57:22 INFO: Completed 200 OK in 781ms (Views: 523.7ms | ActiveRecord: 87.9ms)
07:58:46 INFO: Completed 200 OK in 857ms (Views: 596.3ms | ActiveRecord: 102.2ms)
08:00:11 INFO: Completed 200 OK in 890ms (Views: 596.2ms | ActiveRecord: 113.1ms)
08:01:36 INFO: Completed 200 OK in 859ms (Views: 589.9ms | ActiveRecord: 98.9ms)
所以,我觉得是不是configfun的内存太小 ( 1 G 内存) ,运行的程序太多,导致的它性能不稳定?
能否申请给它增加 内存?
4. 不过,在增加内存(从硬件着手解决问题)之前,还是要分析一下,看看这1G的现有内存是如何被程序占用的。
根据 http://stackoverflow.com/questions/131303/linux-how-to-measure-actual-memory-usage-of-an-application-or-process ,先使用 $ ps aux
重点查看 VSZ 和 RSS
引用
RSS: Resident set size, the non-swapped physical memory that a task has used (in kiloBytes). 物理内存
VSZ: virtual memory usage of entire process. vm_lib + vm_exe + vm_data + vm_stack 使用的总内存吧?
$ ps aux --sort -rss,-vsz (针对 RSS, VSZ 倒序排序, 参考: http://www.devdaily.com/linux/unix-linux-process-memory-sort-ps-command-cpu)
结果:
引用
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 15152 0.0 9.3 183640 97556 ? S 01:23 0:01 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 25447 0.0 5.9 147764 62648 ? S May07 7:05 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8820 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 2596 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10301 0.0 0.2 210820 2128 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10300 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.1 210820 2084 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.1 210820 2072 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
kcv478 16195 0.0 0.1 88736 1684 ? S 01:39 0:00 sshd: kcv478@pts/2
root 30621 0.0 0.1 210688 1584 ? Ss May11 5:49 /usr/sbin/httpd
kcv478 14261 0.0 0.1 64316 1584 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1584 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1576 pts/1 Ss+ 01:18 0:00 -bash
root 10285 0.0 0.1 91856 1368 ? Sl May27 0:00 PassengerHelperAgent
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
root 1679 0.0 0.1 158436 1104 ? Sl Apr17 6:02 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/sn
kcv478 17354 0.0 0.0 63824 888 pts/2 R+ 01:57 0:00 ps aux --sort -rss,-vsz
root 1774 0.0 0.0 67864 868 ? Ss Apr17 6:06 sendmail: accepting connections
root 1926 0.0 0.0 11040 804 ? SN Apr17 0:14 /usr/libexec/gam_server
avahi 1893 0.0 0.0 22508 732 ? Ss Apr17 0:00 avahi-daemon: running [configfun.local]
nscd 1663 0.0 0.0 195100 724 ? Ssl Apr17 24:08 /usr/sbin/nscd
smmsp 1782 0.0 0.0 56032 632 ? Ss Apr17 5:40 sendmail: Queue runner@01:00:00 for /var/spool/cl
root 10282 0.0 0.0 15296 632 ? Ssl May27 0:00 PassengerWatchdog
dbus 1463 0.0 0.0 21360 620 ? Ss Apr17 0:03 dbus-daemon --system
root 1717 0.0 0.0 64060 544 ? S Apr17 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/m
root 1862 0.0 0.0 72940 480 ? Ss Apr17 5:56 crond
root 1769 0.0 0.0 58900 476 ? S Apr17 0:00 logger -p daemon.err -t mysqld_safe -i -t mysqld
root 1915 0.0 0.0 3804 444 ? Ss+ Apr17 0:00 /sbin/agetty xvc0 9600 vt100-nav
root 1297 0.0 0.0 92864 440 ? S<sl Apr17 30:04 auditd
root 1646 0.0 0.0 121988 432 ? Ssl Apr17 12:02 automount
68 1511 0.0 0.0 30480 368 ? Ss Apr17 0:00 hald
nagios 1744 0.0 0.0 38304 344 ? Ss Apr17 0:03 nrpe -c /etc/nagios/nrpe.cfg -d
root 1877 0.0 0.0 17068 340 ? Ss Apr17 5:52 /usr/sbin/atd
root 1319 0.0 0.0 5912 316 ? Ss Apr17 6:10 syslogd -m 0
root 1691 0.0 0.0 61640 312 ? Ss Apr17 0:00 /usr/sbin/sshd
root 1299 0.0 0.0 81804 272 ? S<sl Apr17 36:40 /sbin/audispd
root 1377 0.0 0.0 10764 268 ? Ss Apr17 6:07 irqbalance
root 1912 0.0 0.0 18420 260 ? S Apr17 5:23 /usr/sbin/smartd -q never
root 1498 0.0 0.0 21044 196 ? Ssl Apr17 6:09 pcscd
root 1322 0.0 0.0 3808 196 ? Ss Apr17 5:55 klogd -x
avahi 1894 0.0 0.0 21484 120 ? Ss Apr17 0:00 avahi-daemon: chroot helper
root 1 0.0 0.0 10352 84 ? Ss Apr17 5:38 init [3]
root 1512 0.0 0.0 21700 8 ? S Apr17 0:00 hald-runner
root 1449 0.0 0.0 55184 4 ? Ss Apr17 0:00 rpc.idmapd
root 372 0.0 0.0 12608 4 ? S<s Apr17 0:00 /sbin/udevd -d
root 1427 0.0 0.0 10164 4 ? Ss Apr17 0:00 rpc.statd
root 1549 0.0 0.0 8520 4 ? Ss Apr17 0:00 /usr/bin/hidd --server
rpc 1387 0.0 0.0 8056 4 ? Ss Apr17 0:00 portmap
root 2 0.0 0.0 0 0 ? S< Apr17 0:04 [migration/0]
kcv478 15152 0.0 9.3 183640 97556 ? S 01:23 0:01 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 25447 0.0 5.9 147764 62648 ? S May07 7:05 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8820 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 2596 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10301 0.0 0.2 210820 2128 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10300 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.1 210820 2084 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.1 210820 2072 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2068 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
kcv478 16195 0.0 0.1 88736 1684 ? S 01:39 0:00 sshd: kcv478@pts/2
root 30621 0.0 0.1 210688 1584 ? Ss May11 5:49 /usr/sbin/httpd
kcv478 14261 0.0 0.1 64316 1584 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1584 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1576 pts/1 Ss+ 01:18 0:00 -bash
root 10285 0.0 0.1 91856 1368 ? Sl May27 0:00 PassengerHelperAgent
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
root 1679 0.0 0.1 158436 1104 ? Sl Apr17 6:02 /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/sn
kcv478 17354 0.0 0.0 63824 888 pts/2 R+ 01:57 0:00 ps aux --sort -rss,-vsz
root 1774 0.0 0.0 67864 868 ? Ss Apr17 6:06 sendmail: accepting connections
root 1926 0.0 0.0 11040 804 ? SN Apr17 0:14 /usr/libexec/gam_server
avahi 1893 0.0 0.0 22508 732 ? Ss Apr17 0:00 avahi-daemon: running [configfun.local]
nscd 1663 0.0 0.0 195100 724 ? Ssl Apr17 24:08 /usr/sbin/nscd
smmsp 1782 0.0 0.0 56032 632 ? Ss Apr17 5:40 sendmail: Queue runner@01:00:00 for /var/spool/cl
root 10282 0.0 0.0 15296 632 ? Ssl May27 0:00 PassengerWatchdog
dbus 1463 0.0 0.0 21360 620 ? Ss Apr17 0:03 dbus-daemon --system
root 1717 0.0 0.0 64060 544 ? S Apr17 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/m
root 1862 0.0 0.0 72940 480 ? Ss Apr17 5:56 crond
root 1769 0.0 0.0 58900 476 ? S Apr17 0:00 logger -p daemon.err -t mysqld_safe -i -t mysqld
root 1915 0.0 0.0 3804 444 ? Ss+ Apr17 0:00 /sbin/agetty xvc0 9600 vt100-nav
root 1297 0.0 0.0 92864 440 ? S<sl Apr17 30:04 auditd
root 1646 0.0 0.0 121988 432 ? Ssl Apr17 12:02 automount
68 1511 0.0 0.0 30480 368 ? Ss Apr17 0:00 hald
nagios 1744 0.0 0.0 38304 344 ? Ss Apr17 0:03 nrpe -c /etc/nagios/nrpe.cfg -d
root 1877 0.0 0.0 17068 340 ? Ss Apr17 5:52 /usr/sbin/atd
root 1319 0.0 0.0 5912 316 ? Ss Apr17 6:10 syslogd -m 0
root 1691 0.0 0.0 61640 312 ? Ss Apr17 0:00 /usr/sbin/sshd
root 1299 0.0 0.0 81804 272 ? S<sl Apr17 36:40 /sbin/audispd
root 1377 0.0 0.0 10764 268 ? Ss Apr17 6:07 irqbalance
root 1912 0.0 0.0 18420 260 ? S Apr17 5:23 /usr/sbin/smartd -q never
root 1498 0.0 0.0 21044 196 ? Ssl Apr17 6:09 pcscd
root 1322 0.0 0.0 3808 196 ? Ss Apr17 5:55 klogd -x
avahi 1894 0.0 0.0 21484 120 ? Ss Apr17 0:00 avahi-daemon: chroot helper
root 1 0.0 0.0 10352 84 ? Ss Apr17 5:38 init [3]
root 1512 0.0 0.0 21700 8 ? S Apr17 0:00 hald-runner
root 1449 0.0 0.0 55184 4 ? Ss Apr17 0:00 rpc.idmapd
root 372 0.0 0.0 12608 4 ? S<s Apr17 0:00 /sbin/udevd -d
root 1427 0.0 0.0 10164 4 ? Ss Apr17 0:00 rpc.statd
root 1549 0.0 0.0 8520 4 ? Ss Apr17 0:00 /usr/bin/hidd --server
rpc 1387 0.0 0.0 8056 4 ? Ss Apr17 0:00 portmap
root 2 0.0 0.0 0 0 ? S< Apr17 0:04 [migration/0]
5. 开始优化
5.1 15152 这个PID 排在第一位,不知道它是干啥的。总之看起来跟我的RAILS启动方式无关(passenger + apache ), 路径确实是目前部署的应用的所在路径,所以直接把它停止。
$ kill -9 15152
之后, 人肉测试一下,没有影响到当前的应用。
top:
top - 02:24:07 up 40 days, 9:58, 3 users, load average: 0.07, 0.05, 0.01
Tasks: 121 total, 2 running, 119 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 979032k used, 69544k free, 8008k buffers
Swap: 1052248k total, 363172k used, 689076k free, 134872k cached
不过。。。发现它又上来了。。。
ps aux --sort -rss,-vsz
引用
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 1.0 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 19034 1.3 6.4 152392 67504 ? S 02:23 0:04 Passenger ApplicationSpawner: /rails_apps/babble_
kcv478 10880 0.0 6.1 148424 64632 ? S 00:01 0:00 delayed_job
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 31266 0.0 6.0 148412 63160 ? S May11 6:59 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38072 8868 ? S May27 0:00 Passenger spawn server
看来它是正常工作的进程。嘿嘿。。。
继续。。。
5.2 9 个delayed_job 进程
我每次部署的时候,都忘记停止当前应用的进程,而是直接把当前应用删掉,然后在开启一个新进程,会不会现在的问题跟它有关?
我先看看目前的DELAYED——JOB的正常的 PID:
(15152, 10880) 一共是两个进程,(其中一个15152 已经被我提前删掉了)
把它们都kill 之后,
引用
top - 02:33:44 up 40 days, 10:08, 3 users, load average: 0.10, 0.06, 0.01
Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.7%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 859788k used, 188788k free, 9292k buffers
Swap: 1052248k total, 363164k used, 689084k free, 129688k cached
Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.7%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1048576k total, 859788k used, 188788k free, 9292k buffers
Swap: 1052248k total, 363164k used, 689084k free, 129688k cached
非常好,系统物理内存变成了: 859788k used, 188788k free,
再看看 ps aux ,不错,变成了8个delayed_job了。它们都是没用的。
引用
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.5 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
kcv478 4607 0.0 6.0 148412 63196 ? S May10 7:03 delayed_job
kcv478 25447 0.0 6.0 148292 63172 ? S May07 7:06 delayed_job
kcv478 20865 0.0 6.0 148260 63168 ? S May07 7:00 delayed_job
kcv478 10025 0.0 6.0 148432 63160 ? S May09 7:10 delayed_job
kcv478 21174 0.0 6.0 148372 63124 ? S May04 7:10 delayed_job
kcv478 31266 0.0 6.0 148288 63124 ? S May11 6:59 delayed_job
kcv478 8578 0.0 6.0 148340 63080 ? S Apr28 7:43 delayed_job
kcv478 15089 0.0 5.9 141180 62220 ? S Apr27 7:46 delayed_job
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server
5.3 把剩下8个没用的 delayed_job 进程停止掉。
现在 top 一下,帅呆了。。。释放了 680 MB内存。。。
top - 02:42:44 up 40 days, 10:17, 3 users, load average: 0.00, 0.00, 0.00 Tasks: 111 total, 2 running, 109 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1048576k total, 366628k used, 681948k free, 12708k buffers Swap: 1052248k total, 57936k used, 994312k free, 129896k cached
看看 ps 的结果:
引用
[kcv478@configfun ~]$ ps aux --sort -rss,-vsz
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.2 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 3100 ? S May27 0:00 /usr/sbin/httpd
apache 19039 0.0 0.2 210820 2684 ? S 02:23 0:00 /usr/sbin/httpd
apache 10301 0.0 0.2 210820 2620 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.2 210820 2600 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.2 210820 2576 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.2 210820 2568 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10300 0.0 0.2 210820 2140 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.2 210820 2100 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 16195 0.0 0.1 88736 1692 ? S 01:39 0:00 sshd: kcv478@pts/2
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
root 30621 0.0 0.1 210688 1624 ? Ss May11 5:49 /usr/sbin/httpd
root 10285 0.0 0.1 92196 1588 ? Sl May27 0:01 PassengerHelperAgent
kcv478 14261 0.0 0.1 64316 1588 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1588 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1584 pts/1 Ss+ 01:18 0:00 -bash
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
kcv478 19037 0.2 9.5 186196 100076 ? S 02:23 0:03 Rails: /rails_apps/babble_portal/releases/manuall
root 10287 0.0 0.8 38064 8900 ? S May27 0:00 Passenger spawn server
ntp 1705 0.0 0.5 19620 5316 ? SLs Apr17 3:36 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 1924 0.0 0.4 256936 4708 ? SN Apr17 0:28 /usr/bin/python -tt /usr/sbin/yum-updatesd
root 14197 0.0 0.3 88596 3236 ? Ss 01:06 0:00 sshd: kcv478 [priv]
root 14862 0.0 0.3 88596 3236 ? Ss 01:18 0:00 sshd: kcv478 [priv]
root 16184 0.0 0.3 88596 3236 ? Ss 01:39 0:00 sshd: kcv478 [priv]
apache 10297 0.0 0.2 210820 3100 ? S May27 0:00 /usr/sbin/httpd
apache 19039 0.0 0.2 210820 2684 ? S 02:23 0:00 /usr/sbin/httpd
apache 10301 0.0 0.2 210820 2620 ? S May27 0:00 /usr/sbin/httpd
apache 10302 0.0 0.2 210820 2600 ? S May27 0:00 /usr/sbin/httpd
apache 10304 0.0 0.2 210820 2576 ? S May27 0:00 /usr/sbin/httpd
apache 10298 0.0 0.2 210820 2568 ? S May27 0:00 /usr/sbin/httpd
mysql 1768 0.0 0.2 306612 2140 ? Sl Apr17 12:16 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/m
apache 10300 0.0 0.2 210820 2140 ? S May27 0:00 /usr/sbin/httpd
apache 10299 0.0 0.2 210820 2100 ? S May27 0:00 /usr/sbin/httpd
apache 10303 0.0 0.1 210820 2092 ? S May27 0:00 /usr/sbin/httpd
kcv478 14260 0.0 0.1 88736 1712 ? S 01:07 0:00 sshd: kcv478@pts/0
kcv478 16195 0.0 0.1 88736 1692 ? S 01:39 0:00 sshd: kcv478@pts/2
kcv478 14864 0.0 0.1 88736 1688 ? S 01:18 0:00 sshd: kcv478@pts/1
root 30621 0.0 0.1 210688 1624 ? Ss May11 5:49 /usr/sbin/httpd
root 10285 0.0 0.1 92196 1588 ? Sl May27 0:01 PassengerHelperAgent
kcv478 14261 0.0 0.1 64316 1588 pts/0 Ss+ 01:07 0:00 -bash
kcv478 16197 0.0 0.1 64316 1588 pts/2 Ss 01:39 0:00 -bash
kcv478 14865 0.0 0.1 64316 1584 pts/1 Ss+ 01:18 0:00 -bash
nobody 10290 0.0 0.1 49844 1240 ? Sl May27 0:00 PassengerLoggingAgent
很好,现在运行的进程, RSS基本都在 10MB以下。 先这样吧 :)
总结:
本以为会是一个很麻烦的过程(卸载不必要的程序啊,调试啊,等等)现在看起来,原因就在于没有及时的把原来跑着的delayed_job停止。 把它们删掉之后就问题解决了。。:-)。如此简单。
相关推荐
Oracle 11gR2 for Linux 的静默安装是一个复杂的过程,涉及到多个步骤,包括软件上传、包安装检查、创建数据库用户和组、调整操作系统内核参数以及优化shell限制。以下是对这些步骤的详细说明: 1. **上传软件**: ...
为了优化Oracle数据库的性能,需要调整系统的内核参数,包括共享内存、信号量、文件句柄限制和网络参数。具体参数及其含义如下: - `kernel.shmall`:定义了可用共享内存的总数量,单位为页(每页4KB)。如果服务器...
为了安全性和性能优化,需要调整`ulimit`设置,通常涉及到`nofile`和`nproc`限制,可以通过`/etc/security/limits.conf`文件进行修改。 6. **创建Oracle存放的文件夹**: 创建用于存储Oracle数据文件、日志文件和...
- **性能和伸缩性**:优化的运行时环境提供高性能,并适应不同规模的应用需求。 3. **开发第一个C#程序**: - 示例程序展示了如何使用C#输出"Hello World!"和当前日期时间,这通常是一个编程初学者的第一个程序。...
│ │ │ MySql5.6性能优化.docx │ │ │ │ │ ├─Redis集群 │ │ │ redis-3.0.2.tar.gz │ │ │ redis-3.2.1.gem │ │ │ Redis集群.docx │ │ │ 集群架构.eddx │ │ │ │ │ └─Tomcat7优化 │ │ ...
内存的大小影响着计算机的运行速度,单位从1MB到几十GB不等。硬盘用于长期存储数据,容量可达数百GB。缓存(一级或二级)加速了数据的读取。光驱如CD-ROM或DVD,有不同的传输速度。显卡负责图像处理,包括显示内存...
- **容量**:包括内存和外存,常用单位有K、M、G等。 - **数据传输率**:通常以bps为单位,表示每秒传输的比特数。 - **可靠性**:指计算机系统的稳定性和故障恢复能力。 4. **微处理器的技术特点** - **奔腾...
MySQL支持多种操作系统,如Windows、Linux等,并且可以运行在各种规模的服务器上,从小型个人计算机到大型企业级服务器均可。 ##### 1.2 获取MySQL及获取帮助 - **下载MySQL**:可以从MySQL官方网站下载最新版本的...