Understanding CMS GC Logs
By Poonam-Oracle on Mar 23, 2006
CMS GC with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps prints a lot of information. Understanding this information can help in fine tuning various parameters of the application and CMS to achieve best performance.
Let's have a look at some of the CMS logs generated with 1.4.2_10:
39.910: [GC 39.910: [ParNew: 261760K->0K(261952K), 0.2314667 secs] 262017K->26386K(1048384K), 0.2318679 secs]
Young generation (ParNew) collection. Young generation capacity is 261952K and after the collection its occupancy drops down from 261760K to 0. This collection took 0.2318679 secs.
40.146: [GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs]
Beginning of tenured generation collection with CMS collector. This is initial Marking phase of CMS where all the objects directly reachable from roots are marked and this is done with all the mutator threads stopped.
Capacity of tenured generation space is 786432K and CMS was triggered at the occupancy of 26386K.
40.154: [CMS-concurrent-mark-start]
Start of concurrent marking phase.
In Concurrent Marking phase, threads stopped in the first phase are started again and all the objects transitively reachable from the objects marked in first phase are marked here.
40.683: [CMS-concurrent-mark: 0.521/0.529 secs]
Concurrent marking took total 0.521 seconds cpu time and 0.529 seconds wall time that includes the yield to other threads also.
40.683: [CMS-concurrent-preclean-start]
Start of precleaning.
Precleaning is also a concurrent phase. Here in this phase we look at the objects in CMS heap which got updated by promotions from young generation or new allocations or got updated by mutators while we were doing the concurrent marking in the previous concurrent marking phase. By rescanning those objects concurrently, the precleaning phase helps reduce the work in the next stop-the-world “remark” phase.
40.701: [CMS-concurrent-preclean: 0.017/0.018 secs]
Concurrent precleaning took 0.017 secs total cpu time and 0.018 wall time.
40.704: [GC40.704: [Rescan (parallel) , 0.1790103 secs]40.883: [weak refs processing, 0.0100966 secs] [1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs]
Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also processes Reference objects. Here the rescanning work took 0.1790103 secs and weak reference objects processing took 0.0100966 secs. This phase took total 0.1897792 secs to complete.
40.894: [CMS-concurrent-sweep-start]
Start of sweeping of dead/non-marked objects. Sweeping is concurrent phase performed with all other threads running.
41.020: [CMS-concurrent-sweep: 0.126/0.126 secs]
Sweeping took 0.126 secs.
41.020: [CMS-concurrent-reset-start]
Start of reset.
41.147: [CMS-concurrent-reset: 0.127/0.127 secs]
In this phase, the CMS data structures are reinitialized so that a new cycle may begin at a later time. In this case, it took 0.127 secs.
This was how a normal CMS cycle runs. Now let us look at some other CMS log entries:
197.976: [GC 197.976: [ParNew: 260872K->260872K(261952K), 0.0000688 secs]197.976: [CMS197.981: [CMS-concurrent-sweep: 0.516/0.531 secs]
(concurrent mode failure): 402978K->248977K(786432K), 2.3728734 secs] 663850K->248977K(1048384K), 2.3733725 secs]
This shows that a ParNew collection was requested, but it was not attempted because it was estimated that there was not enough space in the CMS generation to promote the worst case surviving young generation objects. We name this failure as “full promotion guarantee failure”.
Due to this, Concurrent Mode of CMS is interrupted and a Full GC is invoked at 197.981. This mark-sweep-compact stop-the-world Full GC took 2.3733725 secs and the CMS generation space occupancy dropped from 402978K to 248977K.
The concurrent mode failure can either be avoided by increasing the tenured generation size or initiating the CMS collection at a lesser heap occupancy by setting CMSInitiatingOccupancyFraction to a lower value and setting UseCMSInitiatingOccupancyOnly to true. The value for CMSInitiatingOccupancyFraction should be chosen appropriately because setting it to a very low value will result in too frequent CMS collections.
Sometimes we see these promotion failures even when the logs show that there is enough free space in tenured generation. The reason is 'fragmentation' - the free space available in tenured generation is not contiguous, and promotions from young generation require a contiguous free block to be available in tenured generation. CMS collector is a non-compacting collector, so can cause fragmentation of space for some type of applications. In his blog, Jon talks in detail on how to deal with this fragmentation problem:
http://blogs.sun.com/roller/page/jonthecollector?entry=when_the_sum_of_the
Starting with 1.5, for the CMS collector, the promotion guarantee check is done differently. Instead of assuming that the promotions would be worst case i.e. all of the surviving young generation objects would get promoted into old gen, the expected promotion is estimated based on recent history of promotions. This estimation is usually much smaller than the worst case promotion and hence requires less free space to be available in old generation. And if the promotion in a scavenge attempt fails, then the young generation is left in a consistent state and a stop-the-world mark-compact collection is invoked. To get the same functionality with UseSerialGC you need to explicitly specify the switch -XX:+HandlePromotionFailure.
283.736: [Full GC 283.736: [ParNew: 261599K->261599K(261952K), 0.0000615 secs] 826554K->826554K(1048384K), 0.0003259 secs]
GC locker: Trying a full collection because scavenge failed
283.736: [Full GC 283.736: [ParNew: 261599K->261599K(261952K), 0.0000288 secs]
Stop-the-world GC happening when a JNI Critical section is released. Here again the young generation collection failed due to “full promotion guarantee failure” and then the Full GC is being invoked.
CMS can also be run in incremental mode (i-cms), enabled with -XX:+CMSIncrementalMode. In this mode, CMS collector does not hold the processor for the entire long concurrent phases but periodically stops them and yields the processor back to other threads in application. It divides the work to be done in concurrent phases in small chunks(called duty cycle) and schedules them between minor collections. This is very useful for applications that need low pause times and are run on machines with small number of processors.
Some logs showing the incremental CMS.
2803.125: [GC 2803.125: [ParNew: 408832K->0K(409216K), 0.5371950 secs] 611130K->206985K(1048192K) icms_dc=4 , 0.5373720 secs]
2824.209: [GC 2824.209: [ParNew: 408832K->0K(409216K), 0.6755540 secs] 615806K->211897K(1048192K) icms_dc=4 , 0.6757740 secs]
Here, the scavenges took respectively 537 ms and 675 ms. In between these two scavenges, iCMS ran for a brief period as indicated by the icms_dc value, which indicates a duty-cycle. In this case the duty cycle was 4%. A simple calculation shows that the iCMS incremental step lasted for 4/100 \* (2824.209 - 2803.125 - 0.537) = 821 ms, i.e. 4% of the time between the two scavenges.
Starting with 1.5, CMS has one more phase – concurrent abortable preclean. Abortable preclean is run between a 'concurrent preclean' and 'remark' until we have the desired occupancy in eden. This phase is added to help schedule the 'remark' phase so as to avoid back-to-back pauses for a scavenge closely followed by a CMS remark pause. In order to maximally separate a scavenge from a CMS remark pause, we attempt to schedule the CMS remark pause roughly mid-way between scavenges.
There is a second reason why we do this. Immediately following a scavenge there are likely a large number of grey objects that need rescanning. The abortable preclean phase tries to deal with such newly grey objects thus reducing a subsequent CMS remark pause.
The scheduling of 'remark' phase can be controlled by two jvm options CMSScheduleRemarkEdenSizeThreshold and CMSScheduleRemarkEdenPenetration. The defaults for these are 2m and 50% respectively. The first parameter determines the Eden size below which no attempt is made to schedule the CMS remark pause because the pay off is expected to be minuscule. The second parameter indicates the Eden occupancy at which a CMS remark is attempted.
After 'concurrent preclean' if the Eden occupancy is above CMSScheduleRemarkEdenSizeThreshold, we start 'concurrent abortable preclean' and continue precleanig until we have CMSScheduleRemarkEdenPenetration percentage occupancy in eden, otherwise we schedule 'remark' phase immediately.
7688.150: [CMS-concurrent-preclean-start]
7688.186: [CMS-concurrent-preclean: 0.034/0.035 secs]
7688.186: [CMS-concurrent-abortable-preclean-start]
7688.465: [GC 7688.465: [ParNew: 1040940K->1464K(1044544K), 0.0165840 secs] 1343593K->304365K(2093120K), 0.0167509 secs]
7690.093: [CMS-concurrent-abortable-preclean: 1.012/1.907 secs]
7690.095: [GC[YG occupancy: 522484 K (1044544 K)]7690.095: [Rescan (parallel) , 0.3665541 secs]7690.462: [weak refs processing, 0.0003850 secs] [1 CMS-remark: 302901K(1048576K)] 825385K(2093120K), 0.3670690 secs]
In the above log, after a preclean, 'abortable preclean' starts. After the young generation collection, the young gen occupancy drops down from 1040940K to 1464K. When young gen occupancy reaches 522484K which is 50% of the total capacity, precleaning is aborted and 'remark' phase is started.
Note that in 1.5, young generation occupancy also gets printed in the final remark phase.
For more detailed information and tips on GC tuning, please refer to the following documents:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
http://java.sun.com/docs/hotspot/gc1.4.2/
原文链接:
https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs
笔者对其中某几条记录又进行了详细说明,以下是一条完整的CMS日志记录的示例,外加一些内存分配错误的情况。
看之前最好对GC的新生代(Young Generation)和老生代(Old Generation)以及晋升过程(Promotion)都有深刻理解,本文帮助理解GC CMS日志,对JVM调优没有进一步说明。
82551.569:[GC [1 CMS-initial-mark: 2027280K(2516992K)] 2084513K(3088576K), 0.0344000secs] [Times: user=0.03 sys=0.01, real=0.03 secs]
第一阶段:初始标记阶段(Initial mark)标志着CMS收集老生代(Old Generation)的开始,所有从根部直接可达的对象会被标记,此时其他线程被阻断,这个阶段称为stop-the-world。此记录中,老生代的大小是2516992K,CMS在占用内存达到2027280K时触发,初始标记引起的pause time是0.0344s。
82551.604:[CMS-concurrent-mark-start]
第二阶段:并发标记阶段(concurrent mark),所有第一个阶段被停掉的线程重新启动,此阶段内,从第一阶段标记对象出发所有间接可达的对象将被标记。
82553.887:[CMS-concurrent-mark: 2.272/2.283 secs] [Times: user=5.27 sys=0.14, real=2.29secs]
并发标记消耗2.272s CPU时间和2.283s 实际时间,实际时间包含CPU时间和阻断其他线程的时间。
82553.887:[CMS-concurrent-preclean-start]
第三阶段:并发预清理阶段(concurrent preclean),目的是减轻下一个阶段:重标记(remark)的工作量,因为预清理是并发的,而重标记是stop-the-world的,这样可以减小对其他线程的影响。此阶段内,收集器查看在并发标记过程中,CMS堆内得到晋升(promotion)的对象。
82553.928:[CMS-concurrent-preclean: 0.040/0.041 secs] [Times: user=0.04 sys=0.00,real=0.04 secs]
并发预清理消耗0.040s CPU时间和0.041s实际时间。
82553.929:[CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 82558.942: [CMS-concurrent-abortable-preclean: 1.311/5.014secs] [Times: user=1.41 sys=0.05, real=5.01 secs]
82558.959:[GC[YG occupancy: 338653 K (571584 K)]82558.959: [Rescan (parallel) , 0.3058990secs]82559.265: [weak refs processing, 0.0667480 secs]82559.332: [classunloading, 0.0868270 secs]82559.419: [scrub symbol & string tables,0.1176240 secs] [1 CMS-remark:
2027283K(2516992K)] 2365936K(3088576K),0.6602340 secs] [Times: user=4.88 sys=0.37, real=0.66 secs]
第四阶段:经过了并发预清理阶段,可切断式预清理(abortable preclean)开始了(笔者不确定这么翻译是否合理)。从上面的记录可以看出,新生代的容量是571584K,新生代占有内存达到338653K时此预清理过程就被切断了,重标记阶段开始,由于重标记是stop-the-world的,所以其他线程被阻断。
第五阶段:重标记(remark)阶段,此阶段重新扫描CMS堆中剩余的且状态更新过的对象,重新从根部遍历,并且执行被引用的对象。这里,重扫描消耗0.3058990s,弱引用对象执行消耗0.667480s,重标记总体消耗了0.6602340s。
需要说明的是:如果新生代中Eden的占有内存达到了参数XX:CMSScheduleRemarkEdenSizeThreschold=<n>的值,可切断式预清理就会启动,直到Eden占有内存达到参数XX:CMSScheduleRemarkEdenPenetration=<n>才会结束,所以说它是可以被打断的。如果它执行的时间超过了参数XX:CMSMaxAbortablePrecleanTime的值,无论如何也是会立即停止的。以上这些参数是为了限制预清理执行时间过长,但是预清理减轻了重标记的工作量。
82559.619:[CMS-concurrent-sweep-start]
第六阶段:并发清理阶段,重标记过后CMS开始清理没有标记的也就是已经死亡的对象。这一过程不会阻断其他进程。所用时间如下一条记录所示。
82560.976:[CMS-concurrent-sweep: 1.320/1.357 secs] [Times: user=1.76 sys=0.23, real=1.36secs]
82560.976:[CMS-concurrent-reset-start]
82561.000:[CMS-concurrent-reset: 0.024/0.024 secs] [Times: user=0.02 sys=0.00, real=0.02secs]
第七阶段:重置阶段,CMS内数据再一次初始化,进入下一个清理循环,重置消耗0.024s。
下面是两种清理错误的情况:promotion failed和concurrentmode failure。
250169.767:[GC 250169.767: [ParNew (promotion failed): 571584K->571584K(571584K),0.6487910 secs]250170.416: [CMS250173.050: [CMS-concurrent-mark: 2.887/3.777 secs][Times: user=10.86 sys=0.56, real=3.78 secs]
(concurrentmode failure): 2268975K->2111899K(2516992K), 8.3732150 secs]2766660K->2111899K(3088576K), [CMS Perm : 562899K->562896K(1048576K)],9.0223120 secs] [Times: user=9.78 sys=0.28, real=9.02 secs]
promotion failure表示从新生代晋升到老生代时发生了错误,因为老生代内存占用快满了,所以放不下晋升上来的对象。
有时promotion failure会引起concurrentmode failure,原因还是老生代内存不够用了,这样就引起了Full GC,也就是记录中的CMS Perm,Full GC是一个stop-the-world的过程。
相关推荐
深入java虚拟机光盘资源jvm-gc-logs-analyzer 这个项目是一个 Java 虚拟机和垃圾收集器日志分析器。 它专用于 JVM 11 及更高版本(JVM 8 支持正在开发中)。 日志必须采用适当的格式和适当的装饰器,检查最后部分的...
【标题】"darks-logs 1.0.7" 是一个特定版本的软件日志管理框架,专门设计用于处理和记录应用中的日志信息。这个版本为1.0.7,意味着它可能包含了之前版本的改进和修复,提供更稳定、更高效的服务。 【描述】"darks...
可以直接用如下配置在docker-compose里启动: nacos: image: infras/nacos-server:v2.4.0 container_name: nacos ... - ./storage/logs/nacos/gclogs:/data/nacos/gclogs environment: - TZ=Asia/Shanghai
今天,我们将使用 Wireshark 分析 logs.pcapng 数据包文件,探索恶意用户的活动痕迹。 一、恶意用户目录扫描 在数据包文件 logs.pcapng 中,我们可以使用 Wireshark 查看恶意用户的目录扫描活动。通过使用过滤器 ...
Unity-Logs-Viewer-master.zip 是一个专门为Unity开发者设计的压缩包,其中包含了一个插件,用于在不依赖ADB(Android Debug Bridge)的情况下查看Unity在真机设备上的运行日志。这个插件对于开发者来说非常实用,...
《智能日志框架smart-logs_burstow5详解》 在现代软件开发中,日志框架扮演着至关重要的角色,它记录程序运行过程中的各种信息,帮助开发者追踪问题,优化性能,以及进行故障排查。本文将详细介绍名为"smart-logs_...
游戏日志数据集"game_logs.csv"是IT领域中用于数据分析和挖掘的一种常见资源,尤其在游戏行业,这种数据集非常有价值。它记录了玩家在游戏中的各种活动和行为,可以用于理解玩家行为模式,优化游戏设计,提高用户...
host_driver_logs_current.txt
logs.js
Unity-Logs-Viewer-master 是一个专门为Unity游戏开发者设计的工具,旨在简化真机调试过程,尤其是在iOS和Android设备上。这个工具允许开发者在游戏运行时实时查看Debug日志,极大地提高了开发效率和问题排查能力。 ...
Apache Logs Viewer是一款专门用于分析Apache服务器日志的实用工具,版本号为5.39,本次更新包含了 Patch-URET 的修复和改进。这个软件能够帮助IT管理员和开发者有效地理解和解析Apache服务器生成的日志文件,从中...
- **logs**:日志文件,记录系统的运行情况和错误信息。 电影网站通常需要具备的功能可能有: - 视频上传和管理:用户可以上传、编辑和删除视频,后台需要有视频处理能力,如转码、缩略图生成等。 - 分类和标签:...
"logs2intrusions" 是一个专门用于Web日志入侵检测的工具,它的核心功能是通过对Web服务器的日志数据进行分析,识别出潜在的入侵行为或安全威胁。在网络安全领域,这种工具至关重要,因为它们能够帮助管理员及时发现...
logs.csv(4).xlsx
Unity日志输出插件,在手机和pc上均有用,能很清晰的看到日志打印,非常好用。 使用这个工具,你可以很容易地在游戏内部检查你的编辑器控制台日志!在移动端同样有效! 支持unity2017.4.29 or 更高版本
删除tomcat下产生的日志文件,配合计划任务使用,会更加有效果呢.
阿射日志系统logs基于PHP MySQL平台开发,结构简洁,运行速度快,简体中文UTF-8编码。 安装方法: 1.将文件夹下所有的文件上传。 2.通过浏览器访问 ,填写配置信息进行安装。 3.安装完成,访问首页。 4.管理...