Java theory and practice: A brief history of garbage collection
- 博客分类:
- Performance
The benefits of garbage collection are indisputable -- increased reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention, but this is a different problem.) However, garbage collection is not without its costs -- among them performance impact, pauses, configuration complexity, and nondeterministic finalization.
An ideal garbage collection implementation would be totally invisible -- there would be no garbage collection pauses, no CPU time would be lost to garbage collection, the garbage collector wouldn't interact negatively with virtual memory or the cache, and the heap wouldn't need to be any larger than the residency (heap occupancy) of the application. Of course, there are no perfect garbage collectors, but garbage collectors have improved significantly over the past ten years.
The 1.3 JDK includes three different garbage collection strategies; the 1.4.1 JDK includes six, and over a dozen command-line options for configuring and tuning garbage collection. How do they differ? Why do we need so many?
The various garbage collection implementations use different strategies for identification and reclamation of unreachable objects, and they interact differently with the user program and scheduler. Different sorts of applications will have different requirements for garbage collection -- real-time applications will demand short and bounded-duration collection pauses, whereas enterprise applications may tolerate longer or less predictable pauses in favor of higher throughput.
How does garbage collection work?
There are several basic strategies for garbage collection: reference counting, mark-sweep, mark-compact, and copying. In addition, some algorithms can do their job incrementally (the entire heap need not be collected at once, resulting in shorter collection pauses), and some can run while the user program runs (concurrent collectors). Others must perform an entire collection at once while the user program is suspended (so-called stop-the-world collectors). Finally, there are hybrid collectors, such as the generational collector employed by the 1.2 and later JDKs, which use different collection algorithms on different areas of the heap.
When evaluating a garbage collection algorithm, we might consider any or all of the following criteria:
-
Pause time.
Does the collector stop the world to perform collection? For how long? Can pauses be bounded in time?
-
Pause predictability.
Can garbage collection pauses be
scheduled at times that are convenient for the user program, rather
than for the garbage collector?
-
CPU usage.
What percentage of the total available CPU time is spent in garbage collection?
-
Memory footprint.
Many garbage collection algorithms require
dividing the heap into separate memory spaces, some of which may be
inaccessible to the user program at certain times. This means that the
actual size of the heap may be several times bigger than the maximum
heap residency of the user program.
-
Virtual memory interaction.
On systems with limited physical
memory, a full garbage collection may fault nonresident pages into
memory to examine them during the collection process. Because the cost
of a page fault is high, it is desirable that a garbage collector
properly manage locality of reference.
-
Cache interaction.
Even on systems where the entire heap can
fit into main memory, which is true of virtually all Java applications,
garbage collection will often have the effect of flushing data used by
the user program out of the cache, imposing a performance cost on the
user program.
-
Effects on program locality.
While some believe that the job
of the garbage collector is simply to reclaim unreachable memory,
others believe that the garbage collector should also attempt to
improve the reference locality of the user program. Compacting and
copying collectors relocate objects during collection, which has the
potential to improve locality.
- Compiler and runtime impact. Some garbage collection algorithms require significant cooperation from the compiler or runtime environment, such as updating reference counts whenever a pointer assignment is performed. This creates both work for the compiler, which must generate these bookkeeping instructions, and overhead for the runtime environment, which must execute these additional instructions. What is the performance impact of these requirements? Does it interfere with compile-time optimizations?
Regardless of the algorithm chosen, trends in hardware and software have made garbage collection far more practical. Empirical studies from the 1970s and 1980s show garbage collection consuming between 25 percent and 40 percent of the runtime in large Lisp programs. While garbage collection may not yet be totally invisible, it sure has come a long way.
The problem faced by all garbage collection algorithms is the same -- identify blocks of memory that have been dispensed by the allocator, but are unreachable by the user program. What do we mean by unreachable? Memory blocks can be reached in one of two ways -- if the user program holds a reference to that block in a root , or if there is a reference to that block held in another reachable block. In a Java program, a root is a reference to an object held in a static variable or in a local variable on an active stack frame. The set of reachable objects is the transitive closure of the root set under the points-to relation.
The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but requires significant assistance from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count -- the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, such as through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the referenced object's reference count. If an object's reference count goes to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.
Many ANSI C++ library classes, such as string
, employ
reference counting to provide the appearance of garbage collection. By
overloading the assignment operator and exploiting the deterministic
finalization provided by C++ scoping, C++ programs can use the string
class as if it were garbage collected. Reference counting is simple,
lends itself well to incremental collection, and the collection process
tends to have good locality of reference, but it is rarely used in
production garbage collectors for a number of reasons, such as its
inability to reclaim unreachable cyclic structures (objects that
reference each other directly or indirectly, like a circularly linked
list or a tree that
contains back-pointers to the parent node).
None of the standard garbage collectors in the JDK uses reference counting; instead, they all use some form of tracing collector . A tracing collector stops the world (although not necessarily for the entire duration of the collection) and starts tracing objects, starting at the root set and following references until all reachable objects have been examined. Roots can be found in program registers, in local (stack-based) variables in each thread's stack, and in static variables.
The most basic form of tracing collector, first proposed by Lisp inventor John McCarthy in 1960, is the mark-sweep collector, in which the world is stopped and the collector visits each live node, starting from the roots, and marks each node it visits. When there are no more references to follow, collection is complete, and then the heap is swept (that is, every object in the heap is examined), and any object not marked is reclaimed as garbage and returned to the free list. Figure 1 illustrates a heap prior to garbage collection; the shaded blocks are garbage because they are unreachable by the user program:
Figure 1. Reachable and unreachable objects
Mark-sweep is simple to implement, can reclaim cyclic structures easily, and doesn't place any burden on the compiler or mutator like reference counting does. But it has deficiencies -- collection pauses can be long, and the entire heap is visited in the sweep phase, which can have very negative performance consequences on virtual memory systems where the heap may be paged.
The big problem with mark-sweep is that every active (that is, allocated) object, whether reachable or not, is visited during the sweep phase. Because a significant percentage of objects are likely to be garbage, this means that the collector is spending considerable effort examining and handling garbage. Mark-sweep collectors also tend to leave the heap fragmented, which can cause locality issues and can also cause allocation failures even when sufficient free memory appears to be available.
In a copying collector , another form of tracing collector, the heap is divided into two equally sized semi-spaces, one of which contains active data and the other is unused. When the active space fills up, the world is stopped and live objects are copied from the active space into the inactive space. The roles of the spaces are then flipped, with the old inactive space becoming the new active space.
Copying collection has the advantage of only visiting live objects, which means garbage objects will not be examined, nor will they need to be paged into memory or brought into the cache. The duration of collection cycles in a copying collector is driven by the number of live objects. However, copying collectors have the added cost of copying the data from one space to another, adjusting all references to point to the new copy. In particular, long-lived objects will be copied back and forth on every collection.
Copying collectors have another benefit, which is that the set of
live objects are compacted into the bottom of the heap. This not only
improves locality of reference of the user program and eliminates heap
fragmentation, but also greatly reduces the cost of object allocation
-- object allocation becomes a simple pointer addition on the
top-of-heap pointer. There is no need to maintain free lists or
look-aside lists, or perform best-fit or first-fit algorithms --
allocating N
bytes is as simple as adding N
to the top-of-heap pointer and returning
its previous value, as suggested in Listing 1:
Listing 1. Inexpensive memory allocation in a copying collector
void *malloc(int n) { if (heapTop - heapStart < n) doGarbageCollection(); void *wasStart = heapStart; heapStart += n; return wasStart; } |
Developers who have implemented sophisticated memory management schemes for non-garbage-collected languages may be surprised at how inexpensive allocation is -- a simple pointer addition -- in a copying collector. This may be one of the reasons for the pervasive belief that object allocation is expensive -- earlier JVM implementations did not use copying collectors, and developers are still implicitly assuming allocation cost is similar to other languages, like C, when in fact it may be significantly cheaper in the Java runtime. Not only is the cost of allocation smaller, but for objects that become garbage before the next collection cycle, the deallocation cost is zero, as the garbage object will be neither visited nor copied.
The copying algorithm has excellent performance characteristics, but it has the drawback of requiring twice as much memory as a mark-sweep collector. The mark-compact algorithm combines mark-sweep and copying in a way that avoids this problem, at the cost of some increased collection complexity. Like mark-sweep, mark-compact is a two-phase process, where each live object is visited and marked in the marking phase. Then, marked objects are copied such that all the live objects are compacted at the bottom of the heap. If a complete compaction is performed at every collection, the resulting heap is similar to the result of a copying collector -- there is a clear demarcation between the active portion of the heap and the free area, so that allocation costs are comparable to a copying collector. Long-lived objects tend to accumulate at the bottom of the heap, so they are not copied repeatedly as they are in a copying collector.
OK, so which of these approaches does the JDK take for garbage collection? In some sense, all of them. Early JDKs used a single-threaded mark-sweep or mark-sweep-compact collector. JDKs 1.2 and later employ a hybrid approach, called generational collection , where the heap is divided into several sections based on an object's age, and different generations are collected separately using different collection algorithms.
Generational garbage collection turns out to be very effective, although it introduces several additional bookkeeping requirements at runtime. In next month's Java theory and practice , we'll explore how generational garbage collection works and how it is employed by the 1.4.1 JVM, in addition to all the other garbage collection options offered by the 1.4.1 JVM. In the installment following that, we'll look at the performance impact of garbage collection, including debunking some performance myths related to memory management.
<!-- CMA ID: 10881 --> <!-- Site ID: 1 --><!-- XSLT stylesheet used to transform this file: dw-document-html-6.0.xsl-->
-
Garbage Collection: Algorithms for Automatic Dynamic Memory Management
(John Wiley & Sons, 1997) is a comprehensive survey of garbage
collection algorithms, with an extensive bibliography. The author,
Richard Jones, maintains an updated bibliography of nearly 2000 papers
on garbage collection on his Garbage Collection Page
.
- The IBM 1.4 Developer Kits for the Java platform use a mark-sweep-compact collector, which supports incremental compaction
to reduce pause times.
- The three-part series "Sensible sanitation -- Understanding the IBM Java Garbage Collector
" (developerWorks
,
August - September 2002) describes the garbage collection strategy
employed by the IBM 1.2 and 1.3 Developer Kits for the Java platform.
- "Fine-tuning Java garbage collection performance
" (developerWorks
, January 2003) describes how to detect and troubleshoot garbage collection issues.
- This article from IBM Systems Journal
describes some of the lessons learned
in building the IBM 1.1.x Developer Kits for the Java platform,
including the details of mark-sweep and mark-sweep-compact garbage
collection.
- The Garbage Collection mailing list maintains a garbage collection FAQ
.
- The Boost library, a library of very useful C++ classes, includes the shared_ptr
smart pointer class that demonstrates the use of reference counting to provide a rudimentary form of garbage collection in C++.
- You'll find hundreds of articles about every aspect of Java programming in the developerWorks
Java technology zone
.
发表评论
-
Java theory and practice: Garbage collection and performance
2009-08-14 23:33 1140In the early days of Java techn ... -
Removing GC Synchronisation
2009-08-14 23:15 769Removing GC Synchronisation ... -
Tuning Garbage Collection with the 1.4.2 Java[tm] Virtual Machine
2009-08-14 22:55 863Table of Contents ...
相关推荐
内容概要:本文详细介绍了参与西门子杯比赛中关于三部十层电梯系统的博图V15.1程序设计及其WinCC画面展示的内容。文中不仅展示了电梯系统的基本架构,如抢单逻辑、方向决策、状态机管理等核心算法(采用SCL语言编写),还分享了许多实际调试过程中遇到的问题及解决方案,例如未初始化变量导致的异常行为、状态机遗漏空闲状态、WinCC画面动态显示的挑战以及通信配置中的ASCII码解析错误等问题。此外,作者还特别提到一些创意性的设计,如电梯同时到达同一层时楼层显示器变为闪烁爱心的效果,以及节能模式下电梯自动停靠中间楼层的功能。 适合人群:对PLC编程、工业自动化控制、电梯调度算法感兴趣的工程技术人员,尤其是准备参加类似竞赛的学生和技术爱好者。 使用场景及目标:适用于希望深入了解PLC编程实践、掌握电梯群控系统的设计思路和技术要点的人士。通过学习本文可以更好地理解如何利用PLC进行复杂的机电一体化项目的开发,提高解决实际问题的能力。 其他说明:文章风格幽默诙谐,将严肃的技术话题融入轻松的生活化比喻之中,使得原本枯燥的专业知识变得生动有趣。同时,文中提供的经验教训对于从事相关领域的工作者来说非常宝贵,能够帮助他们少走弯路并激发更多创新思维。
数据库第一章选择题练习(1).docx
# 【spring-ai-pdf-document-reader-1.0.0-M7.jar中文文档.zip】 中包含: 中文文档:【spring-ai-pdf-document-reader-1.0.0-M7-javadoc-API文档-中文(简体)版.zip】 jar包下载地址:【spring-ai-pdf-document-reader-1.0.0-M7.jar下载地址(官方地址+国内镜像地址).txt】 Maven依赖:【spring-ai-pdf-document-reader-1.0.0-M7.jar Maven依赖信息(可用于项目pom.xml).txt】 Gradle依赖:【spring-ai-pdf-document-reader-1.0.0-M7.jar Gradle依赖信息(可用于项目build.gradle).txt】 源代码下载地址:【spring-ai-pdf-document-reader-1.0.0-M7-sources.jar下载地址(官方地址+国内镜像地址).txt】 # 本文件关键字: spring-ai-pdf-document-reader-1.0.0-M7.jar中文文档.zip,java,spring-ai-pdf-document-reader-1.0.0-M7.jar,org.springframework.ai,spring-ai-pdf-document-reader,1.0.0-M7,org.springframework.ai.reader.pdf,jar包,Maven,第三方jar包,组件,开源组件,第三方组件,Gradle,springframework,spring,ai,pdf,document,reader,中文API文档,手册,开发手册,使用手册,参考手册 # 使用方法: 解压 【spri
适用于理工专业的毕业生,毕业答辩时可供参考,叙述详细准确,可以作为自己答辩PPT的参考
weixin248食堂订餐小程序ssm(文档+源码)_kaic
# 压缩文件中包含: 中文文档 jar包下载地址 Maven依赖 Gradle依赖 源代码下载地址 # 本文件关键字: jar中文文档.zip,java,jar包,Maven,第三方jar包,组件,开源组件,第三方组件,Gradle,中文API文档,手册,开发手册,使用手册,参考手册 # 使用方法: 解压最外层zip,再解压其中的zip包,双击 【index.html】 文件,即可用浏览器打开、进行查看。 # 特殊说明: ·本文档为人性化翻译,精心制作,请放心使用。 ·只翻译了该翻译的内容,如:注释、说明、描述、用法讲解 等; ·不该翻译的内容保持原样,如:类名、方法名、包名、类型、关键字、代码 等。 # 温馨提示: (1)为了防止解压后路径太长导致浏览器无法打开,推荐在解压时选择“解压到当前文件夹”(放心,自带文件夹,文件不会散落一地); (2)有时,一套Java组件会有多个jar,所以在下载前,请仔细阅读本篇描述,以确保这就是你需要的文件;
1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。
1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。
内容概要:本文详细介绍了如何利用主从博弈(Stackelberg Game)模型进行电热综合能源系统的动态定价与能量管理。首先解释了主从博弈的基本概念及其在电热综合能源系统中的应用背景,即供电公司作为领导者通过制定电价策略影响用户行为,用户作为追随者根据电价调整用电模式。接着,通过MATLAB编写仿真程序,具体展示了供电公司定价策略、用户响应模型以及主从博弈迭代过程。仿真结果显示,电价与用电需求之间存在动态平衡关系,供电公司可通过调整电价引导用户合理用电,实现系统整体最优运行。此外,文中还讨论了热力系统建模、成本计算方法、博弈迭代收敛条件等关键技术细节,并对未来的研究方向进行了展望。 适合人群:从事能源管理系统设计、优化及相关领域的研究人员和技术人员,特别是对博弈论在能源系统中的应用感兴趣的学者。 使用场景及目标:适用于希望深入了解电热综合能源系统中动态定价与能量管理机制的人群。主要目标是通过理论分析和MATLAB仿真,帮助读者掌握主从博弈模型的应用方法,为实际工程设计提供参考。 其他说明:文中提供了详细的MATLAB代码示例,便于读者理解和复现实验结果。同时强调了在实际应用中需要考虑更多不确定性和个性化需求等问题。
Android逆向过程学习
2级C全国计算机考试上机题库汇总.doc
房地产 -龙湖物业品质提升小方法.doc
内容概要:本文详细介绍了基于S7-200 PLC和MCGS组态软件构建的煤矿排水系统控制方案。主要内容涵盖IO分配、梯形图程序设计、接线图原理、MCGS组态画面配置等方面。通过对水位传感器、故障传感器等输入设备和排水泵、报警装置等输出设备的精确控制,确保了排水系统的高效、可靠运行。文中还分享了一些实际项目中的调试经验和故障排除技巧,如硬件配置优化、信号干扰处理、水位监测精度提升等。 适合人群:从事工业自动化领域的工程师和技术人员,特别是对PLC编程和组态软件有一定了解的人群。 使用场景及目标:适用于煤矿及其他矿业企业的排水系统自动化改造项目,旨在提高排水系统的安全性、稳定性和智能化水平,减少人工干预,预防潜在风险。 其他说明:文章不仅提供了理论指导,还包括大量实战经验分享,有助于读者更好地理解和掌握相关技术和应用场景。
【蓝桥杯EDA】客观题解析
1.版本:matlab2014/2019a/2024a 2.附赠案例数据可直接运行matlab程序。 3.代码特点:参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象:计算机,电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。
内容概要:“华中杯”是由华中地区高校或相关机构举办的数学建模竞赛,旨在培养学生的创新能力和团队合作精神。比赛主要面向全国高校在校生(以本科生为主,部分赛事允许研究生参加),采用团队赛形式(3人一组),参赛队伍需在72小时内完成建模、编程及论文写作。竞赛一般在每年4月或5月举行,设有多个奖项,具体比例根据参赛队伍数量确定。; 适合人群:对数学建模感兴趣并希望提升自身能力的全国高校在校生(本科生为主,部分赛事允许研究生参加)。; 使用场景及目标:①帮助学生了解数学建模竞赛的形式与流程;②为参赛者提供备赛建议,如学习往届真题、掌握Matlab、Python、LaTeX等工具以及明确团队分工;③鼓励学生关注官方通知,确保获取最新赛程和规则信息。; 其他说明:2025年的具体赛程、规则可能会有所调整,请以“华中杯数学建模竞赛官网”或主办方通知为准。可通过学校数学系或相关社团获取报名信息。
光强温湿度计stm32.7z
内容概要:本文详细介绍了基于TMS320F2812 DSP控制器的永磁同步电机(PMSM)矢量控制系统的设计与实现。主要内容涵盖电流环和转速环的双闭环控制,包括ADC采样配置、坐标变换(Clarke和Park变换)、PI调节器的实现以及SVPWM生成等关键技术环节。文中特别强调了各个部分的具体代码实现及其调试技巧,如电流采样的三相两线法、PI调节器的积分限幅处理、SVPWM的扇区判断与作用时间计算等。此外,还讨论了一些常见的调试陷阱和解决方案,如QEP解码配置错误、死区时间设置不当等问题。 适合人群:具有一定嵌入式系统和电机控制基础知识的研发人员和技术爱好者。 使用场景及目标:适用于需要深入了解和掌握基于DSP2812的PMSM矢量控制系统的开发者。主要目标是帮助读者理解并实现高效的电流环和转速环双闭环控制系统,确保电机稳定运行并达到预期性能指标。 其他说明:文章不仅提供了详细的代码片段,还分享了许多实用的经验和教训,有助于读者在实际项目中少走弯路。同时,对于一些复杂的技术细节进行了深入浅出的解释,使得初学者也能逐步理解和应用这些高级控制算法。