http://chaoticjava.com/posts/parallel-and-concurrent-garbage-collectors/
In the last post, I've been talking about the basics of garbage collection and the generational garbage collector. While the generational garbage collector brought huge performance benefits by getting the large, old generation memory area to be infrequently visited by the collector, it still wasn't enough for the new era of faster processors and memory sizes which spawned larger applications with multiple threads running concurrently, creating loads of garbage. The original generational collector was single threaded, and was called the serial collector. A parallel collector was required, and since the young and old generation memory spaces are different in the way the objects residing in them are used, so are the collectors implemented for them.
Since explaining multi-threaded implementations is difficult, and more-so when it's with the topic of garbage collection, I've used illustrations that depict the state of a one or two threads at a time; in reality, the amount of threads running is equal to the number of processors the machine has, which can easily be a two-digit number. I hope the explanations are clear enough, and as always questions are welcomed!
Parallel scavenge
The parallel implementation for the young generation memory space is simple and is as follows: the root objects are divided among the garbage collection threads, making the marking phase much shorter. As soon as an object is found to be live, it is copied to the To area immediately (or to the old generation space, if he survived enough iterations). This can be accomplished by allocating a segment in the To area per thread, allowing the different threads to copy objects to the To area without the need for thread synchronization. Since allocation is done in Eden anyway, the To area is allowed to be a bit fragmented without causing harm.
The following image illustrates how a single thread claims a single object. Obviously, in reality all threads allocated for the collection task are sweeping in a non-synchronized way, unaware of each other's work.
Parallel compacting collector
The implementation for the old generation memory space is much more complex. Remember that this memory is significantly larger and is occupied by objects that are less likely to be garbage. This operation is a three steps procedure: first, the memory space is divided into regions. Then, the parallel marking starts, with the root objects being divided among collector threads. When a live object is encountered, the region it is located in is updated so that at the end of the process, each region contains information stating how much live data is available in it, and where live objects are located in it.
After that‘s done, the summary step comes along where each region is examined for its density – the amount of live bytes compared to the total bytes in the region. An assumption is now made: since we know that previous collections had been made which compacts the bytes “to the left”, and we know that new objects moved into the old generation are allocated on its “right” side, we can assume that the left side of the memory is filled with older objects than the right side, and in respect with the generational collection idea, the more “to the left” an object is the less likely for it to be dead. The garbage collection uses this assumption for performance optimization: starting at the left most region it looks for the first region with a density which is worthwhile for collection. All regions to the left of that region are not collected and are called the dense prefix.
In addition, during summary the collector can tell what amount of space is going to remain in each region after its compacting so it can already choose which regions are going to be filling other regions (typically, regions on the “right” are filling regions on the “left”). It can also note for each region which regions will be filling it, and for each filling region where it’s going to be located after sweeping, or where the first live byte is going to be located.
Now the sweeping step itself starts. Regions that can be collected without synchronization with other threads are being divided between collector threads. These are regions that do not fill other regions, so they‘re compacted into themselves, or regions that are already completely empty already and can just be claimed by the collector. After a thread finishes claiming a region, it immediately starts filling that region from the regions destined to fill it – this information was determined previously by the summary step. Since it knows no other thread is reading from that source region or writing to the destination region (as the thread is in charge of both) it can make the filling and the claiming without need of synchronization.
Note that in the following image the collection occurs over only 60% of the heap (making the collection twice as fast as if it would collect everything), and reclaiming 82% of the garbage. That means that in order to collect the additional 18%, the garbage collector would have to work just as hard as it did for the first 82%! Obviously, these are not precise calculations, but surely the point is made: this is the reason the dense prefix is an important optimization step.
Concurrent mark-sweep collector
While both prior collectors mentioned are parallel and take a lot less time to operate, they still stop the application threads while working. Some applications might have a different requirement though: they might require a more deterministic Java, and are willing to sacrifice some performance in order to get a shorter to non-existent waiting times caused by the garbage collector. For those applications the Java VM supplies the Concurrent Mark-Sweep garbage collector.
This collector uses four steps: first, the initial mark step stops the application for a very short period of time while a single collection thread marks the first level of objects directly connected to the root objects. After that, the application continues to work normally while the collector performs a concurrent mark, marking the rest of the objects while the application keeps running. Because the application might change references to the object graph by the time the concurrent marking is finished, the collector performs an additional step: the remarking, which stops the application for another brief period. When the application changes an object it Is marked as changed. During the remarking step the application checks only those changed objects, and to make things faster it distributes the load between multiple threads. Finally, the collector performs a concurrent sweep which does not compact the memory area in order to save time and operates concurrently with the application threads.
Since the memory area is not compacted, the memory allocator is different in two ways: first, it needs to keep multiple pointers to free memory blocks categorized by their size to allow allocation within the fragments of free memory. Second, it keeps count of the requested lengths to determine the popular object sizes within the application; then, it uses this information to split and merge the free block fragments to match future requirements by the application, assuming that the popular sizes will continue to be popular and often requested.
Final words
This covers everything I’ve managed to gather about the way garbage collectors work. Note that for some applications (typically small, client-side ones) the parallel collectors are not the best. The Java VM selects which collectors to use automatically when its loading according to the properties of the machine and operating system it‘s running on. You can read all about it on Sun’s Java VM ergonomics document.
相关推荐
Parallel and Concurrent Programming in Haskell.pdf Parallel and Concurrent Programming in Haskell.pdf
本书《Parallel and Concurrent Programming in Haskell》由Simon Marlow撰写,面向已经具备Haskell基础知识的读者,旨在指导他们通过Haskell语言的多个API和框架,编写支持并行和并发处理的程序。本书详细介绍了...
《Parallel and Concurrent Programming in Haskell 2013》是Simon Marlow所著的一本关于Haskell编程语言在并行与并发编程方面的专著。在这本书中,作者深入探讨了Haskell语言的并行与并发能力,以及如何有效地利用...
Go 1.5 concurrent garbage collector pacingAustin Clements 3/10/2015golang.org/s/go15gcpacing Comment at https://groups.google.com/forum/#!topic/golangdev/YjoG9yJktg4Introduction Prior ...
3. CMS(Concurrent Mark Sweep)Garbage Collector:此垃圾回收器旨在减少应用程序的暂停时间,适合响应时间敏感的应用。CMS使用多线程并发地进行大部分垃圾回收工作,仅在特定阶段暂停应用程序。它在处理大量数据...
parallel and high performance computing: CPU, GPU
"This book marks an important landmark in the theory of distributed systems and I highly recommend it to students and practicing engineers in the fields of operations research and computer science, as...
Parallel and Distributed Programming Using C++ provides an up-close look at how to build software that can take advantage of multiprocessor computers. Simple approaches for programming parallel ...
Parallel and Distributed Programming Using C++.
标题《Parallel and Concurrent Programming in Haskell》和描述表明,该文件是一本专注于Haskell语言在并行和并发编程领域的深入探讨。Haskell,作为一门纯粹的、懒惰的函数式编程语言,为开发者提供了强大的抽象...
本书《并行与分布式编程使用C++》(Parallel and Distributed Programming Using C++)由Cameron Hughes和Tracey Hughes撰写,由Addison-Wesley出版,首次发行于2003年8月25日,ISBN为0-13-101376-9,共有720页。...
Java Stream API Parallel Collectors - 克服标准并行流的限制 Parallel Collectors 是一个工具包,使用 Stream API 简化 Java 中的并行收集处理......但没有标准并行流强加的限制。 list.stream() .collect...
根据提供的文件内容,以下是对《Algorithm Design: Parallel and Sequential》一书的知识点概述: ### 标题分析 该文件标题为 "Algorithm Design Parallel and Sequential 2017.9.pdf",直接指出了书的名称以及出版...
《平行与分布式编程使用C++》是一本深入探讨如何利用C++进行并行和分布式计算的书籍。在当今计算环境中,高效地利用多核处理器和分布式系统资源是至关重要的,这本书正是为此目的而编写。通过学习这本书,读者可以...