MaxineVM GC Code Walkthrough Notes

ldq67123


Blog categories: java jvm MaxineVM GC

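1. Introduction to MaxineVM

2. GC: The Classic Copying Algorithm

3. MaxineVM Object Memory Layout

4. MaxineVM Thread Stack Memory Layout

5. MaxineVM Root Scanning

6. Improving on the Classic Copying Algorithm

7. Updating Inter-Object References in MaxineVM

8. Summary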

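1. Introduction to MaxineVM

The project site describes it as: A Next Generation, Highly Productive Platform for Virtual Machine Research.

MaxineVM is a research JVM that runs Java on a VM that is itself written in Java, and its performance is not bad.

Its source is mostly Java with a small amount of C.

MaxineVM is compatible with Sun JDK 6, so it does not have to reimplement the JDK.

It ships a visual inspection tool for observing MaxineVM's heap management, object layout, lock synchronization, JIT, and more.

[Figure: the MaxineVM execution process]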

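2. GC: The Classic Copying Algorithm

The key points of the copying algorithm (illustrated by a diagram in the original post) are:

- Memory is split into two halves, one called FROM and the other TO.

- New objects are always allocated sequentially from FROM; a FREE pointer marks the start of the unused memory.

- When an allocation cannot be satisfied, GC starts.

- Starting from the roots (thread stacks, class static variables, JNI global references, and so on), find the reachable objects, which always lie in FROM. Each one is marked as copied and then copied into TO; its fields are traversed recursively, and the objects they reference are treated as reachable too. Object references form a directed graph that may contain cycles, so the traversal can use either BFS or DFS.

- Once traversal and copying finish, the references between objects still have to be updated. An extra per-object field, the forwarding pointer, records where each object was copied to.

The pseudocode for the GC algorithm is as follows: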

def copy_gc()
    $free = $to
    for ($obj in $root)
        $newRef = move($obj)
        set_ptr($obj, $newRef)
    $scan = $to
    while ($scan < $free)
        $obj = cast_to_obj($scan)
        for ($child in fields($obj))
            if ($child < $from+size($from) && $child >= $from)
                set_ptr($child, $child.forwarding)
        $scan += sizeof($obj)
    swap_ptr($to, $from)

def move($obj)
    if ($obj.marked)
        return $obj.forwarding
    $newObj = $free
    $free += sizeof($obj)
    copy($obj, $newObj, sizeof($obj))
    $obj.forwarding = $newObj
    $obj.marked = true
    for ($child in fields($obj))
        move($child)
    return $newObj
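To make the recursive scheme concrete, here is a minimal Java sketch. It is a toy model and not MaxineVM code: the Node class, the RecursiveCopySketch name, and the use of an IdentityHashMap in place of the per-object forwarding field and marked flag are all assumptions made for illustration. Unlike the pseudocode, it fixes up child references during the recursion instead of in a second pass.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: a "heap object" is a Node holding reference fields.
class RecursiveCopySketch {
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Stands in for the forwarding pointer plus the "marked" flag of each object.
    private final Map<Node, Node> forwarding = new IdentityHashMap<>();

    // Counterpart of move(): copy obj and, recursively, everything it reaches.
    Node move(Node obj) {
        Node done = forwarding.get(obj);
        if (done != null) {
            return done;               // already copied: reuse the new location
        }
        Node copy = new Node(obj.name);
        forwarding.put(obj, copy);     // record before recursing so cycles terminate
        for (Node child : obj.fields) {
            copy.fields.add(move(child));
        }
        return copy;
    }

    // Number of objects that were reachable and therefore copied.
    int copiedCount() { return forwarding.size(); }
}
```

Calling move on each root copies exactly the reachable part of the graph. The recursion depth grows with the length of the longest reference chain, which is the weakness discussed in section 6.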

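3. MaxineVM Object Memory Layout

Before walking through the MaxineVM code, first look at the object memory layout (MaxineVM supports several layouts; only one is discussed here).

Every object has a one-word Hub pointer field. It points to a structure that describes the object's memory layout and is also used to speed up virtual method dispatch.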
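The dual role of the hub can be sketched in a few lines of Java. This is only an illustration of the idea: the Hub, Obj and Method types below, their fields, and the SLOT_DESCRIBE constant are hypothetical and do not mirror MaxineVM's actual classes.

```java
// Hypothetical model of "header word -> hub": one shared hub per class.
class HubSketch {
    interface Method { String invoke(Obj self); }

    static final class Hub {
        final String className;
        final int sizeInWords;          // layout information used by the GC
        final int[] referenceOffsets;   // which words of an instance hold references
        final Method[] vtable;          // dispatch table indexed by method slot

        Hub(String className, int sizeInWords, int[] referenceOffsets, Method... vtable) {
            this.className = className;
            this.sizeInWords = sizeInWords;
            this.referenceOffsets = referenceOffsets;
            this.vtable = vtable;
        }
    }

    static final class Obj {
        final Hub hub;        // plays the role of the one-word hub pointer
        final long[] words;   // the remaining words: instance fields

        Obj(Hub hub) {
            this.hub = hub;
            this.words = new long[hub.sizeInWords - 1];
        }
    }

    static final int SLOT_DESCRIBE = 0;  // assumed vtable slot number

    // Virtual call: one load of the hub, one indexed load from its table.
    static String callVirtual(Obj receiver, int slot) {
        return receiver.hub.vtable[slot].invoke(receiver);
    }
}
```

The same hub that tells the collector how large an instance is and where its references sit also answers which method body a virtual call should run.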

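4. MaxineVM Thread Stack Memory Layout

With the GC pseudocode above in hand, we can compare it against the real GC implementation. Root traversal starts from the stack, so we first need to understand the stack memory layout.

As the stack layout figure in the original post shows, a MaxineVM thread uses the very same stack as its OS thread. On a push, the stack pointer moves from high addresses (the stack bottom) towards low addresses (the stack top). The frames at the bottom usually belong to libpthread functions; that memory holds no pointers into the Java heap, so the GC can skip it. Next come the Java method frames. According to the JVM specification: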

Each frame has its own array of local variables (§2.6.1), its own operand stack (§2.6.2), and a reference to the run-time constant pool (§2.5.5) of the class of the current method

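So the parts of a thread stack that the GC must treat as roots are the local variables and the operand stack of every method frame. The local variables occupy a fixed-size region whose size depends on the method: once a Java method has been compiled, the amount of stack memory it needs for temporaries is known. Likewise, the maximum size of the operand stack is known at compile time. For a given method, then, the Java frame occupies a fixed-size, contiguous block of stack memory. During execution a frame holds both primitive values (int, long, char, short, byte, boolean, and so on) and references, and only the references matter to the garbage collector. MaxineVM relies on a bitmap to mark which stack slots hold references: the reference map area in the stack layout figure. The remaining parts of the thread stack have little bearing on the GC implementation and are not covered here.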
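The mapping from a stack slot to its bit can be sketched as follows. This is a simplified illustration, assuming 8-byte words and one bit per slot counted from the lowest stack slot; the class and method names are invented for the example and are not MaxineVM's.

```java
// Hypothetical one-bit-per-slot reference map for a single thread stack.
class StackReferenceMapSketch {
    static final int WORD_SIZE = 8;       // assumption: 64-bit words
    static final int BITS_PER_BYTE = 8;

    final long lowestSlot;                // address of the lowest stack slot
    final byte[] bits;                    // bit i describes slot lowestSlot + i * WORD_SIZE

    StackReferenceMapSketch(long lowestSlot, int slotCount) {
        this.lowestSlot = lowestSlot;
        this.bits = new byte[(slotCount + BITS_PER_BYTE - 1) / BITS_PER_BYTE];
    }

    // Index of the bit that describes the slot at slotAddress.
    int bitIndex(long slotAddress) {
        return (int) ((slotAddress - lowestSlot) / WORD_SIZE);
    }

    // Records that the slot currently holds a reference.
    void markReference(long slotAddress) {
        int bit = bitIndex(slotAddress);
        bits[bit / BITS_PER_BYTE] |= (byte) (1 << (bit % BITS_PER_BYTE));
    }

    // True if the slot was marked as holding a reference.
    boolean isReference(long slotAddress) {
        int bit = bitIndex(slotAddress);
        return ((bits[bit / BITS_PER_BYTE] >>> (bit % BITS_PER_BYTE)) & 1) != 0;
    }
}
```

One byte of the map covers eight stack slots, so the bitmap costs one bit of bookkeeping per word of stack.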

MaxineVM therefore finds reachable objects by scanning the reference map:

com.sun.max.vm.heap.sequential.SequentialHeapRootsScanner

public void run() {
    // Scan the roots held by every active thread.
    VmThreadMap.ACTIVE.forAllVmThreadLocals(null, _vmThreadLocalsScanner);
    // Scan the objects currently in use as monitors.
    VMConfiguration.hostOrTarget().monitorScheme().scanReferences(_pointerIndexVisitor);
}

private final Pointer.Procedure _vmThreadLocalsScanner = new Pointer.Procedure() {
    public void run(Pointer vmThreadLocals) {
        VmThreadLocal.scanReferences(vmThreadLocals, _pointerIndexVisitor);
    }
};

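The logic is simple (it has not yet reached the step of walking the stack by its reference map): it uses visitor callbacks to iterate over all active threads and over all monitors (the objects locked by Java's synchronized keyword). Monitor objects must be treated as roots as well because, once synchronized(obj) has executed (the monitorenter bytecode), the thread stack no longer holds a pointer to the object. In the code below, at the commented lines the stack holds no pointer to the monitor object: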

synchronized (lockObj()) {
    // do something
    // GC
    // do something
}
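The two root sources can be modeled with a small sketch. Everything here is hypothetical (the RootScanSketch class, the SlotVisitor interface, and arrays standing in for stack slots and the monitor table); it only shows why a lock object that appears in no stack slot is still visited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical root set: per-thread reference slots plus a table of locked objects.
class RootScanSketch {
    // Callback for every root slot; a moving GC would rewrite slots[index].
    interface SlotVisitor { void visit(Object[] slots, int index); }

    final List<Object[]> threadStacks = new ArrayList<>(); // reference slots, one array per thread
    Object[] monitorTable = new Object[0];                 // objects currently used as monitors

    // Visits thread stack roots first, then monitor roots.
    void scanRoots(SlotVisitor visitor) {
        for (Object[] stack : threadStacks) {
            visitAll(stack, visitor);
        }
        visitAll(monitorTable, visitor);
    }

    private static void visitAll(Object[] slots, SlotVisitor visitor) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                visitor.visit(slots, i);
            }
        }
    }
}
```

If scanRoots skipped monitorTable, a lock object held only by the monitor scheme would look unreachable and its memory would be reused while the lock is still held.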

5. MaxineVM Root Scanning

Next, look closely at how the GC roots on a thread stack are scanned, i.e. com.sun.max.vm.thread.VmThreadLocal.scanReferences from the code above:

public static void scanReferences(Pointer vmThreadLocals, PointerIndexVisitor wordPointerIndexVisitor) { // [m1]
    final Pointer lastJavaCallerStackPointer = LAST_JAVA_CALLER_STACK_POINTER.getVariableWord(vmThreadLocals).asPointer(); // [m2]
    final Pointer lowestActiveSlot = LOWEST_ACTIVE_STACK_SLOT_ADDRESS.getVariableWord(vmThreadLocals).asPointer();
    final Pointer highestSlot = HIGHEST_STACK_SLOT_ADDRESS.getConstantWord(vmThreadLocals).asPointer();
    final Pointer lowestSlot = LOWEST_STACK_SLOT_ADDRESS.getConstantWord(vmThreadLocals).asPointer();
    final VmThread vmThread = UnsafeLoophole.cast(VM_THREAD.getConstantReference(vmThreadLocals));

    StackReferenceMapPreparer.scanReferenceMapRange(vmThreadLocals, lowestSlot, vmThreadLocalsEnd(vmThreadLocals), wordPointerIndexVisitor);
    StackReferenceMapPreparer.scanReferenceMapRange(vmThreadLocals, lowestActiveSlot, highestSlot, wordPointerIndexVisitor); // [m3]
}

public static void scanReferenceMapRange(Pointer vmThreadLocals, Pointer lowestSlot, Pointer highestSlot, PointerIndexVisitor wordPointerIndexVisitor) { // [m4]
    final Pointer lowestStackSlot = VmThreadLocal.LOWEST_STACK_SLOT_ADDRESS.getConstantWord(vmThreadLocals).asPointer();
    final Pointer referenceMap = VmThreadLocal.STACK_REFERENCE_MAP.getConstantWord(vmThreadLocals).asPointer(); // [m5]
    final int highestRefMapByteIndex = referenceMapByteIndex(lowestStackSlot, highestSlot);
    final int lowestRefMapByteIndex = referenceMapByteIndex(lowestStackSlot, lowestSlot);

    // Handle the lowest reference map byte separately as it may contain bits
    // for slot addresses lower than 'lowestSlot'. These bits must be ignored:
    final int lowestBitIndex = referenceMapBitIndex(lowestStackSlot, lowestSlot);
    final int highestBitIndex = referenceMapBitIndex(lowestStackSlot, highestSlot);
    if (highestRefMapByteIndex == lowestRefMapByteIndex) { // [m6]
        scanReferenceMapByte(lowestRefMapByteIndex, lowestStackSlot, referenceMap, lowestBitIndex % Bytes.WIDTH, highestBitIndex % Bytes.WIDTH, vmThreadLocals, wordPointerIndexVisitor);
    } else {
        scanReferenceMapByte(lowestRefMapByteIndex, lowestStackSlot, referenceMap, lowestBitIndex % Bytes.WIDTH, Bytes.WIDTH, vmThreadLocals, wordPointerIndexVisitor);
        scanReferenceMapByte(highestRefMapByteIndex, lowestStackSlot, referenceMap, 0, (highestBitIndex % Bytes.WIDTH) + 1, vmThreadLocals, wordPointerIndexVisitor);
        for (int refMapByteIndex = lowestRefMapByteIndex + 1; refMapByteIndex < highestRefMapByteIndex; refMapByteIndex++) {
            scanReferenceMapByte(refMapByteIndex, lowestStackSlot, referenceMap, 0, Bytes.WIDTH, vmThreadLocals, wordPointerIndexVisitor);
        }
    }
}

private static void scanReferenceMapByte(int refMapByteIndex, Pointer lowestStackSlot, Pointer referenceMap, int startBit, int endBit, Pointer vmThreadLocals, PointerIndexVisitor wordPointerIndexVisitor) { // [m7]
    final int refMapByte = referenceMap.getByte(refMapByteIndex);
    if (refMapByte != 0) {
        final int baseIndex = refMapByteIndex * Bytes.WIDTH;
        final Pointer slot = lowestStackSlot.plus(baseIndex * Word.size());
        for (int bitIndex = startBit; bitIndex < endBit; bitIndex++) {
            if (((refMapByte >>> bitIndex) & 1) != 0) {
                wordPointerIndexVisitor.visitPointerIndex(slot, bitIndex);
            }
        }
    }
}
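The byte-at-a-time range scan can be tried out on a plain byte array. The sketch below is a simplified stand-in, not the MaxineVM routine: scanRange and its class name are invented, the range is taken as the closed interval [lowBit, highBit] in every case, and set bits are collected into a list instead of being passed to a visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical range scan over a reference map stored as bytes.
class ReferenceMapRangeScanSketch {
    static final int BITS_PER_BYTE = 8;

    // Returns the indices of all set bits in the closed range [lowBit, highBit].
    static List<Integer> scanRange(byte[] referenceMap, int lowBit, int highBit) {
        List<Integer> found = new ArrayList<>();
        int lowByte = lowBit / BITS_PER_BYTE;
        int highByte = highBit / BITS_PER_BYTE;
        for (int byteIndex = lowByte; byteIndex <= highByte; byteIndex++) {
            int mapByte = referenceMap[byteIndex] & 0xFF;
            if (mapByte == 0) {
                continue;  // fast path: none of these eight slots holds a reference
            }
            // Only the first and last bytes can be partially inside the range.
            int startBit = (byteIndex == lowByte) ? lowBit % BITS_PER_BYTE : 0;
            int endBit = (byteIndex == highByte) ? highBit % BITS_PER_BYTE + 1 : BITS_PER_BYTE;
            for (int bit = startBit; bit < endBit; bit++) {
                if (((mapByte >>> bit) & 1) != 0) {
                    found.add(byteIndex * BITS_PER_BYTE + bit);
                }
            }
        }
        return found;
    }
}
```

Skipping zero bytes is what makes the bitmap cheap to scan: a frame full of primitives costs one comparison per eight slots.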

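Once a reference has been found in the stack, the actual data copy has to happen. That is the job of the visitor callback, wordPointerIndexVisitor or _pointerIndexVisitor, which corresponds to the move function in the pseudocode: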

public void visitPointerIndex(Pointer pointer, int wordIndex) {
    final Grip oldGrip = pointer.getGrip(wordIndex);
    final Grip newGrip = mapGrip(oldGrip);
    if (newGrip != oldGrip) {
        pointer.setGrip(wordIndex, newGrip); // [m8]
    }
}

private Grip mapGrip(Grip grip) {
    final Pointer fromOrigin = grip.toOrigin();
    if (_fromSpace.contains(fromOrigin)) { // [m9]
        final Grip forwardGrip = Layout.readForwardGrip(fromOrigin); // [m10]
        if (!forwardGrip.isZero()) { // [m11]
            return forwardGrip;
        }
        final Pointer fromCell = Layout.originToCell(fromOrigin);
        final Size size = Layout.size(fromOrigin);
        final Pointer toCell = gcAllocate(size); // [m12]
        Memory.copyBytes(fromCell, toCell, size);
        final Pointer toOrigin = Layout.cellToOrigin(toCell);
        final Grip toGrip = Grip.fromOrigin(toOrigin);
        Layout.writeForwardGrip(fromOrigin, toGrip); // [m13]
        return toGrip;
    }
    return grip;
}

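6. Improving on the Classic Copying Algorithm

Note that mapGrip copies only the object the pointer refers to; it does not recursively copy all the child objects that object references. This is an improvement over the classic recursion-based copying algorithm: recursion makes the GC thread's stack usage unbounded (for example, in the extreme case where object references form a long linked list). Nor can the recursion simply be replaced by an explicit stack built in extra heap space, because the memory consumed while copying iteratively is itself a burden on the heap. MaxineVM's strategy is to use the TO space of the copying algorithm as the iteration container. The pseudocode of MaxineVM's copying GC is in fact: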

def MaxineVM_copy_gc()
    $free = $to
    for ($obj in $root)
        $newRef = move($obj)
        set_ptr($obj, $newRef)
    $scan = $to
    while ($scan < $free)
        $obj = cast_to_obj($scan)
        for ($child in fields($obj))
            if ($child < $from+size($from) && $child >= $from)
                $movedRef = move($child)
                set_ptr($child, $movedRef)
        $scan += sizeof($obj)
    swap_ptr($to, $from)

def move($obj)
    if ($obj.marked)
        return $obj.forwarding
    $newObj = $free
    $free += sizeof($obj)
    copy($obj, $newObj, sizeof($obj))
    $obj.forwarding = $newObj
    $obj.marked = true
    return $newObj

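The move function no longer calls itself recursively. After the root objects have been copied, iteration starts from the beginning of the TO space; as it proceeds it copies objects (an object that has already been copied simply returns its new address) and updates references at the same time. The original post illustrates the iteration with a simple diagram.

Although this is a considerable performance improvement over the classic algorithm, the MaxineVM implementation is still very primitive. For example, the improved algorithm copies objects in BFS order, which is less friendly to memory access locality than DFS; and heap management is crude, with 50% of the memory always unavailable for allocation.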
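The iterative scheme, generally known as Cheney's algorithm, can be run end to end on a toy heap. The sketch below is an assumption-laden model and not MaxineVM code: the heap is an int array, address 0 means null, every object is laid out as [forwarding word, field count, fields...], and all fields are references. The names CheneySketch, allocate, collect and usedWords are invented for the example.

```java
// Hypothetical semi-space heap in an int[]; address 0 is reserved as null.
class CheneySketch {
    final int[] mem;
    final int semiSize;
    int from;   // start of the space objects are allocated in
    int to;     // start of the space objects are copied into during GC
    int free;   // bump pointer

    CheneySketch(int semiSize) {
        this.semiSize = semiSize;
        this.mem = new int[1 + 2 * semiSize];
        this.from = 1;
        this.to = 1 + semiSize;
        this.free = from;
    }

    // Bump allocation of an object with fieldCount reference fields.
    int allocate(int fieldCount) {
        int size = 2 + fieldCount;
        if (free + size > from + semiSize) {
            throw new IllegalStateException("space exhausted; a real VM would collect here");
        }
        int addr = free;
        free += size;
        mem[addr] = 0;                  // forwarding word: 0 means "not moved"
        mem[addr + 1] = fieldCount;
        for (int i = 0; i < fieldCount; i++) {
            mem[addr + 2 + i] = 0;      // clear stale data left by earlier cycles
        }
        return addr;
    }

    void setField(int obj, int index, int ref) { mem[obj + 2 + index] = ref; }
    int getField(int obj, int index) { return mem[obj + 2 + index]; }
    boolean inFromSpace(int addr) { return addr >= from && addr < from + semiSize; }
    int usedWords() { return free - from; }

    // Non-recursive move: copy one object and leave a forwarding address behind.
    private int move(int obj) {
        if (!inFromSpace(obj)) {
            return obj;                 // null, or already a TO-space address
        }
        if (mem[obj] != 0) {
            return mem[obj];            // already moved
        }
        int size = 2 + mem[obj + 1];
        int copy = free;
        System.arraycopy(mem, obj, mem, copy, size);
        free += size;
        mem[obj] = copy;
        return copy;
    }

    // roots is updated in place, like stack slots rewritten by the visitor.
    void collect(int[] roots) {
        free = to;
        for (int i = 0; i < roots.length; i++) {
            roots[i] = move(roots[i]);
        }
        int scan = to;                  // the region [scan, free) is the work queue
        while (scan < free) {
            int fieldCount = mem[scan + 1];
            for (int i = 0; i < fieldCount; i++) {
                mem[scan + 2 + i] = move(mem[scan + 2 + i]);
            }
            scan += 2 + fieldCount;
        }
        int oldFrom = from;
        from = to;
        to = oldFrom;
    }
}
```

The only extra state is the scan index: objects between scan and free have been copied but not yet examined, so the TO space itself serves as the BFS queue and no recursion or auxiliary stack is needed.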

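7. Updating Inter-Object References in MaxineVM

Corresponding to the pseudocode, after root scanning MaxineVM updates the references between objects in com.sun.max.vm.heap.sequential.semiSpace.SemiSpaceHeapScheme::moveReachableObjects: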

private void moveReachableObjects() {
    Pointer cell = _toSpace.start().asPointer();
    while (cell.lessThan(_allocationMark)) {
        cell = DebugHeap.checkDebugCellTag(cell); // [m14]
        cell = visitCell(cell);
    }
}

public Pointer visitCell(Pointer cell) {
    final Pointer origin = Layout.cellToOrigin(cell); // [m15]
    final Grip oldHubGrip = Layout.readHubGrip(origin);
    final Grip newHubGrip = mapGrip(oldHubGrip); // [m16]
    if (newHubGrip != oldHubGrip) {
        Layout.writeHubGrip(origin, newHubGrip); // [m17]
    }
    final Hub hub = UnsafeLoophole.cast(newHubGrip.toJava()); // [m18]
    final SpecificLayout specificLayout = hub.specificLayout(); // [m19]
    if (specificLayout.isTupleLayout()) { // [m20]
        TupleReferenceMap.visitOriginOffsets(hub, origin, _pointerOffsetGripUpdater);
        if (hub.isSpecialReference()) {
            SpecialReferenceManager.discoverSpecialReference(Grip.fromOrigin(origin)); // [m21]
        }
        return cell.plus(hub.tupleSize());
    }
    if (specificLayout.isHybridLayout()) { // [m22]
        TupleReferenceMap.visitOriginOffsets(hub, origin, _pointerOffsetGripUpdater);
    } else if (specificLayout.isReferenceArrayLayout()) {
        scanReferenceArray(origin);
    }
    return cell.plus(Layout.size(origin));
}

private final PointerOffsetVisitor _pointerOffsetGripUpdater = new PointerOffsetVisitor() {
    public void visitPointerOffset(Pointer pointer, int offset) {
        final Grip oldGrip = pointer.readGrip(offset);
        final Grip newGrip = mapGrip(oldGrip);
        if (newGrip != oldGrip) {
            pointer.writeGrip(offset, newGrip);
        }
    }
};
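The layout-dependent part of visitCell, which slots to visit and how far to advance, can be illustrated with a reduced model. The sketch assumes a heap of long words in which word 0 of each cell is an index into a hub table and arrays store their length in word 1; the Kind enum, the Hub fields and the walk helper are hypothetical and much simpler than MaxineVM's layouts (hybrid layouts are left out).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical cell walker: the hub decides which words are references.
class CellWalkSketch {
    enum Kind { TUPLE, REFERENCE_ARRAY, PRIMITIVE_ARRAY }

    static final class Hub {
        final Kind kind;
        final int tupleSize;            // words including the header (tuples only)
        final int[] referenceOffsets;   // word offsets of reference fields (tuples only)

        Hub(Kind kind, int tupleSize, int... referenceOffsets) {
            this.kind = kind;
            this.tupleSize = tupleSize;
            this.referenceOffsets = referenceOffsets;
        }
    }

    // Reports the reference slots of the cell at 'cell' and returns the next cell.
    static int visitCell(long[] heap, Hub[] hubs, int cell, IntConsumer slotVisitor) {
        Hub hub = hubs[(int) heap[cell]];
        switch (hub.kind) {
            case TUPLE:
                for (int offset : hub.referenceOffsets) {
                    slotVisitor.accept(cell + offset);
                }
                return cell + hub.tupleSize;
            case REFERENCE_ARRAY: {
                int length = (int) heap[cell + 1];
                for (int i = 0; i < length; i++) {
                    slotVisitor.accept(cell + 2 + i);   // every element is a reference
                }
                return cell + 2 + length;
            }
            default:
                return cell + 2 + (int) heap[cell + 1]; // primitive array: nothing to visit
        }
    }

    // Walks cells from index 0 up to 'end', collecting all reference slot indices.
    static List<Integer> walk(long[] heap, Hub[] hubs, int end) {
        List<Integer> slots = new ArrayList<>();
        int cell = 0;
        while (cell < end) {
            cell = visitCell(heap, hubs, cell, slots::add);
        }
        return slots;
    }
}
```

Because each visit returns the address of the next cell, the walker can step through the TO space linearly without any per-object index, which is what the loop in moveReachableObjects relies on.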

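8. Summary

This article covers only a selected part of MaxineVM's GC, and practical GC algorithms are not this simple. MaxineVM's own comment on this implementation reads: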

A simple semi-space scavenger heap, mainly for testing. No, we do NOT share code with other implementations here, even if this means duplication of effort. This code base is supposed to remain stable, as a reliable fallback position. Refactoring of whatever other fancy memory management library must not damage the functionality here.

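In other words, this is only the most basic implementation. For more on GC algorithms, the following are recommended reading:

- The Garbage Collection Handbook: The Art of Automatic Memory Management (《垃圾回收算法手册:自动内存管理的艺术》), Richard Jones

- 《垃圾回收的算法与实现》 (Garbage Collection: Algorithms and Implementation), Narihiro Nakamura and Hikari Aikawa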

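[m1] Walks the entire thread stack.

[m2] LAST_JAVA_CALLER_STACK_POINTER, LOWEST_ACTIVE_STACK_SLOT_ADDRESS, HIGHEST_STACK_SLOT_ADDRESS and LOWEST_STACK_SLOT_ADDRESS are four offsets relative to the thread stack base; see the stack memory layout above for the positions they refer to.

[m3] Each thread stack is walked in two segments, because every thread stack has two memory pages in the middle that can be neither read nor written (the YellowZone and the RedZone).

[m4] Walks one segment of the thread stack.

[m5] The reference map: a bitmap describing where the reference pointers lie within the stack.

[m6] This is a closed interval, i.e. [lowest, highest].

[m7] Walks one byte of the bitmap.

[m8] Points the pointer on the stack at the moved object.

[m9] The pointer value lies within the FROM segment, so the object may need to be copied (unless it has already been moved).

[m10] Reads the forwarding pointer value, equivalent to Grip* forwarding = fromOrigin->forwarding.

[m11] A non-NULL pointer value means the object has already been moved.

[m12] Allocates size bytes in the TO segment.

[m13] Writes the grip address of the to-object into the forwarding pointer field of the from-object.

[m14] In debug mode, padding is added before and after each object; this strips any padding that may be present.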