An Investigation into the HashMap Infinite Loop

xm_king


Blog category: Concurrent programming

Tags: multithreading, data structures, algorithms, J#, SUN

This post was inspired by http://pt.alibaba-inc.com/wp/dev_related_969/hashmap-result-in-improper-use-cpu-100-of-the-problem-investigated.html and borrows ideas from it; my thanks to its author.

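I have been interning in Hangzhou for a while now and have not updated this blog in a long time. A few days ago, with nothing much to do, I picked up Bi Xuan's book 《分布式JAVA应用》 (Distributed Java Applications). Its section on HashMap mentions that HashMap is not thread-safe, and that without sufficient synchronization in a concurrent scenario a call to HashMap.get can enter an infinite loop and drive CPU usage to 100%. I knew HashMap was not thread-safe, but this was the first time I had heard that get could loop forever. I googled it and found plenty of discussion, so evidently many people are interested in this, and I dug into the HashMap source to investigate.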

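As everyone knows, HashMap resolves hash collisions with linked lists (chaining); for a detailed analysis of HashMap, see http://zhangshixi.iteye.com/blog/672697. Because each bucket is a linked list, its nodes can end up forming a closed loop, and a traversal of that list then never terminates. What I was curious about is how such a loop forms. With a single thread, only one thread manipulates HashMap's data structure, so a loop cannot arise. It can only happen under multithreaded concurrency, specifically during put: once size reaches the threshold (current capacity * loadFactor), HashMap performs a rehash (resize), and the structure of the map changes drastically. Most likely two threads trigger the rehash at the same moment and produce the loop. Below we walk through the source step by step to see how it arises. First, the put operation: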

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        // The key already exists: replace the old value
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        // The key is not present: create a new entry and link it in at the head of bucket table[i]
        addEntry(hash, key, value, i);
        return null;
    }

The addEntry operation:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

As you can see, if size has already reached threshold, a resize is performed:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        // Move the data from the old Entry array to the new Entry array
        transfer(newTable);
        table = newTable;
        threshold = (int)(newCapacity * loadFactor);
    }
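To make the trigger condition concrete, here is a minimal sketch of the `size++ >= threshold` check from addEntry together with the threshold update from resize. The class and method names (`ThresholdSketch`, `threshold`, `resizesFor`) are made up for illustration, and the sketch ignores the MAXIMUM_CAPACITY case handled by the real resize.

```java
// Illustrative only: models when addEntry would call resize, not a real map.
class ThresholdSketch {
    // threshold = capacity * loadFactor, truncated to int as in resize()
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Counts how many resizes happen while adding `entries` new keys,
    // using the same `size++ >= threshold` test as addEntry.
    static int resizesFor(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = threshold(capacity, loadFactor);
        int size = 0;
        int resizes = 0;
        for (int n = 0; n < entries; n++) {
            if (size++ >= threshold) {
                capacity *= 2; // resize(2 * table.length)
                threshold = threshold(capacity, loadFactor);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));      // 12
        System.out.println(resizesFor(12, 16, 0.75f)); // 0
        System.out.println(resizesFor(13, 16, 0.75f)); // 1
    }
}
```

With a capacity of 16 and a load factor of 0.75, the thirteenth insertion is the first one that triggers a resize, which is the window in which two unsynchronized threads can both start a transfer.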

Now look at the transfer operation; this is where the loop is created:

    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        /*
         * During the transfer, HashMap effectively reverses the order of the
         * entries on each chain. For example, if the chain at some Entry[i] was
         * e1->e2->null and both entries land in the same new bucket, it becomes
         * e2->e1->null afterwards.
         */
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    // I believe this is the culprit behind the infinite loop
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
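The reversal is easy to check in isolation. The following is a minimal single-threaded sketch, assuming a stripped-down `Node` class in place of HashMap.Entry and assuming every entry lands in the same new bucket; `TransferOrderSketch`, `transferBucket` and `render` are hypothetical names, not part of the JDK.

```java
// Illustrative only: head insertion on one bucket, as in transfer's do/while loop.
class TransferOrderSketch {
    // Simplified stand-in for HashMap.Entry: a label and a next pointer.
    static final class Node {
        final String name;
        Node next;
        Node(String name, Node next) { this.name = name; this.next = next; }
    }

    // Moves every node of the old chain to the head of the new chain.
    static Node transferBucket(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // e.next = newTable[i]
            newHead = e;        // newTable[i] = e
            e = next;
        }
        return newHead;
    }

    // Renders a chain as "a->b->null".
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.name).append("->");
        }
        return sb.append("null").toString();
    }

    public static void main(String[] args) {
        Node e3 = new Node("e3", null);
        Node e2 = new Node("e2", e3);
        Node e1 = new Node("e1", e2);
        System.out.println(render(e1));                 // e1->e2->e3->null
        System.out.println(render(transferBucket(e1))); // e3->e2->e1->null
    }
}
```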

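So how exactly does the loop arise? The problem lies at next = e.next. To keep the analysis of the multithreaded case simple, assume there are just two threads, P1 and P2, and that the chain at src[j] is e1->e2->null. Let us follow the execution of P1 and P2 in turn.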

First, P1 and P2 both enter the for loop and each executes next = e.next. At this point the local variables in the two threads are:

	e	next
P1	e1	e2
P2	e1	e2

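At this point the two entries are still in their original order, e1->e2->null. Suppose P1 is now suspended for some reason while P2 keeps running and completes the do-while loop. Assuming both entries map to the same bucket of the new table, the order becomes e2->e1->null. Only after P2 has finished does P1 resume. In P1 the local variable e is still e1 and next is still e2 (note that in memory the two entries are now linked as e2->e1->null). P1 continues with the body of the do-while loop and finds the slot for e1 in its own new Entry array: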

e.next = newTable[i];
newTable[i] = e;

Next, next is assigned to e, so e becomes e2 and the loop moves on to its second iteration. At this point:

	e	next
P1	e2	e1

e2->next = e1 is thread P2's "contribution". After this iteration finishes, e = e1.

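The loop then runs a third time and, following the algorithm, performs e1->next = e2. Because e1->next was null when this iteration read it, e becomes null and the loop ends.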

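So P1 has executed e1->next = e2 and P2 has executed e2->next = e1, which forms a cycle. A later get that walks this bucket without finding its key never sees a null next, and so it loops forever.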
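The interleaving above can be replayed deterministically in a single thread, which avoids depending on real thread scheduling. This is only a sketch under the same assumptions as the walkthrough: e1 and e2 share a bucket before and after the resize, and each thread has its own new table. `Node`, `replay` and `hasCycle` are hypothetical names, and the variables prefixed `p1`/`p2` stand for the locals of threads P1 and P2.

```java
// Illustrative only: replays the P1/P2 schedule from the walkthrough step by step.
class ResizeCycleSketch {
    // Simplified stand-in for HashMap.Entry.
    static final class Node {
        final String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Floyd's tortoise-and-hare check: true if the chain loops back on itself.
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    // Returns the head of P1's new bucket after the replayed schedule.
    static Node replay() {
        Node e1 = new Node("e1");
        Node e2 = new Node("e2");
        e1.next = e2; // old bucket: e1->e2->null

        // P1 executes "next = e.next" and is then suspended.
        Node p1E = e1;
        Node p1Next = p1E.next; // e2
        Node p1Bucket = null;   // newTable[i] as seen by P1

        // P2 runs its whole do/while loop: its bucket becomes e2->e1->null.
        Node p2Bucket = null;
        Node p2E = e1;
        while (p2E != null) {
            Node p2Next = p2E.next;
            p2E.next = p2Bucket;
            p2Bucket = p2E;
            p2E = p2Next;
        }

        // P1 resumes with stale locals (e = e1, next = e2) and finishes iteration 1.
        p1E.next = p1Bucket;
        p1Bucket = p1E;
        p1E = p1Next;

        // Iterations 2 and 3 of P1's loop.
        while (p1E != null) {
            p1Next = p1E.next;   // iteration 2 reads e2.next == e1, written by P2
            p1E.next = p1Bucket;
            p1Bucket = p1E;
            p1E = p1Next;
        }
        return p1Bucket;
    }

    public static void main(String[] args) {
        Node head = replay();
        System.out.println(head.name + " " + head.next.name + " " + head.next.next.name); // e1 e2 e1
        System.out.println(hasCycle(head)); // true
    }
}
```

Note that P1's own loop still terminates; the damage only shows up later, when something traverses the bucket looking for a key that is not in it.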

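To be honest, when I first came across this claim it gave me a scare: how could HashMap have a problem like this? On closer analysis, though, it can easily occur under high concurrency. Sun's engineers recommend using ConcurrentHashMap in such scenarios; see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457 for details.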
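As a rough illustration of that recommendation, the sketch below has several threads put into a shared ConcurrentHashMap at the same time. The class name `ConcurrentPutSketch`, the `fill` helper and the thread and key counts are all assumptions chosen for the example, and the lambda syntax assumes Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: concurrent puts into a map that is designed for them.
class ConcurrentPutSketch {
    // Starts `threads` writers, each inserting `perThread` distinct keys,
    // waits for all of them and returns the final size of the map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per thread
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", ex);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```

Passing a plain HashMap to the same helper is exactly the unsynchronized use the article warns about; wrapping it with Collections.synchronizedMap is another option when ConcurrentHashMap does not fit.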

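Although I have not yet run into this problem in my day-to-day work, it is something to watch out for. Choose the right class for each scenario to avoid problems like this HashMap infinite loop.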


2011-03-14 21:44

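#9 xm_king 2011-03-16

sdh5724 wrote:

Write about it often enough and it turns into a 月经贴.

I just looked it up and only now learned what a 月经贴 is (a thread on a topic that comes around again and again); I am out of touch.
Which department are you in? Let's talk sometime!

#8 sdh5724 2011-03-16

Write about it often enough and it turns into a 月经贴.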
#7 xm_king 2011-03-15

chenyongxin wrote (the put/addEntry/indexFor listing and test output are in #4 below):

void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        /*
         * During the transfer, HashMap effectively reverses the order of the
         * entries on each chain. For example, if the chain at some Entry[i] was
         * e1->e2->null, it becomes e2->e1->null afterwards.
         */
        for (int j = 0; j < src.length; j++) {
           Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    // I believe this is the culprit behind the infinite loop
                    Entry<K,V> next = e.next;
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }

   Are you sure the order is reversed?
    I am a little puzzled; let me go through it step by step.


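Actually, because HashMap can store (null, value), that is, the key may be null, the resize has to traverse every slot once; otherwise, how would you know whether table[i] holds any entries at all?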

#6 xm_king 2011-03-15

chenyongxin wrote (the put/addEntry/indexFor listing and test output are in #4 below):


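Before the resize, the chain at some slot table[i] of the array might be e1->e2->null. After the resize, the entries sit at some slot table[j] of the new array (j is either i or i plus the old capacity, since the new table is twice the size of the old one). If e1 and e2 map to the same new index, for example because their hash values are equal, they are still on the same chain. The resize resets next for each entry according to its hash. The algorithm takes the first entry from the old chain and puts it at the head of the new chain, so whatever was previously at the head of the new chain becomes that entry's next. So if the old table[i] was e1->e2->null, the new table[j] is e2->e1->null.
Doing it this way means the whole chain can be converted in a single pass. Try working through the source calmly and you will see.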

#5 brown802 2011-03-15

Come and have a look!

#4 chenyongxin 2011-03-15

The put method of HashMap:

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

Below is the code I used to test indexFor, along with its output; I cannot tell where the problem is:

    Entry[] newTable = new Entry[3];
    int newCapacity = newTable.length;
    System.out.println(hash("p1".hashCode())+"-->"+indexFor(hash("p1".hashCode()), newCapacity));
    System.out.println(hash("p2".hashCode())+"-->"+indexFor(hash("p2".hashCode()), newCapacity));
    System.out.println(hash("p3".hashCode())+"-->"+indexFor(hash("p3".hashCode()), newCapacity));

Output:

    3334-->2
    3333-->0
    3332-->0

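#3 cwfmaker 2011-03-15

I only knew that an infinite loop could occur, but was never clear on the exact cause. Heh, this deserves to be featured!

#2 Enjoy_show 2011-03-15

wj_126mail wrote:

Good article; multithreaded programming really does call for extra care.

Good article indeed.

#1 wj_126mail 2011-03-15

Good article; multithreaded programming really does call for extra care.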
