Source Code Analysis: ConcurrentHashMap

Author: 王新春


Blog category: java-collection

Tags: java, ConcurrentHashMap, multithreaded map, data structure

On hashing (how to implement hashCode correctly):
https://www.oschina.net/translate/how-to-implement-javas-hashcode-correctly

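ConcurrentHashMap is the JDK's thread-safe map modeled on HashMap. (The code analyzed below is the segment-based JDK 6 implementation; JDK 8 rewrote the class without segments, using CAS and per-bin locking.)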

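Features:
1. It follows the general contract of HashMap and the same basic implementation principle (underlying structure: array + linked lists).
2. Compared with Hashtable, which locks the whole table on the object's own monitor, it allows much more concurrency, because ConcurrentHashMap uses lock striping (one lock per segment).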

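Internal structure: ConcurrentHashMap contains two static nested classes, HashEntry and Segment. HashEntry wraps one key/value pair of the map; Segment acts as a lock, and each Segment guards a subset of the buckets of the whole hash table. Each bucket is a linked list of HashEntry objects. A ConcurrentHashMap instance holds an array of Segment objects.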
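To make this layout concrete, here is a toy lock-striped map. It is a minimal sketch, not JDK code, assuming a fixed power-of-two number of segments, a fixed 8-bucket table per segment, no resizing, and no null keys or values. StripedMap, Segment, Node and spread() are hypothetical names; spread() is a stand-in multiplicative mixer rather than the JDK's hash(); and, unlike the JDK, the sketch also takes the lock on reads to stay simple.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy lock-striped map: an array of segments, each a lock guarding its own buckets.
// Hypothetical names; fixed sizes, no resizing, no null keys or values.
class StripedMap<K, V> {
    // One entry in a bucket's chain.
    static final class Node<K, V> {
        final K key;
        final int hash;
        V value;
        final Node<K, V> next;

        Node(K key, int hash, Node<K, V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
    }

    // A segment IS a lock, like JDK 6's Segment extends ReentrantLock.
    static final class Segment<K, V> extends ReentrantLock {
        @SuppressWarnings("unchecked")
        final Node<K, V>[] table = (Node<K, V>[]) new Node[8]; // buckets owned by this segment

        V put(K key, int hash, V value) {
            lock();
            try {
                int i = hash & (table.length - 1);
                for (Node<K, V> e = table[i]; e != null; e = e.next) {
                    if (e.hash == hash && e.key.equals(key)) {
                        V old = e.value;
                        e.value = value;
                        return old;
                    }
                }
                table[i] = new Node<>(key, hash, table[i], value); // new entry becomes the head
                return null;
            } finally {
                unlock();
            }
        }

        V get(K key, int hash) {
            lock(); // simplification: the JDK reads without locking
            try {
                for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next)
                    if (e.hash == hash && e.key.equals(key)) return e.value;
                return null;
            } finally {
                unlock();
            }
        }
    }

    private final Segment<K, V>[] segments;
    private final int segmentShift;

    @SuppressWarnings("unchecked")
    StripedMap(int segmentCount) { // must be a power of two
        segments = (Segment<K, V>[]) new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) segments[i] = new Segment<>();
        segmentShift = 32 - Integer.numberOfTrailingZeros(segmentCount);
    }

    // Stand-in bit mixer (multiplicative hashing), NOT the JDK's hash().
    private static int spread(int h) {
        return h * 0x9E3779B9;
    }

    // High bits pick the segment, low bits pick the bucket inside it.
    private Segment<K, V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & (segments.length - 1)];
    }

    V put(K key, V value) {
        int h = spread(key.hashCode());
        return segmentFor(h).put(key, h, value);
    }

    V get(K key) {
        int h = spread(key.hashCode());
        return segmentFor(h).get(key, h);
    }
}
```

Two writers only contend when their keys land in the same segment; writers to different segments take different locks.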

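Overall effect of ConcurrentHashMap:
1. Better concurrency, thanks to the segment-locking mechanism.
2. Global queries such as size() and containsValue() are only weakly consistent.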
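Weak consistency is also visible in iteration. A small sketch, assuming the real java.util.concurrent.ConcurrentHashMap of whatever JDK you run (not necessarily the JDK 6 code analyzed here): its iterators are documented as weakly consistent, so inserting keys in the middle of an iteration does not throw ConcurrentModificationException, every key present when the iterator was created is visited exactly once, and the newly inserted keys may or may not be seen. WeakIterationDemo is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeakIterationDemo {
    // Iterates over 10 keys and inserts 10 more during the first step.
    // Returns {number of keys the iterator visited, final map size}.
    static int[] run() {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) map.put(i, "v" + i);
        int seen = 0;
        boolean added = false;
        for (Integer k : map.keySet()) { // a HashMap would throw ConcurrentModificationException here
            if (!added) {
                for (int j = 100; j < 110; j++) map.put(j, "new");
                added = true;
            }
            seen++;
        }
        return new int[] { seen, map.size() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("seen=" + r[0] + ", size=" + r[1]); // seen is between 10 and 20; size is 20
    }
}
```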

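Thread safety rests on:
1. Writes take the segment lock, so concurrent writers cannot overwrite each other's changes.
2. remove(e) copies the entries that precede e instead of unlinking e in place.
3. volatile fields provide visibility between threads.
4. final fields are immutable (and are also safely initialized: they are visible as soon as the object is).
In short, the immutability of HashEntry's links lets reader threads traverse a chain without taking the lock.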

That is my understanding; please point out anything I got wrong!

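HashEntry: one entry of the map. Its key, hash and next fields are final, so a chain can never change shape after it is built; only value, which is volatile, can be updated. This near-immutability is what makes it safe to read without locking.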

static final class HashEntry<K,V> {
    final K key;                 // key is final
    final int hash;              // hash is final
    volatile V value;            // value is volatile, so updates are visible to all threads
    final HashEntry<K,V> next;   // next is final: once built, a chain's links never change

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }

    @SuppressWarnings("unchecked")
    static final <K,V> HashEntry<K,V>[] newArray(int i) {
        return new HashEntry[i];
    }
}
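Why a final next matters: a reader that has grabbed a head reference sees a fixed chain forever, because put() only ever links a new node in front of the old head. A minimal sketch with a hypothetical Entry class that keeps only key and the final link (hash and value omitted for brevity):

```java
// Hypothetical stand-in for HashEntry: only the final links matter here.
final class Entry {
    final String key;
    final Entry next;

    Entry(String key, Entry next) {
        this.key = key;
        this.next = next;
    }

    // Concatenates the keys along the chain starting at head.
    static String keys(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) sb.append(e.key);
        return sb.toString();
    }
}

class PrependDemo {
    public static void main(String[] args) {
        Entry oldHead = new Entry("b", new Entry("a", null)); // chain held by a reader
        Entry newHead = new Entry("c", oldHead);              // a put(): new node in front
        System.out.println(Entry.keys(oldHead)); // ba  (the reader's view is unchanged)
        System.out.println(Entry.keys(newHead)); // cba (later readers see the new head)
    }
}
```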

Segment: extends ReentrantLock, so a Segment object can itself act as a lock.
Each Segment guards the buckets held in its table field,
and each bucket is a linked list of HashEntry objects.

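transient volatile int count;   // number of elements in this segment; must be volatile for memory visibility
transient int threshold;        // when the number of HashEntry elements in table exceeds this value, table is rehashed
/** number of times table has been structurally modified */
transient int modCount;
/** load factor */
final float loadFactor;

/* table is an array of HashEntry objects.
   Entries whose hashes collide in the same slot are chained into a linked list.
   Each element of table is one bucket of the hash map.
   Each segment's table holds only a share of all the buckets of the ConcurrentHashMap;
   with a concurrency level of 16, each table holds 1/16 of them. */
transient volatile HashEntry<K,V>[] table;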

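//Read value while holding the lock. Called only when an unlocked read saw null; since put() never stores null,
//that can only mean the HashEntry was seen before its constructor's write to value became visible
//(the JDK source notes this reordering is legal under the memory model but not known to ever occur)
V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}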

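//Look up a value in this segment
V get(Object key, int hash) {
    if (count != 0) { // read-volatile: because count is volatile, this read makes every write that preceded the last write to count visible; without volatile this check would be unsafe
        HashEntry<K,V> e = getFirst(hash); // first HashEntry of the bucket for this hash
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) { // hash and key both match: entry found
                V v = e.value;
                // Note: value is volatile, so updates to it are visible, but it is not final.
                // If this entry was reached through a racy read of the table slot, the constructor's
                // write to value may not be visible yet, and the JMM allows us to see the default null.
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck: re-read the value under the lock
            }
            e = e.next;
        }
    }
    return null;
}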
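The read path above (traverse without the lock, lock only if a value reads as null) can be isolated in a small sketch. It is a simplified model, not JDK code: ReadSegment and Node are hypothetical names, and the segment has a single bucket. Because head is volatile in this sketch, the recheck branch is effectively unreachable here; in JDK 6 the table slot is a plain array element, which is what makes the recheck necessary.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the JDK 6 read path with a single bucket (hypothetical ReadSegment/Node).
// Invariant: put() never stores null, so a null value seen without the lock
// can only mean the entry's initialization is not visible yet.
class ReadSegment extends ReentrantLock {
    static final class Node {
        final String key;
        final int hash;
        volatile String value;
        final Node next;

        Node(String key, int hash, Node next, String value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }
    }

    volatile int count;
    private volatile Node head; // one bucket, for brevity

    void put(String key, String value) {
        if (value == null) throw new NullPointerException(); // null is reserved for "not yet visible"
        lock();
        try {
            int h = key.hashCode();
            for (Node e = head; e != null; e = e.next) {
                if (e.hash == h && e.key.equals(key)) { e.value = value; return; }
            }
            head = new Node(key, h, head, value);
            count = count + 1; // write-volatile: publish the new entry
        } finally {
            unlock();
        }
    }

    String get(String key) {
        if (count != 0) { // read-volatile
            int h = key.hashCode();
            for (Node e = head; e != null; e = e.next) {
                if (e.hash == h && e.key.equals(key)) {
                    String v = e.value;
                    if (v != null) return v;
                    return readValueUnderLock(e); // recheck under the lock
                }
            }
        }
        return null;
    }

    String readValueUnderLock(Node e) {
        lock();
        try {
            return e.value;
        } finally {
            unlock();
        }
    }
}
```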

//Does this segment contain the key?
boolean containsKey(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) // key and hash are final (not volatile): safely published with the HashEntry and never changed, so no recheck under the lock is needed
                return true;
            e = e.next;
        }
    }
    return false;
}

//Does this segment contain the value?
boolean containsValue(Object value) {
    if (count != 0) { // read-volatile, same as above
        HashEntry<K,V>[] tab = table;
        int len = tab.length;
        for (int i = 0 ; i < len; i++) {
            // walk every chain in this segment until the value is found
            for (HashEntry<K,V> e = tab[i]; e != null; e = e.next) {
                V v = e.value;
                if (v == null) // the key difference from containsKey: value is not final, so null means its initialization is not yet visible; recheck under the lock
                    v = readValueUnderLock(e);
                if (value.equals(v))
                    return true;
            }
        }
    }
    return false;
}

//Replace the value for key/hash, but only if it currently equals oldValue
boolean replace(K key, int hash, V oldValue, V newValue) {
    // lock, so that concurrent replace calls cannot overwrite each other's values
    lock();
    try {
        HashEntry<K,V> e = getFirst(hash);
        // walk the chain until both hash and key match
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        boolean replaced = false;
        if (e != null && oldValue.equals(e.value)) {
            replaced = true;
            e.value = newValue; // value is volatile, so the new value is immediately visible; the entry is already published, so there is no initialization-safety issue
        }
        return replaced;
    } finally {
        unlock();
    }
}
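This segment-level method backs the public conditional replace(key, oldValue, newValue). A short sketch of the observable behavior, assuming the real ConcurrentHashMap API of the JDK you run (not necessarily JDK 6); ReplaceDemo is a hypothetical name:

```java
import java.util.concurrent.ConcurrentHashMap;

// Conditional and unconditional replace on the real ConcurrentHashMap API.
class ReplaceDemo {
    static String run() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        boolean r1 = map.replace("a", 2, 3); // false: current value is 1, not 2
        boolean r2 = map.replace("a", 1, 3); // true: 1 is replaced by 3
        Integer r3 = map.replace("b", 9);    // null: "b" is absent, so nothing is stored
        return r1 + " " + r2 + " " + r3 + " " + map;
    }

    public static void main(String[] args) {
        System.out.println(run()); // false true null {a=3}
    }
}
```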


//Insert or update an element
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock(); // lock first, to exclude other writers to this segment
    try {
        int c = count; // current number of elements
        if (c++ > threshold) // ensure capacity: rehash if the element count exceeds the threshold
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1); // index of the bucket for this hash
        HashEntry<K,V> first = tab[index]; // head HashEntry of that bucket
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key))) // walk the chain looking for the key
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent) // put() overwrites; putIfAbsent() (onlyIfAbsent == true) keeps the existing value
                e.value = value;
        }
        else { // key absent: add a new entry, placed at the HEAD of the bucket
            oldValue = null;
            ++modCount;
            // concurrent readers are not disturbed: the existing chain is left untouched, the new entry just points to it
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile: publishes the new entry to readers that read count
        }
        return oldValue;
    } finally {
        unlock();
    }
}
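The onlyIfAbsent flag is what separates the public put() and putIfAbsent(): in JDK 6 both delegate to this Segment.put, passing false and true respectively. A quick check against the real ConcurrentHashMap API (whatever JDK you run); PutIfAbsentDemo is a hypothetical name:

```java
import java.util.concurrent.ConcurrentHashMap;

// put(k, v) behaves like onlyIfAbsent == false, putIfAbsent(k, v) like onlyIfAbsent == true.
class PutIfAbsentDemo {
    static String run() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String r1 = map.putIfAbsent("k", "first");  // null: key was absent, "first" inserted
        String r2 = map.putIfAbsent("k", "second"); // "first": existing value kept
        String r3 = map.put("k", "third");          // "first": existing value overwritten
        return r1 + "," + r2 + "," + r3 + "," + map.get("k");
    }

    public static void main(String[] args) {
        System.out.println(run()); // null,first,first,third
    }
}
```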


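//Remove all data in this segment
void clear() {
    if (count != 0) { // read-volatile, as above: nothing to do if the segment is empty
        lock(); // take the lock so no other thread modifies table meanwhile
        try {
            HashEntry<K,V>[] tab = table;
            for (int i = 0; i < tab.length ; i++)
                tab[i] = null; // empty each bucket
            ++modCount; // modCount only ever increases
            count = 0; // write-volatile: makes the cleared state visible
        } finally {
            unlock();
        }
    }
}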

//Remove an element
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        // head entry of the bucket for this hash
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            // entry found
            V v = e.value;
            if (value == null || value.equals(v)) { // value == null means an unconditional remove(key)
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount; // one more modification
                // newFirst starts as e.next; then walk from the head up to (not including) e,
                // cloning each node in front of newFirst. Note: the cloned nodes end up in REVERSE order!
                HashEntry<K,V> newFirst = e.next;
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
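The clone loop is easiest to see on a toy chain. A minimal sketch, assuming a hypothetical immutable Node with only key and next (no hash or value), that removes "c" from a -> b -> c -> d: the node after the removed one (d) is shared as-is, the nodes before it are cloned in reverse order, and the original chain stays intact for readers still traversing it.

```java
// Hypothetical immutable node; mirrors the cloning loop in Segment.remove.
final class Node {
    final String key;
    final Node next;

    Node(String key, Node next) {
        this.key = key;
        this.next = next;
    }
}

class CopyOnRemoveDemo {
    // Returns the head of a chain without `key`; the original chain is never modified.
    static Node remove(Node first, String key) {
        Node e = first;
        while (e != null && !e.key.equals(key)) e = e.next;
        if (e == null) return first;             // not found: nothing to do
        Node newFirst = e.next;                  // the tail after e is shared, not copied
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst); // clone each node before e (order reverses)
        return newFirst;
    }

    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("a", new Node("b", new Node("c", new Node("d", null))));
        Node after = remove(chain, "c");
        System.out.println(keys(chain)); // abcd (readers of the old head are unaffected)
        System.out.println(keys(after)); // bad
    }
}
```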

Analysis of the ConcurrentHashMap class itself:

public class ConcurrentHashMap<K, V> extends AbstractMap<K, V>
        implements ConcurrentMap<K, V>, Serializable {
    // default initial capacity
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    // default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // default concurrency level
    static final int DEFAULT_CONCURRENCY_LEVEL = 16;
    // mask used to select a segment
    final int segmentMask;
    // shift used to select a segment
    final int segmentShift;

    // the segments
    final Segment<K,V>[] segments; // final: the array reference is never reassigned

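//Supplemental hash: spreads the bits of hashCode(). Understanding why it works takes serious math; I admit I did not figure it out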
private static int hash(int h) {
        // Spread bits to regularize both segment and index locations,
        // using variant of single-word Wang/Jenkins hash.
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }
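To see why spreading matters: segmentFor() uses the high bits of the hash and the bucket index uses the low bits, but many hashCode() values, such as small Integer keys (Integer.hashCode() returns the value itself), have all-zero high bits. The sketch below copies hash() verbatim and compares indices with and without it, assuming 16 segments (segmentShift = 28, segmentMask = 15) and a hypothetical 16-bucket table per segment; HashSpreadDemo is a hypothetical name. Run it to see where the spread hashes land.

```java
// Compares segment/bucket indices of raw vs. spread hashes (hypothetical layout:
// 16 segments, 16 buckets per segment).
class HashSpreadDemo {
    // Copied from the JDK 6 source quoted above.
    static int hash(int h) {
        h += (h <<  15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h <<   3);
        h ^= (h >>>  6);
        h += (h <<   2) + (h << 14);
        return h ^ (h >>> 16);
    }

    static final int SEGMENT_SHIFT = 28, SEGMENT_MASK = 15, TABLE_SIZE = 16;

    static int segmentIndex(int h) { return (h >>> SEGMENT_SHIFT) & SEGMENT_MASK; }
    static int bucketIndex(int h)  { return h & (TABLE_SIZE - 1); }

    public static void main(String[] args) {
        // hashCodes 0, 16, 32, ...: zero high bits AND zero low 4 bits
        for (int k = 0; k < 8; k++) {
            int raw = k * 16;
            int spread = hash(raw);
            System.out.printf("hashCode=%3d raw seg=%d bucket=%d | spread seg=%2d bucket=%2d%n",
                    raw, segmentIndex(raw), bucketIndex(raw),
                    segmentIndex(spread), bucketIndex(spread));
        }
    }
}
```

Without spreading, every one of these keys would pile into segment 0, bucket 0.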

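/* Finds the segment a hash belongs to.
  Notes:
1. The segment is chosen from the high bits of the hash: (hash >>> segmentShift) & segmentMask.
2. segmentShift depends on the concurrency level: segmentShift = 32 - log2(ssize), where ssize is the concurrencyLevel passed to the constructor rounded up to a power of two (with DEFAULT_CONCURRENCY_LEVEL = 16, segmentShift = 28).
3. Why & segmentMask? To keep the index in bounds. With ssize >= 2 the unsigned shift already leaves only log2(ssize) bits, but with a single segment segmentShift is 32, and Java takes int shift distances mod 32, so hash >>> 32 == hash; the mask (0) is what keeps the index at 0.
*/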
final Segment<K,V> segmentFor(int hash) {
        return segments[(hash >>> segmentShift) & segmentMask];
}
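The shift and mask come from the constructor. A sketch mirroring how the JDK 6 constructor derives them by rounding concurrencyLevel up to a power of two (the MAX_SEGMENTS cap and the rest of the constructor are omitted); SegmentIndexDemo is a hypothetical name. It also checks the single-segment corner case where the mask is what keeps the index in bounds.

```java
// Mirrors the JDK 6 constructor's derivation of segmentShift/segmentMask
// (MAX_SEGMENTS cap omitted). Hypothetical class name.
class SegmentIndexDemo {
    final int segmentShift;
    final int segmentMask;

    SegmentIndexDemo(int concurrencyLevel) {
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) { // round up to a power of two
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1;
    }

    int segmentIndex(int hash) {
        return (hash >>> segmentShift) & segmentMask;
    }

    public static void main(String[] args) {
        SegmentIndexDemo sixteen = new SegmentIndexDemo(16);
        System.out.println(sixteen.segmentShift + " " + sixteen.segmentMask); // 28 15

        SegmentIndexDemo one = new SegmentIndexDemo(1);
        int h = 0xDEADBEEF;
        System.out.println(one.segmentShift);              // 32
        System.out.println((h >>> one.segmentShift) == h); // true: int shift distances are taken mod 32
        System.out.println(one.segmentIndex(h));           // 0: only segmentMask == 0 keeps it in bounds
    }
}
```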


public V get(Object key) {
        // compute the spread hash
        int hash = hash(key.hashCode());
        // find the segment, then look up key/hash inside it
        return segmentFor(hash).get(key, hash);
    }

//Same pattern: delegate to the segment
public boolean containsKey(Object key) {
        int hash = hash(key.hashCode());
        return segmentFor(hash).containsKey(key, hash);
    }

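//Same again: the logic is delegated to Segment.put. Null values are rejected, which is why a reader that sees a null value can treat it as "not yet visible"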
public V put(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).put(key, hash, value, false);
    }

//Delegates the replacement to the segment (Segment's three-argument replace(key, hash, newValue) overload, not shown above)
public V replace(K key, V value) {
        if (value == null)
            throw new NullPointerException();
        int hash = hash(key.hashCode());
        return segmentFor(hash).replace(key, hash, value);
    }

//Clearing the map just clears each segment in turn (so clear() is not atomic across the whole map)
  public void clear() {
        for (int i = 0; i < segments.length; ++i)
            segments[i].clear();
    }


/*
 size() must add up the element counts of all segments.
*/
public int size() {
        final Segment<K,V>[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
        /* The goal is a consistent set of segments[i].count values: segments may be
        updated while we are summing, so we try RETRIES_BEFORE_LOCK (2) times without
        locking, and if that fails we take the locks of all segments. */
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count; // add up the segments' element counts
                mcsum += mc[i] = segments[i].modCount; // add up the modCounts, remembering each segment's value in mc[i]
            }
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count; // sum the element counts a second time
                    if (mc[i] != segments[i].modCount) { // modCount changed since the first pass: data changed in between, so this attempt is useless; stop now
                        check = -1; // force retry
                        break;
                    }
                }
            }
            if (check == sum) // check == sum: no change observed between the two passes
                break;
        }
        /*
        If no attempt produced check == sum, lock all segments and sum the counts;
        that result is exact at the moment it is computed.
        If check == sum was reached without locking, the result is only weakly
        consistent: the data may change right after the computation finishes.
        */
        if (check != sum) { // Resort to locking all segments
            sum = 0;
            for (int i = 0; i < segments.length; ++i)
                segments[i].lock();
            for (int i = 0; i < segments.length; ++i)
                sum += segments[i].count;
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock();
        }
        if (sum > Integer.MAX_VALUE)
            return Integer.MAX_VALUE;
        else
            return (int)sum;
    }
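The retry-then-lock pattern of size() can be isolated in a small sketch. CountingSegment and SizeSketch are hypothetical names; as in the JDK 6 Segment, count is volatile and is written after the plain modCount. For brevity the sketch drops the mcsum != 0 shortcut and simply compares the counts and modCounts of two passes.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical segment: plain modCount written before the volatile count, as in JDK 6.
class CountingSegment extends ReentrantLock {
    volatile int count;
    int modCount;

    void add() {
        lock();
        try {
            ++modCount;
            count = count + 1; // write-volatile, after modCount
        } finally {
            unlock();
        }
    }
}

// The optimistic-then-lock pattern of size(), simplified.
class SizeSketch {
    static final int RETRIES_BEFORE_LOCK = 2;

    static long size(CountingSegment[] segments) {
        int[] mc = new int[segments.length];
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            long sum = 0, check = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;     // volatile read first...
                mc[i] = segments[i].modCount; // ...then the plain read
            }
            boolean stable = true;
            for (int i = 0; i < segments.length && stable; ++i) {
                check += segments[i].count;
                stable = mc[i] == segments[i].modCount;
            }
            if (stable && check == sum) return sum; // no change seen between the passes
        }
        // Fallback: lock every segment, sum, unlock.
        for (CountingSegment s : segments) s.lock();
        try {
            long sum = 0;
            for (CountingSegment s : segments) sum += s.count;
            return sum;
        } finally {
            for (CountingSegment s : segments) s.unlock();
        }
    }

    public static void main(String[] args) {
        CountingSegment[] segs = new CountingSegment[4];
        for (int i = 0; i < segs.length; i++) segs[i] = new CountingSegment();
        for (int n = 0; n < 10; n++) segs[n % 4].add();
        System.out.println(size(segs)); // 10
    }
}
```

The unlocked passes cost nothing when the map is quiet; the locking fallback is only paid when writers keep interfering.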


//Does the map contain the value?
public boolean containsValue(Object value) {
        if (value == null)
            throw new NullPointerException();
        // See explanation of modCount use above

        final Segment<K,V>[] segments = this.segments;
        int[] mc = new int[segments.length];

        // Try a few times without locking
        // As in size(): first try without locks, because locking every segment hurts concurrency badly
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            int sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                int c = segments[i].count; // read-volatile; the value itself is unused
                mcsum += mc[i] = segments[i].modCount;
                if (segments[i].containsValue(value)) // found: return true
                    return true;
            }
            boolean cleanSweep = true;
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    int c = segments[i].count;
                    if (mc[i] != segments[i].modCount) { // caveat: if a value is added to segment 2 while this loop is already at i = 3, it goes undetected, so the answer is only weakly consistent
                        cleanSweep = false;
                        break;
                    }
                }
            }
            // cleanSweep == true: no segment changed between the two passes, so the negative result above is accepted.
            // As analyzed above, this is only weakly reliable: there is still a window in which an update can be missed.
            if (cleanSweep)
                return false;
        }
        // Resort to locking all segments: no other option left, so lock everything
        for (int i = 0; i < segments.length; ++i)
            segments[i].lock();
        boolean found = false;
        try {
            for (int i = 0; i < segments.length; ++i) {
                if (segments[i].containsValue(value)) {
                    found = true; // this result is exact
                    break;
                }
            }
        } finally {
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock(); // release the locks
        }
        return found;
    }

Posted: 2013-05-20 19:57

Comment #1: 王新春, 2013-05-21

One thing I don't quite understand: Segment's modCount field is not declared volatile. Is there a visibility problem between threads?

public int size() {
        final Segment<K,V>[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        // Try a few times to get accurate count. On failure due to
        // continuous async changes in table, resort to locking.
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;
                mcsum += mc[i] = segments[i].modCount; // modCount is not volatile: is there a thread-visibility problem here?
            }
            if (mcsum != 0) {
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count;
                    if (mc[i] != segments[i].modCount) {
                        check = -1; // force retry
                        break;
                    }
                }
            }
            if (check == sum)
                break;
        }
        if (check != sum) { // Resort to locking all segments
            sum = 0;
            for (int i = 0; i < segments.length; ++i)
                segments[i].lock();
            for (int i = 0; i < segments.length; ++i)
                sum += segments[i].count;
            for (int i = 0; i < segments.length; ++i)
                segments[i].unlock();
        }
        if (sum > Integer.MAX_VALUE)
            return Integer.MAX_VALUE;
        else
            return (int)sum;
    }
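A possible answer, as I read the code (hedged, not an official explanation): modCount does not need to be volatile here because it piggybacks on count. Every write path shown above (put adding a new key, remove, clear) does ++modCount before its volatile write to count, and size() reads segments[i].count before segments[i].modCount. Under the Java Memory Model, a volatile write happens-before every later volatile read of the same field that sees it, so a thread that sees a given count also sees a modCount at least as new as the one written before it. A modification whose count write has not happened yet can still be missed, which is what the second pass, and finally the locking path, are there to catch. The demo below is a minimal, hypothetical illustration of this piggybacking, not JDK code:

```java
// Hypothetical demo of volatile "piggybacking": plain write, then volatile write;
// volatile read, then plain read.
class PiggybackDemo {
    int modCount;        // plain field, like Segment.modCount
    volatile int count;  // volatile field, like Segment.count

    void writer() {
        modCount = 42;   // 1. plain write
        count = 1;       // 2. volatile write
    }

    int reader() {
        while (count == 0) Thread.yield(); // 3. volatile read, spin until the write is seen
        return modCount; // 4. plain read: must see 42 under the JMM
    }

    static int runOnce() {
        PiggybackDemo d = new PiggybackDemo();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = d.reader());
        r.start();
        new Thread(d::writer).start();
        try {
            r.join(); // join also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // 42
    }
}
```

If the order were reversed (reading modCount before count), the guarantee would not hold, which is why the JDK code reads count first even when it does not need its value.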
