Understanding hashCode in Depth

hz_chenwenbiao

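I was asked in an interview what hashCode is for and did not answer well, so I collected some examples and explanations from the web. Here is my summary: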

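Where hash codes come from:
A hash code is not guaranteed to be unique. It is the result of an algorithm that tries to give objects of the same class different hash codes according to their differing characteristics, but that does not mean different objects always get different hash codes. Collisions do occur, and how often depends on how the programmer writes the hash algorithm.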
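To make the point about collisions concrete, here is a minimal sketch using two short strings that are well known to share a hash code. The class name CollisionDemo and the helper sameHash are made up for this illustration.

```java
public class CollisionDemo {
    // Hypothetical helper: true when two different strings share a hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println(a.hashCode());   // 2112
        System.out.println(b.hashCode());   // 2112
        System.out.println(a.equals(b));    // false: equal hash codes do not imply equal objects
        System.out.println(sameHash(a, b)); // true
    }
}
```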

Here are a few commonly seen hash code algorithms:

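1: Object's hashCode. It is often described as the object's memory address after some processing, and the Javadoc has long said it is typically implemented that way, but the specification does not require it. In HotSpot the value is generated the first time hashCode() is called (by default from a pseudo-random number generator rather than from the address) and is then stored in the object header, so it stays the same even if the garbage collector moves the object. Distinct objects therefore usually get distinct hash codes, but any hashing scheme has to cope with collisions: two different objects can have the same hashCode, and we can think of objects with the same hashCode as landing in the same bucket.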

package com.tools;

import java.util.ArrayList;
import java.util.List;

/**
 * Shows that java.lang.Object's hashCode() is not simply the object's memory address.
 * The 10000 objects created below are all distinct and live at different addresses,
 * yet nothing prevents two of them from returning the same hashCode().
 * A repeat is possible in any run but is not guaranteed to show up in every run.
 */
public class HashCodeMeaning {
    public static void main(String[] args) {
        List<Integer> hashCodes = new ArrayList<>();
        int numberExist = 0;

        System.out.println("______________hashCode values are not memory addresses________________");
        for (int i = 0; i < 10000; i++) {
            Object obj = new Object(); // obj refers to a newly allocated object
            int hash = obj.hashCode(); // the object's identity hash code
            if (hashCodes.contains(hash)) {
                System.out.println(hash + " exists in the list. " + i);
                numberExist++;
            } else {
                hashCodes.add(hash);
            }
        }

        // Any repeated hashCode means the value cannot be a unique memory address.
        System.out.println("repetition number:" + numberExist);
        System.out.println("list size:" + hashCodes.size());

        System.out.println("____________the objects themselves are all distinct_______________");
        List<Object> objects = new ArrayList<>();
        numberExist = 0;
        for (int i = 0; i < 10000; i++) {
            Object obj = new Object();
            // Object.equals() compares references, so contains() only matches the very same object.
            if (objects.contains(obj)) {
                System.out.println(obj + " exists in the list. " + i);
                numberExist++;
            } else {
                objects.add(obj);
            }
        }

        // No object is ever found twice.
        System.out.println("repetition number:" + numberExist);
        System.out.println("list size:" + objects.size());
    }
}

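2: String's hashCode. It is computed from the characters the String contains, so two strings with the same content always return the same hash code. A naive implementation could, for example, combine all the characters with a bitwise operation and use the result as the hash code. The real implementation is more elaborate: the Javadoc specifies it as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic.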

3: Integer. The hash code is simply the int value the Integer object holds. For example, with Integer i1 = new Integer(100), i1.hashCode() is 100. So two Integer objects with the same value also return the same hash code.
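The String and Integer cases can be checked with a short sketch. The class name StringHashSketch and the method polynomialHash are made up for this illustration; the loop only re-derives the formula from the String.hashCode() Javadoc and is not the JDK's own source.

```java
public class StringHashSketch {
    // Recomputes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int arithmetic
    // (overflow simply wraps around), as documented for String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hashCode";
        System.out.println(polynomialHash(s) == s.hashCode());                // true

        // Same content, different String objects: same hash code.
        System.out.println(new String("abc").hashCode() == "abc".hashCode()); // true

        // Integer's hash code is the wrapped int value.
        Integer i1 = Integer.valueOf(100);
        System.out.println(i1.hashCode());                                    // 100
    }
}
```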

In java.lang.Object, hashCode is declared as:

public native int hashCode();

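Because it is a native method, its implementation lives in the JVM's native code rather than in Java, so the Java source contains nothing but the declaration. If you are curious, you can dig into how the JVM actually produces the value, but I suggest first memorizing the rules for using it. First, objects that are equal according to equals() must have the same hashCode, while objects that are not equal are not guaranteed to return different values from hashCode(). In addition, the next time the same program runs, the same object will not necessarily have the hashCode() it had before.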
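The first rule can be sketched with a small value class. ContractDemo and Point are hypothetical names used only for this illustration; the idea is just that hashCode() is computed from exactly the fields that equals() compares.

```java
import java.util.Objects;

public class ContractDemo {
    // Hypothetical value class: two points are equal when their coordinates match.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Uses exactly the fields that equals() compares.
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(p.equals(q));                  // true
        System.out.println(p.hashCode() == q.hashCode()); // true, as the contract requires
        System.out.println(p.hashCode() == p.hashCode()); // true, stable within one run
    }
}
```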

Download the OpenJDK source code from the OpenJDK site, unpack it, and go to the hotspot/src/share/vm/runtime directory. It contains a file named synchronizer.cpp; look for:

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;         
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(),
         "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ; 
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
      !SafepointSynchronize::is_at_safepoint(), "invariant") ; 
  assert (Universe::verify_in_progress() ||
      Self->is_Java_thread() , "invariant") ; 
  assert (Universe::verify_in_progress() ||
     ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking 
  assert (!mark->has_bias_pattern(), "invariant") ; 
 
  if (mark->is_neutral()) {
    hash = mark->hash();              // this is a normal header
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // allocate a new hash code
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ; 
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ; 
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have 
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously 
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ; 
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ; 
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ; 
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

That is essentially the whole function. The original hashCode is in the file jdk/src/share/native/java/lang/Object.c, and after a few hops the call ends up in the function above.
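Seen from the Java side, the function above means that an object's identity hash is generated on first request, installed in the header, and simply read back afterwards. A minimal sketch of that observable behaviour follows; the class name IdentityHashDemo and the helper isStable are made up for this illustration.

```java
public class IdentityHashDemo {
    // Hypothetical helper: checks that the identity hash does not change between calls.
    static boolean isStable(Object obj) {
        int first = obj.hashCode();
        System.gc(); // only a hint to the JVM; the hash must not change even if the object moves
        int second = obj.hashCode();
        return first == second && first == System.identityHashCode(obj);
    }

    public static void main(String[] args) {
        Object obj = new Object();
        System.out.println(isStable(obj)); // true

        // String overrides hashCode(), so its value comes from the characters,
        // while System.identityHashCode() still returns the Object-style value.
        String s = new String("abc");
        System.out.println(s.hashCode());  // 96354
        System.out.println(System.identityHashCode(s) == System.identityHashCode(s)); // true
    }
}
```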

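The OpenJDK 6 download page is http://download.java.net/openjdk/jdk6/. It offers a 46.9MB tar.gz file containing the source code of the whole JDK (the JVM, the JDK tools, the J2SE class library and the underlying native libraries). Unpacked, it is 254MB in roughly 28750 files.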

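2. Overriding hashCode() and its relationship to the Collections framework
I once heard a senior colleague, who had taught Java for many years, say that in his view the hashCode method has no real purpose and exists only to back up the claim that objects with the same hashCode are equal under equals. (The implication actually runs the other way: equal objects must have the same hashCode.) That even experienced people get this wrong shows how easily it is overlooked. So what is hashCode() actually for?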

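Anyone who has taken a data structures course knows the hash table, a structure that speeds up lookups by computing an index for each object (ideally a distinct one, although collisions are allowed). Java does not distort this concept: hashCode exists essentially to support hash table structures. In other words, overriding hashCode() matters when objects are used with Hashtable, HashMap, HashSet and similar classes from the collections framework. Since any class that overrides equals() may end up in one of them, the usual rule is to override both methods together. Doing so lets us control which objects the hash structure treats as the same. Here is an example: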
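How a hash structure combines hashCode() and equals() can be sketched with a toy set. TinyHashSet is a made-up class for illustration only and is far simpler than java.util.HashSet: it has a fixed number of buckets, never resizes, and does not accept null.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyHashSet {
    private final List<List<Object>> buckets = new ArrayList<>();

    public TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() only selects the bucket; floorMod keeps the index non-negative.
    private List<Object> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // equals() decides whether an element in that bucket is really a duplicate.
    public boolean add(Object o) {
        List<Object> bucket = bucketFor(o);
        for (Object existing : bucket) {
            if (existing.equals(o)) {
                return false;
            }
        }
        bucket.add(o);
        return true;
    }

    public boolean contains(Object o) {
        for (Object existing : bucketFor(o)) {
            if (existing.equals(o)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TinyHashSet set = new TinyHashSet(16);
        System.out.println(set.add("Aa"));      // true
        System.out.println(set.add("BB"));      // true: same hash code and bucket, but not equal
        System.out.println(set.add("Aa"));      // false: duplicate found via equals()
        System.out.println(set.contains("BB")); // true
    }
}
```

Only the objects in one bucket are compared with equals(), which is where the speed-up over scanning a whole list comes from.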
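I once wrote a solver that needed to randomly list the different arrangements of 1, 2, 3, 4. I wrote an array class that stores each arrangement in an int[] and added the randomly generated arrangements to a HashSet, relying on the fact that a HashSet holds no duplicate elements. But how does HashSet decide whether an element is a duplicate? It first uses the result of hashCode() to find the bucket and then calls equals() on the candidates there, so two elements with different hash codes are never treated as duplicates. Try this experiment: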
int[] A = {1,2,3,4};
int[] B = {1,2,3,4};
System.out.println(A.hashCode());
System.out.println(B.hashCode());

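These are clearly the same arrangement, yet they (almost always) have different hashCodes, because arrays inherit Object's identity-based hashCode() and equals(). When added to a Set they are treated as different objects. So in the class that wraps the array we need to override hashCode() ourselves, together with equals(). How? The idea is to derive the hash code from something that is identical for all objects that should count as equal. There are many ways to do it; mine was:
First override toString():
return "" + A[0] + A[1] + A[2] + A[3]; // easy to read when printed
Then use toString() to compute hashCode():
return this.toString().hashCode();
With this, A and B above both return "1234". Testing toString().hashCode() then gives the same value for both, because String.hashCode() is computed from the characters of the string, so "1234".hashCode() always returns the same result.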
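As an alternative to going through toString(), the wrapper class can delegate to java.util.Arrays. The sketch below is not the original solver; PermutationDemo and Permutation are hypothetical names, and the point is only that equals() and hashCode() are both derived from the array contents.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PermutationDemo {
    // Hypothetical wrapper around int[] with content-based equality.
    static final class Permutation {
        private final int[] values;

        Permutation(int... values) {
            this.values = values.clone(); // defensive copy so later changes cannot alter the hash
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Permutation
                    && Arrays.equals(values, ((Permutation) o).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }

        @Override
        public String toString() {
            return Arrays.toString(values);
        }
    }

    public static void main(String[] args) {
        Set<Permutation> set = new HashSet<>();
        set.add(new Permutation(1, 2, 3, 4));
        set.add(new Permutation(1, 2, 3, 4)); // same content: rejected as a duplicate
        set.add(new Permutation(4, 3, 2, 1));
        System.out.println(set.size());       // 2

        // Raw arrays keep Object's identity-based equals(), so both are stored.
        Set<int[]> raw = new HashSet<>();
        raw.add(new int[] {1, 2, 3, 4});
        raw.add(new int[] {1, 2, 3, 4});
        System.out.println(raw.size());       // 2
    }
}
```

Unlike concatenating digits into a string, Arrays.hashCode and Arrays.equals keep working when the elements have more than one digit, where {1, 23} and {12, 3} would otherwise both become "123".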

Having read this far, I trust you understand it better than I do. From now on, please do not misunderstand what the hashCode() method is for.

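What about the other methods? notify(), notifyAll(), clone() and wait() are native (directly or through an overload), which means they are implemented inside the JVM rather than in Java. The last interesting method is finalize(), which loosely resembles a C++ destructor. It is declared protected, so it is meant to be overridden by subclasses, and its body in Object is empty, so by default it does nothing. As far as I understand, the garbage collector calls it once it has determined that the object is no longer in use, before the object's memory is reclaimed. When that happens, and whether it happens at all, is entirely up to the virtual machine, so finalize() cannot be relied on to release resources promptly, and it has been deprecated since Java 9.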

2010-05-19 14:01