Summary:
Every Java object has a hashCode()
and an equals()
method. Many classes override the default implementations of these
methods to provide a higher degree of semantic comparability between
object instances. In this installment of Java theory and practice
, Java developer Brian Goetz shows you the rules and guidelines you should follow when creating Java classes in order to define hashCode()
and equals()
effectively and appropriately.
While the Java language does not provide direct support for associative
arrays -- arrays that can take any object as an index -- the presence
of the hashCode()
method in the root Object
class clearly anticipates the ubiquitous use of HashMap
(and its predecessor, Hashtable
).
Under ideal conditions, hash-based containers offer both efficient
insertion and efficient retrieval; supporting hashing directly in the
object model facilitates the development and use of hash-based
containers.
The Object
class has two methods for making inferences about an object's identity: equals()
and hashCode()
.
In general, if you override one of these methods, you must override
both
, as there are important relationships between them that must be
maintained. In particular, if two objects are equal according to the equals()
method, they must have the same hashCode()
value (although the reverse is not generally true).
The semantics of equals()
for a given class are left to the implementer; defining what equals()
means for a given class is part of the design work for that class. The default implementation, provided by Object
, is simply reference equality:
public boolean equals(Object obj) { return (this == obj); } |
Under this default implementation, two references are equal only
if they refer to the exact same object
. Similarly, the default implementation of hashCode()
provided by Object
is derived by mapping the memory address of the object to an integer
value
. Because on some architectures the address space is larger than
the range of values for int
, it is possible that two distinct objects could have the same hashCode()
. If you override hashCode()
, you can still use the System.identityHashCode()
method to access this default value.
Overriding equals() -- a simple example
An identity-based implementation for equals()
and hashCode()
is a sensible default, but for some classes, it is desirable to relax the definition of equality somewhat. For example, the Integer
class defines equals()
similarly to this:
public boolean equals(Object obj) { return (obj instanceof Integer && intValue() == ((Integer) obj).intValue()); } |
Under this definition, two Integer
objects are equal only
if they contain the same integer value. This, along with Integer
being immutable, makes it practical to use an Integer
as a key in a HashMap
. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such as Integer
, Float
, Character
, and Boolean
, as well as String
(two String
objects are equal if they contain the same sequence of characters). Because these classes are immutable and implement hashCode()
and equals()
sensibly, they all make good hash keys.
Why override equals() and hashCode()?
What would happen if Integer
did not override equals()
and hashCode()
? Nothing, if we never used an Integer
as a key in a HashMap
or other hash-based collection
. However, if we were to use such an Integer
object for a key in a HashMap
, we would not be able to reliably retrieve the associated value, unless we used the exact same Integer
instance in the get()
call as we did in the put()
call. This would require ensuring that we only use a single instance of the Integer
object corresponding to a particular integer value throughout our
program. Needless to say, this approach would be inconvenient and error
prone.
The interface contract for Object
requires that if two objects are equal according to equals()
, then they must have the same hashCode()
value
. Why does our root object class need hashCode()
, when its discriminating ability is entirely subsumed by that of equals()
? The hashCode()
method exists purely for efficiency
. The Java platform architects
anticipated the importance of hash-based collection classes -- such as Hashtable
, HashMap
, and HashSet
-- in typical Java applications, and comparing against many objects with equals()
can be computationally expensive. Having every Java object support hashCode()
allows for efficient storage and retrieval using hash-based collections.
Requirements for implementing equals() and hashCode()
There are some restrictions placed on the behavior of equals()
and hashCode()
, which are enumerated in the documentation for Object
. In particular, the equals()
method must exhibit the following properties:
-
Symmetry[对称]
: For two references,
a
andb
,a.equals(b)
if and only ifb.equals(a)
-
Reflexivity[自反性]
: For all non-null references,
a.equals(a)
-
Transitivity[传递性]
: If
a.equals(b)
andb.equals(c)
, thena.equals(c)
-
Consistency with
hashCode()
: Two equal objects must have the samehashCode()
value
The specification for Object
offers a vague[含糊的] guideline that equals()
and hashCode()
be consistent
-- that their results will be the same for subsequent invocations,
provided that "no information used in equals comparison on the object
is modified." This sounds sort of like "the result of the calculation
shouldn't change, unless it does." This vague statement is generally
interpreted to mean that equality and hash value calculations should be
a deterministic function of an object's state and nothing else.
The requirements for equals()
and hashCode()
imposed by the Object class specification are fairly simple to follow. Deciding whether, and how, to override equals()
requires a little more judgment. In the case of simple immutable value classes, such as Integer
(and in fact for nearly all immutable classes), the choice is fairly
obvious -- equality should be based on the equality of the underlying
object state. In the case of Integer
, the object's only state is the underlying
integer value.
For mutable objects, the answer is not always so clear. Should equals()
and hashCode()
be based on the object's identity (like the default implementation) or
the object's state (like Integer and String)? There's no easy answer --
it depends on the intended use of the class. For containers like List
and Map
,
one could have made a reasonable argument either way. Most classes in
the Java class library, including container classes, err on the side of
providing an equals()
and hashCode()
implementation based on the object state.
If an object's hashCode()
value can change based on
its state, then we must be careful when using such objects as keys in
hash-based collections to ensure that we don't allow their state to
change when they are being used as hash keys
. All hash-based
collections assume that an object's hash value does not change while it
is in use as a key in the collection. If a key's hash code were to
change while it was in a collection, some unpredictable and confusing
consequences could follow. This is usually not a problem in practice --
it is not common practice to use a mutable object like a List
as a key in a HashMap
.
An example of a simple mutable class that defines equals()
and hashCode()
based on its state is Point
. Two Point
objects are equal if they refer to the same (x, y)
coordinates, and the hash value of a Point
is derived from the IEEE 754-bit representation of the x
and y
coordinate values.
For more complex classes, the behavior of equals()
and hashCode()
may even be imposed by the specification of a superclass or interface. For example, the List
interface requires that a List
object is equal to another object if and only if the other object is also a List
and they contain the same elements (defined by Object.equals()
on the elements) in the same order. The requirements for hashCode()
are defined with even more specificity -- the hashCode()
value of a list must conform to the following calculation:
hashCode = 1; Iterator i = list.iterator(); while (i.hasNext()) { Object obj = i.next(); hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode()); } |
Not only is the hash value dependent on the contents of the list, but
the specific algorithm for combining the hash values of the individual
elements is specified as well. (The String
class specifies a similar algorithm to be used for computing the hash value of a String
.)
Writing your own equals() and hashCode() methods
Overriding the default equals()
method is fairly easy, but overriding an already overridden equals()
method can be extremely tricky to do without violating either the symmetry or transitivity requirement. When overriding equals()
, you should always include some Javadoc comments on equals()
to help those who might want to extend your class do so correctly.
As a simple example, consider the following class:
class A { final B someNonNullField; C someOtherField; int someNonStateField; } |
How would we write the equals()
method for this class? This way is suitable for many situations:
public boolean equals(Object other) { // Not strictly necessary, but often a good optimization if (this == other) return true; if (!(other instanceof A)) return false; A otherA = (A) other; return (someNonNullField.equals(otherA.someNonNullField)) && ((someOtherField == null) ? otherA.someOtherField == null : someOtherField.equals(otherA.someOtherField))); } |
Now that we've defined equals()
, we have to define hashCode()
in a compatible manner. One compatible, but not all that useful, way to define hashCode()
is like
this:
public int hashCode() { return 0; } |
This approach will yield horrible performance for HashMap
s with a large number of entries, but it does conform to the specification. A more sensible implementation of hashCode()
for A
would be like this:
public int hashCode() { int hash = 1; hash = hash * 31 + someNonNullField.hashCode(); hash = hash * 31 + (someOtherField == null ? 0 : someOtherField.hashCode()); return hash; } |
Note that both of these implementations delegate a portion of the computation to the equals()
or hashCode()
method of the state fields of the class. Depending on your class, you
may also want to delegate part of the computation to the equals()
or hashCode()
function of the superclass. For primitive fields, there are helper
functions in the associated wrapper classes that can help in creating
hash values, such as Float.floatToIntBits
.
Writing an equals()
method is not without pitfalls. In general, it is impractical[不切实际的] to cleanly override equals()
when extending an instantiable class that itself overrides equals()
, and writing an equals()
method that is intended to be overridden (such as in an abstract class) is done differently than writing an equals()
method for a concrete class. See Effective Java Programming Language Guide
, Item 7 (in Resources
) for some examples and more details about why this is so.
Building hashing into the root object class of the Java class
library was a very sensible design compromise -- it makes using
hash-based containers so much easier and more efficient. However,
several criticisms have been made of the approach to and implementation
of hashing and equality in the Java class library. The hash-based
containers in java.util
are very convenient and easy to
use, but may not be suitable for applications that require very high
performance. While most of these will never be changed, it is
worthwhile to keep in mind when you're designing applications that rely
heavily on the efficiency of hash-based containers. These criticisms
include:
- Too small a hash range. Using
int
, instead oflong
, for the return type ofhashCode()
increases the possibility of hash collisions.
- Bad distribution of hash values. The hash values for short strings
and small integers are themselves small integers, and are close to the
hash values of other "nearby" objects. A more well-behaved hash
function would distribute the hash values more evenly across the hash
range.
- No defined hashing operations. While some classes, such as
String
andList
, define a hash algorithm to be used in combining the hash values of its constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The trick used byList
,String
, or the example classA
discussed earlier in Writing your own equals() and hashCode() methods are simple, but far from mathematically ideal. Nor does the class library offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticatedhashCode()
implementations.
- Difficulty writing
equals()
when extending an instantiable class that already overridesequals()
. The "obvious" ways to defineequals()
when extending an instantiable class that already overridesequals()
all fail to meet the symmetry or transitivity requirements of theequals()
method. This means that you must understand the structure and implementation details of classes you are extending when overridingequals()
, and may even need to expose private fields in the base class as protected to do so, which violates principles of good object-oriented design.
By defining equals()
and hashCode()
consistently, you can improve the usability of your classes as keys in
hash-based collections. There are two approaches to defining equality
and hash value: identity-based, which is the default provided by Object
, and state-based, which requires overriding both equals()
and hashCode()
.
If an object's hash value can change when its state changes, be sure
you don't allow its state to change while it is being used as a hash
key.
发表评论
-
Java theory and practice: To mutate or not to mutate?
2009-08-15 00:32 710Summary: Immutable objects h ... -
Java theory and practice: Building a better HashMap
2009-08-15 00:27 895Summary: ConcurrentHashMap ... -
Java theory and practice: Concurrent collections classes
2009-08-15 00:23 967Summary: In addition to ma ...
相关推荐
R(2)PCAH: Hashing with two-fold randomness on principal projections
"hashing-baseline-for-image-retrieval-master"项目就是一个专注于哈希算法在图像检索应用中的基准实现。 哈希算法是信息检索和数据存储的基础工具,它能够将任意大小的数据映射到固定长度的哈希值,通常是一个短...
This book takes a step-by-step approach and is filled with exciting projects and fascinating theory. It uses three high-impact sections to equip you with all the tools you'll need to build modern, ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
资源分类:Python库 所属语言:Python 资源全名:password_hashing_python-0.3.0-py2.py3-none-any.whl 资源来源:官方 安装方法:https://lanzao.blog.csdn.net/article/details/101784059
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
04-LSH theory of Locality Sensitive Hashing 05-聚类算法 clustering 06-降维技术 Dimensionality Reduction:SVD&CUR 07-推荐系统 Recommender Systems:Content-based Systems&Collaborative Filtering recsys1 ...
Hierarchical Consensus Hashing for Cross-Modal Retrieval
文档提及的“09.Dictionary.A.hashing.Call-by-value”很可能是指在字典这一数据结构中应用哈希函数和冲突解决策略,并讨论了值传递的机制。具体来说,文档可能涉及了如何在实现一个字典时利用哈希表的特性来存储和...
Guava是Google开发的一个开源Java库,包含了众多的实用工具类和集合框架,极大地提高了Java开发的效率。在本例中,我们关注的是"guava-30.1-jre.jar",这是一个针对Java运行环境(JRE)优化的Guava版本,版本号为...
jar包,官方版本,自测可用
在LeetCode的"Hashing-2"系列中,我们可以通过解决一系列问题来深入理解和应用哈希技巧。 问题1: 这个问题可能涉及到使用哈希表来快速检查元素是否存在或者实现去重。哈希表(如Java中的HashMap或C++中的unordered_...