1. Hashing
-- Save items in a key-indexed table (index is a function of the key).
-- Hash function: Method for computing array index from key.
-- Classic space-time tradeoff.
-- No space limitation: trivial hash function with key as index.
-- No time limitation: trivial collision resolution with sequential search.
-- Space and time limitations: hashing (the real world).
2. Computing the hash function
-- Idealistic goal: scramble the keys uniformly to produce a table index.
-- Efficiently computable.
-- Each table index equally likely for each key.
-- Practical challenge. Need different approach for each key type.
3. Java hashCode implementation :
-- Integer :
public final class Integer { private final int value; ... public int hashCode() { return value; //using value of the Integer itself } }
-- Boolean :
public final class Boolean { private final boolean value; ... public int hashCode() { if (value) return 1231; else return 1237; } }
-- Double :
public final class Double { private final double value; ... public int hashCode() { long bits = doubleToLongBits(value); //convert to IEEE 64-bit representation; return (int) (bits ^ (bits >>> 32)); //xor most significant 32-bits with least significant 32-bits } }
-- String :
public final class String { private final char[] s; ... public int hashCode() { int hash = 0; // Horner's method to hash string of length L: L multiplies/adds. // Equivalent to h = s[0] · 31^(L–1) + … + s[L – 3] · 31^2 + s[L – 2] · 31^1 + s[L – 1] · 31^0. for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash; } }
-- user-defined types
public final class Transaction implements Comparable<Transaction> { private final String who; private final Date when; private final double amount; ... public int hashCode() { int hash = 17; hash = 31*hash + who.hashCode(); hash = 31*hash + when.hashCode(); hash = 31*hash + ((Double) amount).hashCode(); return hash; } }
-- "Standard" recipe for user-defined types:
-- Combine each significant field using the 31x + y rule.
-- If field is a primitive type, use wrapper type hashCode().
-- If field is null, return 0.
-- If field is a reference type, use hashCode().
-- If field is an array, apply to each entry.
-- Modular hashing:
-- Hash code: An int between -2^31 and 2^31 - 1.
-- Hash function: An int between 0 and M - 1 (for use as array index).
-- Math.abs(key.hashCode()) % M --> 1-in-a-billion bug: hashCode() of "polygenelubricants" is -2^31
-- correct implementation:
private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; }
4. Uniform hashing assumption:
-- Uniform hashing assumption: Each key is equally likely to hash to an integer between 0 and M - 1.
-- Birthday problem: Expect two balls in the same bin after ~ sqrt( π M / 2 ) tosses.
-- Coupon collector: Expect every bin has ≥ 1 ball after ~ M ln M tosses.
-- Load balancing: After M tosses, expect most loaded bin has Θ ( log M / log log M ) balls.
5. Separate chaining hash table
-- Use an array of M < N linked lists.
-- Hash: map key to integer i between 0 and M - 1.
-- Insert: put at front of ith chain (if not already there).
-- Search: need to search only ith chain.
-- Java Implementation
public class SeparateChainingHashST<Key, Value> { private int M = 97; // number of chains private Node[] st = new Node[M]; // array of chains , array doubling and halving code omitted private static class Node { private Object key; private Object val; private Node next; } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; } public Value get(Key key) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null; } public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) { x.val = val; return; } st[i] = new Node(key, val, st[i]); } }
6. Analysis of separate chaining :
-- Proposition: Under uniform hashing assumption, prob. that the number of keys in a list is within a constant factor of N / M is extremely close to 1.
-- Consequence: Number of probes for search/insert is proportional to N / M.
-- M too large ⇒ too many empty chains.
-- M too small ⇒ chains too long.
-- Typical choice: M ~ N / 5 ⇒ constant-time ops.
7. Open addressing
-- When a new key collides, find next empty slot, and put it there.
-- Hash: Map key to integer i between 0 and M-1.
-- Insert: Put at table index i if free; if not try i+1, i+2, etc.
-- Search. Search table index i; if occupied but no match, try i+1, i+2, etc.
-- Array size M must be greater than number of key-value pairs N.
-- Java Implementation
public class LinearProbingHashST<Key, Value> { private int M = 30001; private Value[] vals = (Value[]) new Object[M]; // array doubling and halving code omitted private Key[] keys = (Key[]) new Object[M]; private int hash(Key key) { /* as before */ } public void put(Key key, Value val) { int i; for (i = hash(key); keys[i] != null; i = (i+1) % M) if (keys[i].equals(key)) break; keys[i] = key; vals[i] = val; } public Value get(Key key){ for (int i = hash(key); keys[i] != null; i = (i+1) % M) if (key.equals(keys[i])) return vals[i]; return null; } }
8. Knuth's parking problem
-- Model. Cars arrive at one-way street with M parking spaces. Each desires a random space i : if space i is taken, try i + 1, i + 2, etc. What is mean displacement of a car?
-- Half-full: With M / 2 cars, mean displacement is ~ 3 / 2.
-- Full: With M cars, mean displacement is ~ sqrt( π M / 8 ).
-- Proposition. Under uniform hashing assumption, the average # of probes in a linear probing hash table of size M that contains N = α M keys is: ~1/2(1+1/(1-α ) ) for search hit and ~1/2(1+1/(1-α )^2) for search miss or insert.
-- Parameter choice
-- M too large ⇒ too many empty array entries.
-- M too small ⇒ search time blows up.
-- Typical choice: α = N / M ~ ½.
9. ST implementations summary:
10. Hashing Variations:
-- Two-probe hashing. (separate-chaining variant)
-- Hash to two positions, insert key in shorter of the two chains.
-- Reduces expected length of the longest chain to log log N.
-- Double hashing. (linear-probing variant)
-- Use linear probing, but skip a variable amount, not just 1 each time.
-- Effectively eliminates clustering.
-- Can allow table to become nearly full.
-- More difficult to implement delete.
-- Cuckoo hashing. (linear-probing variant)
-- Hash key to two positions; insert key into either position; if occupied, reinsert displaced key into its alternative position (and recur).
-- Constant worst case time for search.
相关推荐
"C macros for hash tables and more.zip"这个压缩包很可能包含了一些用于创建和操作哈希表的宏定义,以及可能的其他实用程序。哈希表是一种数据结构,它提供了一种快速访问和存储数据的方式,通过使用哈希函数将键...
文件`在Java中运用Hashtables.doc`可能包含了关于如何在实际项目中使用`Hashtable`的详细教程,包括示例代码和最佳实践。而`www.pudn.com.txt`可能是某个网站的链接或资源引用,可能提供了更多的学习资源和讨论。在...
useHashTables.cpp -- code that demonstrates the use of the hash tables
在Java中运用Hashtables Hashtables 在计算机领域中已不是一个新概念了。它们是用来加快计算机的处理速度的,用当今的标准来处理,速度非常慢,而它们可以让你在查询许多数据条目时,很快地找到一个特殊的条目。...
### 分布式哈希表(DHT)在成员变动下的性能对比分析 #### 摘要与背景 本文深入探讨了分布式哈希表(DHT)在面对成员变动(churn)时的不同协议表现及其通信成本。DHT是用于结构化P2P网络中的关键组件之一,它提供...
Among these, cuckoo hashtables provide excellent performance by allowing lookupsto be processed with very few memory accesses (2 to 3 perlookup). Yet, for large tables, cuckoo hash tables remain mem-...
Algorithms Lecture 12: Hash Tables [Fa’13]Insanity is repeating the same mistakes and expecting different results.— Narcotics Anonymous (1981)Calvin: There! I finished our secret code! Hobbes: Let’...
使用Golang Hashtables的非常简单,惯用和线程安全的实现。 安装 ❯ go get -u github.com/HotPotatoC/hashtable 用法 设定值 ht := hashtable . New () ht . Set ( "user" , "John" ) 获得价值 ht . Get ( "user" ...
WijsAnalysing the Performance of GPU Hash Tables for State Space ExplorationNathan Cassee Eindhoven University of TechnologyEindhoven, The Netherlands N.W.Cassee@student.tue.nlAnton Wijs Eindhoven ...
在`Hashtables-master`这个压缩包中,可能包含了不同哈希表实现的源代码,可以作为学习和参考的资源。这些代码可能涵盖了上述各种实现细节,通过阅读和分析这些代码,可以更深入地理解哈希表的工作原理以及如何在...
Disk Based Hash Tables and Quantified NumbersEdscott Wilson GarciaMarch 24, 2014AbstractThis paper describes the design of Disk Based Hash Tables: n–dimen- sional self–indexed data tables. The ...
16Concurrent Hash Tables: Fast and General(?)!TOBIAS MAIER and PETER SANDERS, Karlsruhe Institute of Technology, GermanyROMAN DEMENTIEV, Intel Deutschland GmbH, GermanyConcurrent hash tables are one ...
Concurrent Hash Tables: Fast and General(?)!Tobias Maier1, Peter Sanders1, and Roman Dementiev21 Karlsruhe Institute of Technology, Karlsruhe, Germany {t.maier,sanders}@kit.edu 2 Intel Deutschland ...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
GNU库。 截至 2014 年 4 月 9 日,DBH 是一个 GNU 程序,并更名为“GNU libdbh”。 用于创建和管理基于 64 位磁盘的哈希表的库 在键和值之间建立关联,以便可以快速找到给定键的值并将其加载到内存中:64 位支持
built-in arrays 适合固定大小的数组,Javascript Arrays 适合需要动态添加或删除元素的场景,ArrayLists 适合需要存储不同类型元素的场景,Hashtables 适合需要快速查找元素的场景,Generic List 和 Generic ...
哈希表 哈希表的java表示 单独的链接哈希 二次探测哈希 需要确保质数大小的工作; 容易修复 -> 前 20 个素数的数组? 麻省理工学院许可证 (MIT) 版权所有 (c) [年份] [全名] 特此授予任何人免费获得本软件副本和...
You will increase the range of problems you can solve when you learn how to manipulate versatile and popular data structures such as binary trees and hash tables. This book assumes you have a ...
分布式哈希表(Distributed Hash Tables,简称DHT)是一种分布式系统,它具有可扩展性、鲁棒性和自组织特性。这种系统能够支持精确匹配查找。DHT利用结构化的覆盖网络,将给定的键映射到持有与之相关联的对象的网络...