Universal Hash

leonzhx

浏览: 809511 次
性别:
来自: 上海

最近访客更多访客>>

u012363178

justsimple

cdphantom

wang_xuewu

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

2014-05 ( 22)
2014-04 ( 47)
2014-03 ( 25)
更多存档...

博客分类：

Algorithms -- Standford 学习笔记

Universal Hansh Open Address Load Factor Chaining

1. Definition : the load factor of a hash table is (# of objects in hash table/ # of buckets of hash table)

-- load factor = O(1) is necessary condition for operations to run in constant time.

-- with open addressing, need << 1.

-- good Hash Table performance need to control load.

2. For good HT performance, need a good hash function.

Ideal : user super-clever hash function guaranteed to spread every data set out evenly.

Problem : DOES NOT EXIST! (for every hash function, there is a Pathological Data Sets

Reason : fix a hash function h : U -> {0,1,2,…,n-1} , there exist bucket i such that at least

|u|/n elements of U hash to i under h. if data set drawn only from these, everything collides !

3. Solutions to pathological data sets

1) Use a cryptographic hash function (e.g. SHA-2)

-- infeasible to reverse engineer a pathological data set

2) Use randomization.

-- design a family H of hash functions such that for all data sets S, “almost all” functions spread S

out “pretty evenly”.

4. Universal Hash Functions Definition :

-- Let H be a set of hash functions from U to {0,1,2,…,n-1}

-- H is universal if and only if : for all x,y in U (with x ≠ y ) , P(h in H) { x , y collide} <= 1/n

(n = # of buckets) When h is chosen uniformly at random from H.

(i.e., collision probability as small as with “gold standard” of perfectly random hashing)

5. Example: Hashing IP Addresses

Let U = IP addresses of the form (x1,x2,x3,x4), with each xi in [0, ... , 255]

Let n = a prime (e.g., small multiple of # of objects in HT)

Definition : Define one hash function ha per 4-tuple a = (a1,a2,a3,a4) with each ai in [0, ..., n-1]

Define ha : IP addrs --> buckets by :

ha ( x1, x2, x3, x4) = (a1x1 + a2x2 + a3x3 + a4x4) mod n

Define H = { ha| a1, a2, a3, a4 in [0, ..., n] }

Proof : Consider distinct IP addresses (x1,x2,x3,x4), (y1,y2,y3,y4). Assume x4 ≠ y4

collision <==> a1x1 + a2x2 + a3x3 + a4x4 = a1y1 + a2y2 + a3y3 + a4y4 ( mod n)

<==> a4(x4-y4) = a1(y1-x1) + a2(y2-x2) + a3 (y3-x3) (mod n)

condition probability on random choice of a1, a2, a3. (a4 is still random)

x4 ≠ y4 (x4-y4 ≠ 0 mod n , assume n is bigger than any xi) n is prime, a4 uniform at random

6. Chaining: Constant-Time Guarantee

Claim : All operations run in O(1) time when hash table is implemented with chaining. Hash function h

chosen uniformly at random from universal family H.

Caveats: 1) in expectation over the random choice of the hash function h.

2) assumes |S| = O(n) (load factor is |S|/n = O(1) )

3) assumes takes O(1) time to evaluate hash function

Proof: analyze an unsuccessful Lookup (other operations only faster).

Let S = data set with |S| = O(n) Consider Lookup for x not in S :

Running Time : O(1) (compute h(x) ) + O(list length in A[h(x)] ) (traverse list , call it L)

For y in S (so, y ≠ x) define Zy = 1 if h(y) = h(x) , 0 otherwise. So L = Sum(y in S) { Zy}

So E[L] = E(Sum(y in S) {Zy}) = Sum (y in S) { E(Zy)} <= |S|/n = load factor = O(1)

7. Open Addresing Performance :

Heuristic assumption : (for a quick & dirty idealized analysis only) all n! probe sequences equally.

under heuristic assumption, expected Insertion time is ~ 1/(1-load)

Proof : A random probe finds an empty slot with 1- load

Insertion time ~ the number N of coin flips to get “heads”, where Pr[“heads”] = 1 - load

E[N] = 1 (1st flip) + load (probability of tail) * E[N]

So E[N] = 1/ (1-load)

With linear probing, heuristic assumtion is completely false.

Assume instead : initial probes uniform at random independent for different keys ( less false)

Theorem : [Knuth 1962] under above assumption, expected Insertion time is 1/(1-load)^2

分享到：

Bloom Filters | Hash Tables and Application

2013-03-19 17:01
浏览 1444
评论(0)
分类:编程语言
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Universal Hash

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Universal Hash

评论

发表评论

相关推荐

Bloom Filters

Hash Tables and Application

Balanced Search Tree

Heap

Dijkstra’s Algorithm

Introduction to Graph Search

Random Contraction Algorithm

Linear-Time Selection

Probability Review

QuickSort Analysis

Master Method

Divide and Conquer

Asymptotic Analysis

Introduction

Quick Sort

最近访客更多访客>>