Bloom Filters

leonzhx

Blog category: Algorithms -- Stanford study notes

Bloom Filters Hash Table false positive error rate

1. Bloom Filters: Supported Operations

Fast Inserts and Lookups.

2. Comparison to Hash Tables:

Pros: more space efficient.

Cons:

1) can’t store an associated object

2) No deletions

3) Small false positive probability (i.e., might say x has been inserted even though it hasn’t been)

3. Applications:

Original: early spellcheckers. (insert valid words into BF)

Canonical: list of forbidden passwords

Modern: network routers - Limited memory, need to be super-fast

4. Bloom Filter Implementation

Ingredients: 1) array A of n bits (so n/|S| = # of bits per object in data set S)

2) k hash functions h_1, ..., h_k (k = small constant), each mapping an object to a position in A

Insert(x): for i = 1, 2, ..., k

set A[h_i(x)] = 1 (whether or not the bit is already set to 1)

Lookup(x): return true if A[h_i(x)] = 1 for every i = 1, 2, ..., k.

Note: no false negatives. (If x was inserted, Lookup(x) is guaranteed to succeed.)

But a false positive occurs if all k bits A[h_i(x)] were already set to 1 by other insertions.
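The Insert and Lookup operations above can be sketched in Python roughly as follows. This is a minimal illustration, not a production design: the class name `BloomFilter`, the use of SHA-256, and the trick of simulating the k hash functions by hashing the index i together with x are all assumptions made for the example, not part of the original notes.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an array of n bits and k hash functions."""

    def __init__(self, n_bits: int, k: int):
        self.n = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)  # bit array A, initially all 0

    def _positions(self, item: str):
        # Assumption: h_i(x) is simulated by hashing "i:x" with SHA-256
        # and reducing the first 8 bytes modulo n.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def insert(self, item: str) -> None:
        # Set A[h_i(x)] = 1 for every i, whether or not the bit is already set.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def lookup(self, item: str) -> bool:
        # True only if all k bits are set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Usage in the spirit of the "forbidden passwords" application:
# 8 bits per object for a capacity of 1000 objects, k = 5.
forbidden = BloomFilter(n_bits=8 * 1000, k=5)
for pw in ("123456", "password", "qwerty"):
    forbidden.insert(pw)

print(forbidden.lookup("password"))  # an inserted item is always found
```

Note that there is no delete method: clearing a bit could also erase evidence of other inserted objects, which is why deletions are listed among the cons.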

5. Heuristic Analysis

Intuition: there should be a trade-off between space and error (false positive) probability.

Assume: all h_i(x)’s are uniformly random and independent (across different i’s and x’s).

Setup: n bits, insert data set S into the Bloom filter.

Note: for each bit of A, the probability that it has been set to 1 is (under the above assumption):

1 - (1 - 1/n)^(k|S|) ≈ 1 - e^(-k|S|/n) = 1 - e^(-k/b), where b = n/|S| is the number of bits used per object. (The approximation uses 1 - 1/n ≈ e^(-1/n), which is accurate for large n.)

So under the assumption, for x not in S, the false positive probability is approximately (1 - e^(-k/b))^k
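As a rough sanity check of this heuristic estimate, one can simulate the idealized model directly. The sketch below assumes perfectly random hash values (drawn from Python's `random` module rather than from real hash functions); the function name `simulate_false_positive_rate` and its parameters are made up for the illustration.

```python
import math
import random


def simulate_false_positive_rate(n_items: int, b: int, k: int,
                                 trials: int, seed: int = 0) -> float:
    """Estimate the false positive rate under the uniform-hashing assumption."""
    rng = random.Random(seed)
    n = n_items * b                      # total number of bits
    bits = [False] * n

    # Insert |S| = n_items objects: each sets k uniformly random positions.
    for _ in range(n_items):
        for _ in range(k):
            bits[rng.randrange(n)] = True

    # Look up `trials` objects that were never inserted: each probes
    # k fresh random positions and "hits" only if all of them are set.
    hits = sum(
        all(bits[rng.randrange(n)] for _ in range(k))
        for _ in range(trials)
    )
    return hits / trials


measured = simulate_false_positive_rate(n_items=2000, b=8, k=5, trials=20000)
predicted = (1 - math.exp(-5 / 8)) ** 5
print(f"measured  ~ {measured:.4f}")
print(f"predicted ~ {predicted:.4f}")
```

With these parameters the two numbers should come out close to each other, though the measured value fluctuates with the seed and the number of trials.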

For fixed b, the error rate is minimized by setting k to about (ln 2) b

So the minimum error rate is about (1/2)^((ln 2) b), or equivalently b is about 1.44 log2(1 / minimum error rate)
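The notes state the optimal k without derivation. One standard way to see it, treating k as a continuous variable and using the approximate error formula from the analysis (the symbols f and p are introduced here only for this sketch), is:

```latex
\begin{align*}
f(k) &= \left(1 - e^{-k/b}\right)^{k},
  \qquad p := e^{-k/b} \;\Longrightarrow\; k = -b \ln p, \\
\ln f &= k \ln(1 - p) = -b \, \ln p \, \ln(1 - p). \\
\intertext{The product $\ln p \,\ln(1-p)$ is symmetric in $p$ and $1-p$
and is largest at $p = 1/2$, so $\ln f$ is smallest there:}
k &= b \ln 2, \\
f_{\min} &= \left(\tfrac{1}{2}\right)^{b \ln 2} \approx 0.6185^{\,b}, \\
b &= \frac{\log_2(1/f_{\min})}{\ln 2} \approx 1.44 \, \log_2(1/f_{\min}).
\end{align*}
```

In words: at the optimum, each bit of the array is set with probability about 1/2.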

Example: with b = 8, choose k = 5 or 6; the error probability is then only about 2%.
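The numbers in this example can be reproduced with a few lines of Python. The helper names `false_positive_rate` and `optimal_k` are hypothetical and simply evaluate the heuristic formulas from the analysis above.

```python
import math


def false_positive_rate(b: float, k: int) -> float:
    """Heuristic error estimate (1 - e^(-k/b))^k for b bits per object."""
    return (1.0 - math.exp(-k / b)) ** k


def optimal_k(b: float) -> float:
    """Real-valued k that minimizes the heuristic estimate: (ln 2) * b."""
    return math.log(2) * b


b = 8
print(f"optimal k for b = {b}: {optimal_k(b):.2f}")   # lies between 5 and 6
for k in (5, 6):
    print(f"k = {k}: error ~ {false_positive_rate(b, k):.4f}")
```

Since the optimal k is not an integer here, both neighbouring integers give nearly the same error, which is why the example offers either 5 or 6.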

2013-03-19 18:03