`
leonzhx
  • 浏览: 794169 次
  • 性别: Icon_minigender_1
  • 来自: 上海
社区版块
存档分类
最新评论
阅读更多

1.  Bloom Filters: Supported Operations

     Fast Inserts and Lookups.

 

2.  Comparison to Hash Tables:

     Pros: more space efficient.

     Cons:

    1) can’t store an associated object

    2) No deletions

    3) Small false positive probability (i.e., might say x has been inserted even though it hasn’t been)

 

 

3.  Applications :

    Original: early spellcheckers. (insert valid words into BF)

    Canonical: list of forbidden passwords

    Modern: network routers - Limited memory, need to be super-fast

 

4.  Bloom Filter Implementation

    Ingredients: 1) array of n bits ( So n/|S| = # of bits per object in data set S)

 

                        2) k hash functions h1,…..,hk (k = small constant)

     Insert(x): for i = 1,2,…,k 

                       set A[h (x)]=1 (whether or not bit already set to 1)

     Lookup(x): return true if A[hi(x)] = 1 for every I = 1,2,….,k.

     Note: no false negatives. (if x was inserted, Lookup (x) guaranteed to succeed)

               But false positive if all k hi(x)’s already set to 1 by other insertions.

 

5.  Heuristic Analysis

     Intuition: should be a trade-off between space and error (false positive) probability.

     Assume: all hi(x)’s uniformly random and independent (across different i’s and x’s).

     Setup: n bits, insert data set S into bloom filter.

     Note: for each bit of A, the probability it’s been set to 1 is (under above assumption):

       1- (1- 1/n) ^ (|S|k) <= 1- e^ (-|S|k/n) = 1- e^(-k/b),  b is the number of bits used per object

       So under assumption, for x not in S, false positive probability is <=(1- e^(-k/b)) ^ k

        For fixed b, error rate is minimized by setting k to about (ln2)b 

        So the min error rate is about 1/2 ^((ln2) b) or b is about 1.44 log2 (1/min error rate)

     Example: with b = 8, choose k = 5 or 6 , error probability only approximately 2%. 

        

     

 

 

分享到:
评论

相关推荐

    Distance-Sensitive Bloom Filters

    ### Distance-Sensitive Bloom Filters #### 引言与背景 距离敏感型布隆过滤器(Distance-Sensitive Bloom Filters)是亚当·基尔希(Adam Kirsch)和迈克尔·米特森马赫(Michael Mitzenmacher)于2006年提出的一...

    ”Better Than Nothing” Privacy with Bloom Filters:To What Extent?.pdf

    Bloom filters are probabilistic data structures which permit to conve- niently represent set membership. Their performance/memory efficiency makes them appealing in a huge variety of scenarios. Their ...

    C network daemon for bloom filters.zip

    标题 "C network daemon for bloom filters.zip" 暗示了一个使用C语言编写的网络守护进程,该进程专门用于实现Bloom过滤器。Bloom过滤器是一种空间效率极高的概率数据结构,常用于判断一个元素是否可能在一个集合中...

    peloton_bloomfilters:高性能布隆过滤器

    Peloton的Bloomin快速Bloomfilters peloton_bloomfilter.SharedMemoryBloomfilter是peloton_bloomfilter.SharedMemoryBloomfilter最容易使用,最快的Bloomfilter实现。 用法 Bloomfilters是概率数据结构,支持基本...

    Compressed Bloom filters

    ### 压缩布隆过滤器(Compressed Bloom Filters):高效网络数据共享与查询优化 布隆过滤器(Bloom Filter)作为一种高效的空间优化随机数据结构,在支持成员查询方面表现卓越,尤其在数据集庞大时能显著节省存储...

    Deep Packet Inspection using Parallel Bloom Filters

    本文讨论了一种基于并行布隆过滤器(Parallel Bloom Filters)的深度包检测(Deep Packet Inspection, DPI)技术。该技术主要应用于高速网络环境下的内容识别、第7层交换以及互联网安全等领域。对于这些应用而言,...

    Go-bloom-Bloomfilters在Go中的实现

    Bloom Filter是一种空间效率极高的概率型数据结构,用于判断一个元素是否可能在一个集合中。在Go语言中,实现Bloom Filter可以利用其高效并发处理和内存管理的优势,使得这种数据结构在大数据场景下更加实用。本文将...

    Bloom filters in Python

    **布隆过滤器(Bloom Filter)在Python中的应用** 布隆过滤器是一种空间效率极高的概率型数据结构,用于判断一个元素是否在一个集合中。它可能会误判,但不会漏判,即可能存在假阳性(false positive),但不会有假...

    Cache, Hash and Space-Efficient Bloom Filters-计算机科学

    Cache-, Hash- and Space...We propose several new variants of Bloom filters and replacements with similar functionality. All of them have a better cache-efficiency and need less hash bits than regular Blo

    Multiple Bloom filters

    A standard technique from the cryptanalysis is to use exhaustive search that consists of systematically enumerating all possible candidates for ... Bloom filters can be used represent the password dicti

    布隆过滤器BloomFilters的一个简单Java库

    布隆过滤器(Bloom Filter)是一种空间效率极高的概率型数据结构,用于判断一个元素是否可能在一个集合中。在Java开发中,特别是在处理大数据、内存限制或需要快速查询是否存在某个元素的场景下,布隆过滤器是一个...

    Vectorized Bloom Filters for Advanced SIMD Processors - Columbia - Slides-计算机科学

    Vectorized Bloom Filters for Advanced SIMD ProcessorsOrestis Polychroniou! Kenneth A. RossBloom filters✤ Introduction! ✤ Original version [Bloom 1970]!✤ Represents a “set of items”!✤ Answers: ...

    Compressed Bloom Filters-计算机科学

    Compressed Bloom FiltersMichael Mitzenmacher ∗Harvard University 33 Oxford St.Cambridge, MA 02138michaelm@eecs.harvard.eduABSTRACT ...introduce compressed Bloom filters, which improve perfor- mance whe

    Vectorized Bloom Filters for Advanced SIMD Processors (damon14)-计算机科学

    Vectorized Bloom Filters for Advanced SIMD ProcessorsOrestis Polychroniou Columbia Universityorestis@cs.columbia.eduKenneth A. Ross∗ Columbia Universitykar@cs.columbia.eduABSTRACT Analytics are at ...

    libbloom.tar_bloom_

    这个库提供了API接口,使得开发者可以轻松地创建、操作和查询Bloom Filters。 综上所述,"libbloom.tar_bloom_"很可能是一个实现了Bloom Filter算法的库,可以帮助开发者在他们的程序中高效地实现数据存在性的快速...

    bloom filter布隆过滤器学习资料大全

    5. **Space-Efficient False Positive Correcting Bloom Filters**:优化的错误纠正布隆过滤器,通过额外的空间来纠正部分误判。 在实际应用中,需要根据具体需求选择合适的布隆过滤器变种,比如在数据库索引、垃圾...

Global site tag (gtag.js) - Google Analytics