Huffman Codes

leonzhx

浏览: 798403 次
性别:
来自: 上海

最近访客更多访客>>

u012363178

justsimple

cdphantom

wang_xuewu

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

2014-05 ( 22)
2014-04 ( 47)
2014-03 ( 25)
更多存档...

博客分类：

Algorithms II -- Standford 学习笔记

Huffman Binary Codes Greedy Algorithm Binary Trees

1. Binary Codes:

-- Maps each character of an alphabet Sigma to a binary string.

-- Example: Sigma = a-z and various punctuation (size 32 overall, say)

Obvious encoding: Use the 32 5-bit binary strings to encode this (a xed-length code)

2. Variable Length Encoding :

-- if some characters of Sigma are much more frequent than others, using a variable-length code.

-- Problem: With variable-length codes, not clear where one character ends + the next one begins.

-- Solution: Prex-free codes -- make sure that for every pair i , j in Sigma, neither of the encodings f (i) , f (j) is a prex of the other.

3. Codes as Trees

-- Useful fact: Binary codes <==> Binary trees

4. Prex-Free Codes as Trees

-- Goal: Best binary prex-free encoding for a given set of character frequencies.

-- Left child edges <--> "0", right child edges <--> "1"

-- For each i in Sigma , exactly one node labeled "i"

-- Encoding of i in Sigma <--> Bits along path from node to the node "i"

-- Prex-free <--> Labelled nodes = the leaves [since prefixes <--> one node an ancestor of another]

-- To decode: Repeadetly follow path from root until you hit a leaf.

-- Encoding length of i in Sigma = depth of i in tree.

5. Problem Denition

-- Input: Probability pi for each character i in Sigma.

-- Notation: If T = tree with leaves <--> symbols of Sigma , then average encoding length

L(T) = Sum(i in Sigma){ pi * [depth of i in T] }

-- Output: A binary tree T minimizing the average encoding length L(T).

6. A Greedy Approach:

-- Build tree bottom-up using successive mergers.

-- Final encoding length of i in Sigma = # of mergers its subtree endures.

[Each merger increases encoding length of participating symbols by 1]

-- Greedy heuristic: In first iteration, merge the two symbols with the smallest frequencies. Replace symbols a , b by a new "meta-symbol" ab and the frequency Pab = Pa + Pb

-- Recusively apply the approach on less characters

7. Huffman's Algorithm:

-- If |Sigma| = 2 return A --> 0, B -->1

-- Otherwise, Let a , b in Sigma have the smallest frequencies.

-- Let Sigma' = Sigma with a , b replaced by new symbol ab.

-- Define Pab = Pa + Pb.

-- Recursively compute T' (for the alphabet Sigma')

-- Extend T' (with leaves <--> Sigma') to a tree T with leaves <--> Sigma by splitting leaf ab into two leaves a & b.

8. Correctness of Huffman's Algorithm:

-- By induction on n = |Sigma|. (Can assume n >= 2.)

-- Base case: When n = 2, algorithm outputs the optimal tree. (Needs 1 bit per symbol)

-- Inductive step: Fix input with n = |Sigma| > 2. By inductve hypothesis: Algorithm solves smaller subproblems (for Sigma') optimally. Let Sigma' = Sigma with a , b (symbols with smallest frequencies) replaced by meta-symbol ab. Define Pab = Pa + Pb.

-- For every such pair T and T', L(T) - L(T') = Pa [a's depth in T] +Pb [b's depth in T] -Pab [ab's depth in T'] = Pa(d +1)+Pb(d +1)-(Pa +Pb)d = Pa + Pb, Independent of T , T'!

-- The output of Huffman's Algorithm T for Sigma minimize L(T) for Sigma over all trees T' where a&b are siblings at the deepest level.

-- Intuition: Can make an optimal tree better by pushing a & b as deep as possible (since a , b have smallest frequencies).

-- By exchange argument. Let T* be any tree that minimizes L(T) for Sigma . Let x , y be siblings at the deepest level of T*. The exchange: Obtain T' from T* by swapping a <--> x, b <--> y (T' is the tree where a&b are siblings at the deepest level)

-- L(T*) - L(T') = (Px - Pa) * [x's depth in T* - a's depth in T*] + (Py - Pb) [y's depth in T* - b's depth in T*] >= 0