1. Binary Codes:
-- Maps each character of an alphabet Sigma to a binary string.
-- Example: Sigma = a-z and various punctuation (size 32 overall, say)
Obvious encoding: Use the 32 5-bit binary strings to encode this (a xed-length code)
2. Variable Length Encoding :
-- if some characters of Sigma are much more frequent than others, using a variable-length code.
-- Problem: With variable-length codes, not clear where one character ends + the next one begins.
-- Solution: Prex-free codes -- make sure that for every pair i , j in Sigma, neither of the encodings f (i) , f (j) is a prex of the other.
3. Codes as Trees
-- Useful fact: Binary codes <==> Binary trees
4. Prex-Free Codes as Trees
-- Goal: Best binary prex-free encoding for a given set of character frequencies.
-- Left child edges <--> "0", right child edges <--> "1"
-- For each i in Sigma , exactly one node labeled "i"
-- Encoding of i in Sigma <--> Bits along path from node to the node "i"
-- Prex-free <--> Labelled nodes = the leaves [since prefixes <--> one node an ancestor of another]
-- To decode: Repeadetly follow path from root until you hit a leaf.
-- Encoding length of i in Sigma = depth of i in tree.
5. Problem Denition
-- Input: Probability pi for each character i in Sigma.
-- Notation: If T = tree with leaves <--> symbols of Sigma , then average encoding length
L(T) = Sum(i in Sigma){ pi * [depth of i in T] }
-- Output: A binary tree T minimizing the average encoding length L(T).
6. A Greedy Approach:
-- Build tree bottom-up using successive mergers.
-- Final encoding length of i in Sigma = # of mergers its subtree endures.
[Each merger increases encoding length of participating symbols by 1]
-- Greedy heuristic: In first iteration, merge the two symbols with the smallest frequencies. Replace symbols a , b by a new "meta-symbol" ab and the frequency Pab = Pa + Pb
-- Recusively apply the approach on less characters
7. Huffman's Algorithm:
-- If |Sigma| = 2 return A --> 0, B -->1
-- Otherwise, Let a , b in Sigma have the smallest frequencies.
-- Let Sigma' = Sigma with a , b replaced by new symbol ab.
-- Define Pab = Pa + Pb.
-- Recursively compute T' (for the alphabet Sigma')
-- Extend T' (with leaves <--> Sigma') to a tree T with leaves <--> Sigma by splitting leaf ab into two leaves a & b.
8. Correctness of Huffman's Algorithm:
-- By induction on n = |Sigma|. (Can assume n >= 2.)
-- Base case: When n = 2, algorithm outputs the optimal tree. (Needs 1 bit per symbol)
-- Inductive step: Fix input with n = |Sigma| > 2. By inductve hypothesis: Algorithm solves smaller subproblems (for Sigma') optimally. Let Sigma' = Sigma with a , b (symbols with smallest frequencies) replaced by meta-symbol ab. Define Pab = Pa + Pb.
-- For every such pair T and T', L(T) - L(T') = Pa [a's depth in T] +Pb [b's depth in T] -Pab [ab's depth in T'] = Pa(d +1)+Pb(d +1)-(Pa +Pb)d = Pa + Pb, Independent of T , T'!
-- The output of Huffman's Algorithm T for Sigma minimize L(T) for Sigma over all trees T' where a&b are siblings at the deepest level.
-- Intuition: Can make an optimal tree better by pushing a & b as deep as possible (since a , b have smallest frequencies).
-- By exchange argument. Let T* be any tree that minimizes L(T) for Sigma . Let x , y be siblings at the deepest level of T*. The exchange: Obtain T' from T* by swapping a <--> x, b <--> y (T' is the tree where a&b are siblings at the deepest level)
-- L(T*) - L(T') = (Px - Pa) * [x's depth in T* - a's depth in T*] + (Py - Pb) [y's depth in T* - b's depth in T*] >= 0
9. Running Time
-- Naive implementation: O(n2) time, where n = |Sigma|.
-- Speed ups:
-- Use a heap! [to perform repeated minimum computations]
-- Use keys = frequencies
-- After extracting the two smallest-frequency symbols, re-Insert the new meta-symbol [new key = sum of the 2 old ones]
=> Iterative, O(n log n) implementation.
-- Even faster: Sorting + O(n) additional work. -- manage (meta-)symbols using two queues.
相关推荐
As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the ...
在本压缩包“pta题库答案c语言之树结构9HuffmanCodes.zip”中,主要包含的是关于C语言编程及树结构的知识,特别是Huffman编码的相关解答。Huffman编码是一种用于数据压缩的算法,它利用了字符出现频率的不同来创建更...
Huffman Codes (30)"中,可能包含了一些关于如何构建和使用哈夫曼编码的详细教程或实例,比如通过编程语言实现哈夫曼编码的代码示例,或者是对特定文本的哈夫曼编码分析。通过学习和理解这些内容,可以更深入地掌握...
c++实现的哈夫曼树的构造和哈夫曼编码,有详细的注释。
The Huffman codes will be sorted in the same manner as the one above i.e. frequency, highest to lowest. Design your program to read the input from the input file "infile.dat". Your program must ...
赫夫曼树实现文件和字符串压缩及解压源码
std::unordered_map, std::string> huffmanCodes; void generateHuffmanCodes(HuffmanNode* node, std::string code) { if (node->charValue != '\0') { huffmanCodes[node->charValue] = code; } if (node->...
- `compress(input, huffmanCodes)`:对输入的文本进行编码,替换字符为对应的哈夫曼编码。 - `decompress(compressedData, huffmanTree)`:根据压缩后的数据和哈夫曼树结构进行解压缩。 通过理解并分析这个源代码...
function huffmanCodes = generateHuffmanCodes(huffmanTree) % ... 生成哈弗曼编码的代码 ... end ``` 4. **图像编码**: 使用生成的哈弗曼编码将图像像素值转换为二进制串。 ```matlab function encodedImage ...
void generateCodes(HuffmanNode* root, std::string code, std::map, std::string>& huffmanCodes) { if (root == nullptr) return; if (root->data != '\0') { huffmanCodes[root->data] = code; return; ...
public static void generateCodes(Node root, String str, Map, String> huffmanCodes) { if (root.left == null && root.right == null) { // 如果是叶子节点 huffmanCodes.put(root.data, str); } else { ...
void generateCodes(HuffmanNode node, String code, Map, String> huffmanCodes) { if (node == null) return; if (node.ch != '\0') { huffmanCodes.put(node.ch, code); } generateCodes(node.left, code +...
* 最优解决方案:change making、minimum spanning tree、single-source shortest paths、simple scheduling problems和Huffman codes等。 * 近似解决方案:traveling salesman problem、knapsack problem和其他组合...
Given an arbitrary set of symbols (the english alphabet ... There are other interesting properties of Huffman codes as well as some gotchas with respect to their use, so the above link is worth reading.
利用哈夫曼编码进行通信可以大大提高信道利用率,缩短信息传输时间,降低传输成本。但是,这要求在发送端通过一个编码系统对待传输数据预先编码,在接收端将传来的数据进行译码(复原)。对于双工信道(即可以双向...
课程回答题目集中部分问题解决方法分析02- 线性结构3 反向链表03-树2 列表叶子03-树3树遍历再次最优解法——实时输出法分析04-树6 完全二叉搜索树 完全二叉搜索树 中序遍历法05-树9 Huffman Codes 问题分析及建树...
在"哈夫曼编码译码代码,有注释且能直接运行"这个项目中,`HuffmanCodes.java`文件很可能是包含完整实现的源代码。下面将详细解释哈夫曼编码的关键步骤和相关知识点: 1. **哈夫曼树构建**: - **构造哈夫曼树**:...
5.7 Some Comments on Huffman Codes 5.8 Optimality of Huffman Codes 5.9 Shannon–Fano–Elias Coding 5.10 Competitive Optimality of the Shannon Code 5.11 Generation of Discrete Distributions from Fair ...