1. Binary Codes:
-- Maps each character of an alphabet Sigma to a binary string.
-- Example: Sigma = a-z and various punctuation (size 32 overall, say)
-- Obvious encoding: use the 32 5-bit binary strings (a fixed-length code)
2. Variable-Length Encoding:
-- If some characters of Sigma are much more frequent than others, a variable-length code can use fewer bits on average.
-- Problem: With variable-length codes, not clear where one character ends + the next one begins.
-- Solution: Prefix-free codes -- make sure that for every pair i, j in Sigma, neither of the encodings f(i), f(j) is a prefix of the other. (Sketch below.)
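A tiny illustration of why prefix-freeness matters (the two codes below are made-up examples, not from the notes):

```python
# With the ambiguous code, one bit string has two parses;
# with the prefix-free code, every parse is unique.
ambiguous   = {'a': '0', 'b': '01', 'c': '1'}    # '0' is a prefix of '01'
# '01' decodes as 'b' or as 'a' then 'c' -- not well defined.
prefix_free = {'a': '0', 'b': '10', 'c': '11'}   # no codeword prefixes another
# '010' has exactly one parse: '0' -> a, then '10' -> b.
```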
3. Codes as Trees
-- Useful fact: Binary codes <==> Binary trees
4. Prefix-Free Codes as Trees
-- Goal: Best binary prex-free encoding for a given set of character frequencies.
-- Left child edges <--> "0", right child edges <--> "1"
-- For each i in Sigma , exactly one node labeled "i"
-- Encoding of i in Sigma <--> bits along the path from the root to the node "i"
-- Prefix-free <--> labeled nodes = the leaves [since prefixes <--> one node is an ancestor of another]
-- To decode: repeatedly follow the path from the root until you hit a leaf (sketch below).
-- Encoding length of i in Sigma = depth of i in tree.
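A minimal decoding sketch, assuming the nested-tuple tree representation also used for the Huffman sketch later in these notes (internal node = (left, right) pair, leaf = a symbol):

```python
def decode(tree, bits):
    """Walk root -> leaf repeatedly: '0' = left child, '1' = right child."""
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == '0' else node[1]
        if not isinstance(node, tuple):   # hit a leaf: emit its symbol...
            out.append(node)
            node = tree                   # ...and restart from the root
    return out

# e.g. decode((('a', 'b'), 'c'), '00011') == ['a', 'b', 'c']
```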
5. Problem Definition
-- Input: a probability p_i for each character i in Sigma.
-- Notation: if T = tree with leaves <--> symbols of Sigma, then the average encoding length
L(T) = Sum(i in Sigma) p_i * [depth of i in T]
-- Output: a binary tree T minimizing the average encoding length L(T). (Worked example below.)
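A quick worked instance of L(T) (the frequencies and depths below are made-up example values):

```python
# L(T) = sum over i in Sigma of p_i * (depth of i in T)
probs  = {'a': 0.6, 'b': 0.25, 'c': 0.1, 'd': 0.05}
depths = {'a': 1, 'b': 2, 'c': 3, 'd': 3}    # leaf depths in some tree T
print(sum(probs[i] * depths[i] for i in probs))   # 1.55 bits/symbol,
# vs. 2.0 for the fixed-length code on a 4-symbol alphabet
```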
6. A Greedy Approach:
-- Build tree bottom-up using successive mergers.
-- Final encoding length of i in Sigma = # of mergers its subtree endures.
[Each merger increases encoding length of participating symbols by 1]
-- Greedy heuristic: in the first iteration, merge the two symbols with the smallest frequencies. Replace symbols a, b by a new "meta-symbol" ab with frequency p_ab = p_a + p_b.
-- Recursively apply the same approach to the smaller alphabet.
7. Huffman's Algorithm:
-- If |Sigma| = 2, return the encoding a --> 0, b --> 1.
-- Otherwise, let a, b in Sigma have the smallest frequencies.
-- Let Sigma' = Sigma with a, b replaced by the new meta-symbol ab.
-- Define p_ab = p_a + p_b.
-- Recursively compute T' (for the alphabet Sigma')
-- Extend T' (with leaves <--> Sigma') to a tree T with leaves <--> Sigma by splitting the leaf ab into two leaves a & b. (Python sketch below.)
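A minimal Python sketch of the recursion above. One assumption of this sketch: original symbols are not tuples, so a meta-symbol ab can be represented as the pair (a, b) -- then "splitting the leaf ab" is automatic, because an internal node is also a (left, right) pair:

```python
def huffman(freqs):
    """freqs: dict symbol -> frequency. Returns a code tree whose
    leaves are symbols; an internal node is a (left, right) pair."""
    if len(freqs) == 2:
        a, b = freqs                             # base case: a -> 0, b -> 1
        return (a, b)
    a, b = sorted(freqs, key=freqs.get)[:2]      # two smallest frequencies
    rest = {s: f for s, f in freqs.items() if s not in (a, b)}
    rest[(a, b)] = freqs[a] + freqs[b]           # meta-symbol ab, p_ab = p_a + p_b
    return huffman(rest)     # the leaf (a, b) already is the split subtree

def codewords(tree, prefix=''):
    """Read codewords off the tree: left edge = '0', right edge = '1'."""
    if not isinstance(tree, tuple):              # leaf: one finished codeword
        return {tree: prefix}
    return {**codewords(tree[0], prefix + '0'),
            **codewords(tree[1], prefix + '1')}

# codewords(huffman({'a': 0.6, 'b': 0.25, 'c': 0.1, 'd': 0.05}))
#   -> {'a': '0', 'd': '100', 'c': '101', 'b': '11'}
```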
8. Correctness of Huffman's Algorithm:
-- By induction on n = |Sigma|. (Can assume n >= 2.)
-- Base case: when n = 2, the algorithm outputs an optimal tree. (Needs 1 bit per symbol.)
-- Inductive step: fix an input with n = |Sigma| > 2. Let a, b be the symbols with the smallest frequencies, let Sigma' = Sigma with a, b replaced by the meta-symbol ab, and define p_ab = p_a + p_b. By the inductive hypothesis, the algorithm solves the smaller subproblem (for Sigma') optimally.
-- Every tree T' for Sigma' extends to a tree T for Sigma by splitting the leaf ab (at depth d, say) into children a & b. For every such pair T and T':
L(T) - L(T') = p_a * [a's depth in T] + p_b * [b's depth in T] - p_ab * [ab's depth in T']
= p_a (d+1) + p_b (d+1) - (p_a + p_b) d
= p_a + p_b,
independent of T, T'! So an optimal T' for Sigma' extends to a T that is optimal among all such extensions.
-- Hence the output T of Huffman's algorithm minimizes L(T) over all trees for Sigma in which a & b are siblings (exactly the trees that arise as extensions of some T').
-- Intuition: since a, b have the smallest frequencies, pushing a & b as deep as possible can only make a tree better (or at least no worse), so some optimal tree should have a & b as siblings.
-- Formally, by an exchange argument. Let T* be any tree that minimizes L(T) for Sigma, and let x, y be siblings at the deepest level of T*. The exchange: obtain T^ from T* by swapping a <--> x and b <--> y (so a & b are siblings at the deepest level of T^).
-- L(T*) - L(T^) = (p_x - p_a) * [x's depth in T* - a's depth in T*] + (p_y - p_b) * [y's depth in T* - b's depth in T*] >= 0, since p_a, p_b are the smallest frequencies and x, y are at the deepest level (so both factors in each term are >= 0). Thus T^ is also optimal, a & b are siblings in T^, and Huffman's output is at least as good as T^. QED. (Sanity check below.)
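Not part of the proof, but a quick brute-force sanity check one can run against the huffman sketch above (all_trees and avg_len are hypothetical helpers that enumerate every code tree and evaluate L(T)):

```python
from itertools import combinations

def all_trees(symbols):
    """Yield every code tree (nested tuples) with the given leaves;
    the first symbol is pinned to the left subtree to skip mirror images."""
    if len(symbols) == 1:
        yield symbols[0]
        return
    first, rest = symbols[0], symbols[1:]
    for k in range(len(rest)):                   # pick the left subtree's other leaves
        for extra in combinations(rest, k):
            right = tuple(s for s in rest if s not in extra)
            for lt in all_trees((first,) + extra):
                for rt in all_trees(right):
                    yield (lt, rt)

def avg_len(tree, probs, depth=0):
    if not isinstance(tree, tuple):              # leaf contributes p_i * depth
        return probs[tree] * depth
    return (avg_len(tree[0], probs, depth + 1) +
            avg_len(tree[1], probs, depth + 1))

probs = {'a': 0.6, 'b': 0.25, 'c': 0.1, 'd': 0.05}
best = min(avg_len(t, probs) for t in all_trees(tuple(probs)))
assert abs(avg_len(huffman(probs), probs) - best) < 1e-9   # Huffman is optimal here
```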
9. Running Time
-- Naive implementation: O(n^2) time, where n = |Sigma|.
-- Speed-ups:
-- Use a heap! [to perform repeated minimum computations]
-- Use keys = frequencies
-- After extracting the two smallest-frequency symbols, re-insert the new meta-symbol [new key = sum of the two old ones]
=> Iterative, O(n log n) implementation (heapq sketch after this section).
-- Even faster: sorting + O(n) additional work [manage (meta-)symbols using two queues].
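A sketch of the heap-based O(n log n) version using Python's heapq, in the same tuple-tree representation as above (the counter only breaks ties so the heap never has to compare two trees):

```python
import heapq
from itertools import count

def huffman_heap(freqs):
    """Iterative Huffman: a min-heap keyed on frequency, O(n log n)."""
    tie = count()                                # tie-breaker for equal keys
    heap = [(f, next(tie), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)           # extract the two smallest...
        fb, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (fa + fb, next(tie), (a, b)))  # ...re-insert meta-symbol
    return heap[0][2]                            # the finished code tree
```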