In particular, you will gain practice with the following topics:
- Greedy Programming: practice with implementing a greedy approach to an important, real-world problem.
- Prefix Tries: a particular tree-like data structure meant to model prefixes along some path; in this case, the bits composing some Huffman Code.
- Compression: seeing, first-hand, how to compress some corpus of text into a representation that consumes fewer bits than its original.
Although less general than an arbitrary Huffman Encoder, the ability to create instances of our reusable variant saves the effort of having to reconstruct or transmit the Huffman Trie every time a new message is compressed or decompressed.
These two primary operations proceed as follows:
- Compression: finding the distribution of characters in the corpus, using these frequencies to find the Huffman Trie, after which we construct the Encoding Map that performs the compression.
- Decompression: given some bitstring (in this assignment, some sequence of bytes each 8 bits in length), decode the original corpus using a Huffman Trie.
With these details in place, let’s look at the specifications.
Simplifications and Assumptions
Before we detail the nitty-gritty, we should take a second to discuss some of the simplifying assumptions that we’ll make regarding your deliverable (since this is an assignment, not a thesis!):
- Decompressed text corpi will be
Stringsand their compressed format will simply be an array of
bytes. In general, you could write these byte to some file, but for testing purposes, we’ll make life easier on ourselves.
- All characters to encode / decode will be in basic, *lowercase* ASCII (don’t worry about uppercase letters).
- Since this Huffman decoder is reusable, there will be no need to pass messages with a header containing the bitstring representation of the Huffman Trie; all interactions (encoding or decoding) with the Huffman instance are assumed to be under the same / similar distribution of characters as in the original corpus on which the instance was constructed (see unit tests for more info).
- Corpi (corpuses? meh, too lazy too Google it and “corpi” sounds cooler) on which the Huffman Trie are constructed can have as few as 0 chars.
- Because we’ll be dealing with small corpi, the Huffman Trie nodes will encode counts (integers) for the proportions rather than probabilities (floats, as demonstrated in the notes).
本网站支持淘宝 支付宝 微信支付 paypal等等交易。如果不放心可以用淘宝交易！
E-mail: firstname.lastname@example.org 微信:itcsdx