Huffman Codes in Node - Compression Sample Implementation - JOYK Joy of Geek, Geek News, Link all geek

Huffman Codes Node

An implementation of Huffman Codes in Node.js

Usage

$ yarn install
$ node index encode <filename>
$ node index decode <filename>.huff

I've included some sample files to compress and decompress in the ./data directory.

Testing

$ yarn run test[:watch]

Special Considerations For Node.js

The main logic for the program is in ./huffman-tree.js

The huffman algorithm encodes the input as a string of bits. The script included takes a file as input and writes the encoded(compressed) result to a file as well. We cannot write individual bits to a file using node. It would also be unwise to write the sequence of 1's and 0's as characters to a file because each number would be represented by a byte on disk which would inflate the file size rather than compress it. We will have to use a different representation to store the encoded sequence.

The code included takes each sequence of 8 bits (a byte) and turns it into a decimal number. The number is then used to create an extended ascii character (0-255) using String.fromCharCode . When we later decode the sequence we can perform this process in reverse to arrive at the original string of bits.

Consider a case where the first 8 bits of the encoded string was 0011 1001

parseInt
String.fromCharCode

At the end of the sequence we make sure the left over bits become a byte by right padding with 0's. So if we had 3 bits left over we would add 5 0's to the end of the sequence. This will be handled accordingly when we decode.

When we decode the file we go in the opposite direction for each char (byte)

char to character code 9 -> 57
character code to binary 57 -> 111001
left pad until we have 8 bits -> 00111001
use the sequence to walk the huffman tree

Since we may have extra zeros from the right pad in the encoding we make sure we only decode the same number of characters we had in the input originally. Essentially we stop when we have the correct number of characters.

In order to reconstruct the same huffman tree when we decode the minified text we store the frequency table and length of the original input. We can parse this when we read the encoded file and then proceed with the decoding process described above.

File Structure

The output of the encode command is a file with two sections

The frequency table used to create the huffman tree written out as json in the first line
The output of the huffman algorithm encoded as described above

The first line is somewhat wasteful. If the input has many unique characters the JSON representation of the frequency table can become quite large. A better approach would be to find some way to encode the tree structure using less characters.

Huffman Codes in Node - Compression Sample Implementation

Huffman Codes Node

Usage

Testing

Special Considerations For Node.js

File Structure

Recommend

朴素贝叶斯(Naive Bayes)和情感分类

谈谈如何设计一个 Network->Model 组件

给理工男女的一个神奇网站！

如何合并两个 TensorFlow 模型

Janus WebRTC Server研究

Clojure.spec Beginner's FAQ

[译] 三种系统监控工具对比：top vs Htop vs Glances

William McKnight on Data Platforms and Creating a Modern Data Architecture

Restyling apps at scale

Running Linux containers as a non-root with Podman

About Joyk