How to implement Huffman code compression and decompression in C#

Question

I have an assignment on Huffman coding: compress and decompress a text document using Huffman codes. I have created a Huffman tree where each Node contains a BitArray with its calculated Huffman code.

The main problem is encoding the input file efficiently. I am not sure how to save the encoded bytes (created by converting each BitArray to a byte array) without causing collisions between codes.

Example: There are two Huffman codes e = 101 and i = 0101. When they are converted to bytes they are represented as e = 00000101 and i = 00000101.

How can I avoid this, and is there a better way to encode the file?

What is the expected time for compressing and decompressing a file with 1 million characters?

(For now I am creating a BitArray that contains all the encoded bits, then converting it to byte[] and saving it, which takes too much time and memory.)

You should not pad your codes to fill up bytes. Instead, you should think about concatenating all the codes (without padding) to form a long string of bits, then output eight bits at a time into your file. — Michael Liu

Mark Adler · Accepted Answer · 2018-04-29T15:52:18

Consider your output to be a string of bits, not bytes. You concatenate your codes, each with an arbitrary number of bits (3, 4, 15, whatever), using an integer of, say, 32 bits as a bit buffer. Whenever the buffer holds 8 or more bits, you output a byte and remove those bits from the buffer. At the end, if you have fewer than 8 bits left, you pad the rest with zero bits and write out the last byte. You do this with shift and OR operations to manipulate the bits in the bit buffer.
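The bit-buffer approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `BitWriter` class, its method names, the most-significant-bit-first packing order, and the 24-bit limit on code length are all choices made for this sketch, not something the answer prescribes.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the answer): packs variable-length codes
// into bytes through a 32-bit buffer, most significant bit first.
public sealed class BitWriter
{
    private readonly Stream _output;
    private uint _buffer; // pending bits are kept in the low _count bits
    private int _count;   // number of pending bits; always < 8 between calls

    public BitWriter(Stream output)
    {
        _output = output;
    }

    // Appends the low `length` bits of `code`. Limiting length to 24 is an
    // assumption of this sketch: at most 7 pending bits + 24 new bits fit in 32.
    public void Write(uint code, int length)
    {
        if (length < 1 || length > 24)
            throw new ArgumentOutOfRangeException(nameof(length));

        _buffer = (_buffer << length) | (code & ((1u << length) - 1));
        _count += length;

        // Emit complete bytes from the top of the pending bits.
        while (_count >= 8)
        {
            _count -= 8;
            _output.WriteByte((byte)((_buffer >> _count) & 0xFF));
        }

        // Drop the bits that were just written.
        _buffer &= (1u << _count) - 1;
    }

    // Pads the final partial byte with zero bits and writes it.
    public void Flush()
    {
        if (_count > 0)
        {
            _output.WriteByte((byte)((_buffer << (8 - _count)) & 0xFF));
            _count = 0;
            _buffer = 0;
        }
    }
}
```

With the codes from the question, calling `Write(0b101, 3)`, `Write(0b0101, 4)`, `Write(0b101, 3)` and then `Flush()` produces the bit string 1010101101 followed by six zero padding bits, i.e. the two bytes 0xAB and 0x40. Because the padding bits can look like the start of another code, a decoder typically also needs to know where the data ends, for example from a symbol count stored alongside the data or a dedicated end-of-stream symbol.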
