0
votes

Alright so I am trying to do a file compress using the Huffman tree.

We got the tree that is working just fine but we are unable to figure out how to write the binary string we get into the file.

So for example our tree returns: '110', it should mean this byte: '00000110' right?

And if the returns: '11111111 11111110' it should mean what? Should we just write it in in byte?

So the question is how do we convert the binary string we get into bytes so we can write it on the file?

Thanks alot, Ara

1
possible duplicate of Binary to text in Java - Smutje
@Smutje Not really. The other question is how to pass from binary string to readeable text. I am asking how to write my binary string as bytes in a file text. - Ara Sivaneswaran
So you know that you need to write a String to a text file but your are not able to search for yourself? Then SO is clearly not the right place to ask. - Smutje
@Smutje I have a binary string: '110'. I need to transform it into bytes and then write the bytes into a file so the file will be compressed compared to the original file. It's the part of transforming the binary string into bytes that I don't understand. - Ara Sivaneswaran
Then guess what the SO search turns out for "java binary string byte": stackoverflow.com/questions/17727310/… - Smutje

1 Answers

1
votes

So for example our tree returns: '110', it should mean this byte: '00000110' right?

Wrong. You should have a byte buffer of bits into which you write your bits. Write the three bits 110 into the byte. (You will need to decide on a convention for bit ordering in the byte.) You still have five unused bits in the byte, so there it sits. Now you write 10 into the buffer. The byte buffer now has 11010, and three unused bits. So still it sits. Now you try to write 111011 into the byte buffer. The first three bits go into the byte buffer, giving you 11010111. You now have filled the buffer, so only now do you write out your byte to the file. You are left with 011. You clear your byte buffer of bits since you wrote it out, and put in the remaining 011 from your last code. Your byte buffer now has three bits in it, and five bits unused. Continue in this manner.

The buffer does not have to be one byte. 16-bit or 32-bit buffers are common and are more efficient. You write out bytes whenever the bits therein are eight or more, and shift the remaining 0-7 bits to the start of the buffer.

The only tricky part is what to do at the end, since you may have unused bits in your last byte. Your Huffman codes should have an end symbol to mark the end of the stream. Then you know when you should stop looking for more Huffman codes. If you do not have an end code, then you need to assure somehow that either the remaining bits in the byte cannot be a complete Huffman code, or you need to indicate in some other way where the stream of bits end.