0
votes

I have a program that produces a Huffman tree based on ASCII character frequency read in a text input file. The Huffman codes are stored in a string array of 256 elements, empty string if the character is not read. This program also then encodes and compresses an output file and currently has some functionality in decompression and decoding.

In summary, my program takes a input file compresses and encodes an output file, closes the output file and opens the encoding as an input file, and takes a new output file that is supposed to have a decoded message identical to the original text input file.

My problem is that in my test run while compressing I notice that I have 3 extra bytes and in turn when I decompress and decode my encoded file, these 3 extra bytes are being decoded to my output file. Depending on the amount of text in the original input file, my other tests output these extra bytes.

My research has let me to a few suggestions such as making the first 8 bytes of your encoded output file the 64 bits of an unsigned long long that give the number of bytes in the file or using a psuedo-EOF but I am stuck on how I would go about handling it and which of the two is a smart way to handle it given the code I have already written or if either is a smart way at all?

Any guidance or solution to this problem is appreciated.

(For encodedOutput function, fileName is the input file parameter, fileName2 is the output file parameter)

(For decodeOutput function, fileName2 is the input file parameter, fileName 3 is output file parameter)

code[256] is a parameter for both of these functions and holds the Huffman code for each unique character read in the original input file, for example, the character 'H' being read in the input file may have a code of "111" stored in the code array for code[72] at the time it is being passed to the functions.

freq[256] holds the frequency of each ascii character read or holds 0 if it is not in original input file.

void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; //to read file
    ifile.open(fileName, ios::binary);
    if (!ifile)//to check if file is open or not
    {
        die("Can't read again"); // function that exits program if can't open
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read; 
    read = ifile.get(); //read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {//run this loop until reached to end of file(-1)
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }

    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));

    ifile.close();
    ofile.close();
}

void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue < node > q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }

    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));

    if (q.size() < 1) {
        die("no data");
    }

    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root 
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { //reading 8 bits at a time 
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
  ofile << ans;
}
1

1 Answers

2
votes

Because of integral promotion rules, (buffer << (8 - bit_count)) will be an integer expression, causing 4 bytes to be written. To only write one byte, you need to cast this to a char.

ofile << char(buffer << (8 - bit_count));