I am implementing a LZW algorithm in C++.
The size of the dictionary is a user input, but the minimum is 256, so it should work with binary files. If it reaches the end of the dictionary it goes around to the index 0 and works up overwriting it from there.
For example, if i put in a alice in wonderland script and compress it with a dictionary size 512 i get this dictionary.
But i have a problem with decompression and the output dictionary from decompressing the compressed file looks like this.
And my code for decompressing looks like this
struct dictionary
{
vector<unsigned char> entry;
vector<bool> bits;
};
void decompress(dictionary dict[], vector<bool> file, int dictionarySize, int numberOfBits)
{
//in this example
//dictionarySize = 512, tells the max size of the dictionary, and goes back to 0 if it reaches 513
//numberOfBits = log2(512) = 9
//dictionary dict[] contains bits and strings (strings can be empty)
// dict[0] =
// entry = (unsigned char)0
// bits = (if numberOfBits = 9) 000000001
// dict[255] =
// entry = (unsigned char)255
// bits = (if numberOfBits = 9) 011111111
// so the next entry will be dict[next] (next is currently 256)
// dict[256] =
// entry = what gets added in the code below
// bits = 100000000
// all the bits are already set previously (dictionary size is int dictionarySize) so in this case all the bits from 0 to 511 are already set, entries are set from 0 to 255, so extended ASCII
vector<bool> currentCode;
vector<unsigned char> currentString;
vector<unsigned char> temp;
int next=256;
bool found=false;
for(int i=0;i<file.size();i+=numberOfBits)
{
for(int j=0;j<numberOfBits;j++)
{
currentCode.push_back(file[i+j]);
}
for(int j=0;j<dictionarySize;j++)
{
// when the currentCode (size numberOfBits) gets found in the dictionary
if(currentCode==dict[j].bits)
{
currentString = dict[j].entry;
// if the current string isnt empty, then it means it found the characted in the dictionary
if(!currentString.empty())
{
found = true;
}
}
}
//if the currentCode in the dictionary has a string value attached to it
if(found)
{
for(int j=0;j<currentString.size();j++)
{
cout<<currentString[j];
}
temp.push_back(currentString[0]);
// so it doesnt just push 1 character into the dictionary
// example, if first read character is 'r', it is already in the dictionary so it doesnt get added
if(temp.size()>1)
{
// if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
if(next>dictionarySize-1) //next > 512-1
{
next = 0;
}
dict[next].entry.clear();
dict[next].entry = temp;
next++;
}
//temp = currentString;
}
else
{
currentString = temp;
currentString.push_back(temp[0]);
for(int j=0;j<currentString.size();j++)
{
cout<<currentString[j];
}
// if next is more than 511, writing to that index would cause an error, so it resets back to 0 and goes back up
if(next>dictionarySize-1)
{
next = 0;
}
dict[next].entry.clear();
dict[next].entry = currentString;
next++;
//break;
}
temp = currentString;
// currentCode gets cleared, and written into in the next iteration
currentCode.clear();
//cout<<endl;
found = false;
}
}
Im am currently stuck and dont know what to fix here to fix the output. I have also noticed, that if i put a dictionary big enough, so it doesnt go around the dictionary (it doesnt reach the end and begin again at 0) it works.
void decompress(dictionary dict[], vector<bool> file, int dictionarySize, int numberOfBits), here) b) what do the non-trivial conditions mean? In the particular code above, what shall happen to the contents oftempiftemp.size()<=1, and why? Then, there are interpretations where the size of a dictionary entry stays constant: 1) (code for prefix, suffix character) 2) (offset in plain text, length) - what you use now can grow as fast as the text. - greybeardClear codes? I did GIF encoder/decoder in the past which uses LZW and it uses clear codes which tells your decompresor the dictionary should be cleared. Otherwise the compression and decompression could get out of sync corrupting the decompressed data.... unless you add to dictionary at consistent predictable rate - Spektre