I've been working on learning how to encode and decode GIF image files. Here is my test image:
Code with problem replaced with return, stops before error:
from enum import Enum
class Codes(Enum):
CLEAR = 0
EOI = 1
def decode(code_bytes: bytes):
"""Return decoded gif from incoming bytes, currently only works with entire bytes object"""
lzw_min, leng, *_ = code_bytes
total = 2 + leng
# Skip lzw_min and sub-block size indicator bytes
skips = {0, 1, total}
while leng != 1:
leng = code_bytes[total] + 1
total += leng
skips |= {total}
def _stream(skips):
"""Return least significant bit still available from current byte"""
for i, byte in enumerate(code_bytes):
if i in skips:
continue
for _ in range(8):
yield byte & 1 and 1
byte >>= 1
code_stream = _stream(skips)
def get_code(bits, x=0, s=0):
"""Retrieve bits and assemble in proper order"""
for n in range(s, bits):
x |= next(code_stream) << n
return x
code_table = {
**{i: [i] for i in range(2 ** lzw_min)},
2 ** lzw_min: [Codes.CLEAR],
2 ** lzw_min + 1: [Codes.EOI]
}
bits = lzw_min + 1
assert Codes.CLEAR in code_table[get_code(bits)] # First code is always CLEAR
table_index = init_table = len(code_table) - 1
last = get_code(bits)
code = get_code(bits)
yield last
while Codes.EOI not in code_table.get(code, []):
if code <= table_index:
for i in code_table[code]:
yield i
table_index += 1
code_table[table_index] = code_table[last] + [code_table[code][0]]
else:
k = code_table[last][0]
for i in code_table[last] + [k]:
yield i
table_index += 1
code_table[table_index] = code_table[last] + [k]
if table_index == 2**bits-1:
bits += 1
# Problem replaced here
if bits == 13:
return
last, code = code, get_code(bits)
Output from above with the second frame of GIF:
Currently, output from the same input with problem included:
So here is where the problem lies:
if bits == 13:
last, code = code, get_code(bits)
bits = lzw_min + 1
table_index = init_table
assert Codes.CLEAR in code_table[code]
While reading the image, when table_index reaches one under the max bitlength and increments bits past the max bitlength, this code executes. It apparently isn't functioning properly.
After reaching max bit size, a CLEAR code is read next, as expected, no AssertionError is raised.
From what I've read, a CLEAR code means reset bitlength to lzw_min + 1 and re-initialize the code table. I believe that's what my code does.
Originally, code_table was a list and was reset after the CLEAR and bitlength reset, but codes much larger than the current table index kept being output from the stream. making code table a dict allows currently un-overwritten codes to be accessed after re-initializing, but this just produces noise. Apparently, some of the noise eventually registers as EOI and parsing ends.
What is wrong with what this algorithm does after encountering a maximum size code followed by a CLEAR?
Recreating my issue (requires pillow):
Here are the second frame of the test gif as a bytes literal you may use for function input and the frame's palette as a list literal for building the img:
bytes: (removed from pastebin)
palette:
[(230, 226, 225), (54, 28, 25), (99, 25, 28), (117, 22, 28), (65, 39, 33), (79, 45, 38), (92, 39, 36), (81, 45, 39), (79, 48, 40), (88, 50, 43), (100, 42, 38), (104, 60, 50), (111, 66, 55), (119, 68, 57), (138, 11, 23), (134, 14, 24), (139, 14, 25), (151, 9, 23), (148, 13, 26), (156, 12, 26), (132, 18, 27), (141, 18, 29), (149, 19, 30), (166, 7, 23), (173, 7, 24), (164, 11, 26), (172, 11, 27), (184, 4, 23), (181, 7, 25), (190, 6, 25), (180, 10, 27), (191, 9, 28), (166, 16, 30), (142, 22, 33), (142, 25, 35), (151, 23, 35), (146, 27, 37), (147, 30, 41), (189, 14, 33), (169, 23, 37), (180, 20, 37), (180, 23, 39), (188, 18, 36), (188, 26, 43), (149, 35, 45), (155, 36, 42), (151, 41, 51), (153, 43, 53), (134, 63, 56), (151, 48, 57), (156, 51, 60), (166, 40, 52), (173, 42, 51), (184, 39, 54), (167, 53, 55), (192, 7, 26), (192, 10, 29), (193, 14, 33), (194, 19, 37), (196, 26, 39), (195, 22, 40), (195, 27, 44), (196, 31, 48), (199, 34, 45), (197, 36, 52), (201, 42, 53), (198, 42, 58), (200, 45, 60), (202, 51, 60), (131, 75, 63), (175, 67, 61), (159, 55, 64), (157, 59, 67), (161, 59, 68), (170, 58, 69), (185, 55, 68), (202, 52, 67), (199, 56, 70), (204, 60, 75), (208, 62, 64), (148, 81, 71), (164, 68, 76), (171, 69, 76), (183, 70, 73), (189, 86, 75), (165, 75, 83), (169, 76, 84), (184, 75, 85), (170, 90, 83), (168, 84, 91), (172, 84, 91), (169, 99, 82), (177, 104, 86), (182, 106, 88), (172, 90, 97), (181, 92, 99), (173, 102, 108), (181, 100, 107), (183, 109, 114), (186, 118, 123), (198, 67, 72), (205, 65, 78), (206, 77, 73), (206, 69, 82), (202, 74, 87), (207, 73, 86), (208, 74, 87), (208, 77, 90), (199, 85, 90), (215, 91, 82), (209, 81, 94), (199, 104, 87), (218, 106, 92), (195, 114, 93), (203, 114, 92), (212, 112, 95), (201, 90, 100), (210, 85, 97), (211, 90, 101), (212, 94, 105), (199, 99, 108), (213, 98, 109), (202, 119, 99), (212, 122, 99), (215, 121, 100), (217, 125, 102), (201, 105, 114), (214, 102, 113), (215, 106, 117), (215, 109, 119), (216, 111, 120), (199, 116, 123), (212, 115, 124), (217, 115, 124), (225, 121, 103), (221, 130, 107), (226, 133, 110), (228, 135, 112), (230, 136, 113), (232, 138, 114), (224, 140, 121), (184, 125, 129), (198, 124, 130), (215, 122, 130), (219, 123, 132), (185, 133, 137), (189, 143, 146), (190, 149, 152), (191, 160, 162), (198, 132, 137), (215, 135, 141), (221, 131, 139), (201, 141, 145), (212, 141, 146), (222, 139, 146), (203, 150, 154), (214, 150, 154), (226, 153, 135), (224, 142, 149), (224, 144, 151), (225, 148, 155), (226, 153, 159), (224, 169, 156), (200, 158, 161), (214, 158, 162), (227, 156, 162), (200, 167, 169), (210, 162, 165), (216, 166, 169), (213, 170, 173), (201, 175, 177), (216, 174, 176), (205, 184, 185), (218, 179, 180), (215, 182, 184), (221, 187, 188), (228, 161, 166), (229, 165, 170), (230, 170, 174), (231, 173, 177), (232, 174, 178), (232, 178, 181), (227, 183, 185), (233, 182, 185), (234, 186, 189), (212, 191, 192), (228, 191, 193), (235, 190, 193), (207, 203, 202), (211, 195, 196), (212, 205, 204), (218, 201, 201), (212, 208, 207), (214, 210, 210), (216, 212, 212), (219, 216, 215), (222, 218, 217), (226, 195, 196), (236, 195, 197), (228, 199, 200), (237, 199, 200), (229, 203, 203), (238, 203, 204), (238, 207, 208), (240, 207, 208), (230, 213, 213), (235, 212, 212), (226, 221, 220), (237, 220, 219), (241, 211, 212), (241, 215, 216), (242, 219, 219), (228, 224, 223), (239, 224, 223), (242, 224, 223), (230, 226, 224), (234, 229, 228), (237, 232, 231), (239, 234, 232), (244, 228, 227), (245, 232, 231), (244, 237, 235), (248, 239, 237), (246, 241, 239), (248, 241, 239), (247, 242, 240), (248, 242, 240), (250, 248, 246), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
Here is the code needed to produce the same outputs as me with these literals:
from PIL import Image
w, h = 380, 407
gifo = Image.new('RGB', (w, h))
gif = gifo.load()
pix = decode(FRAME_2) # bytes literal as input
palette = PALETTE # palette literal as input
try:
for y in range(h):
for x in range(w):
t = next(pix)
while t in {Codes.EOI, Codes.CLEAR}: # Only needed to handle buggy noise with codes mixed into it
t = next(pix)
gif[x, y] = palette[t]
except StopIteration:
...
gifo.show()
Resources I've used in this project:
http://giflib.sourceforge.net/whatsinagif/lzw_image_data.html
https://www.daubnet.com/en/file-format-gif
https://www.w3.org/Graphics/GIF/spec-gif89a.txt
https://www.eecis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html
Regarding deferred CLEAR codes, though this gif does not contain these and I did not include logic for them in this question:
https://halicery.com/Image%20Decoders/GIF/GIF-LZW%20Deferred%20Clear%20Code.html



0xFFFhas been used, but if you either stop adding to the code table or keep adding stuff it's fine, the codes will never actually be more then0xFFFeven if the code table is bigger than that. I actually linked something about this topic at the end of my question in the first place, as well as that first link. I have also read that some gifs, instead of deferring clear codes, assume you will reset the table at0xFFF, I will handle that if I encounter it. - Ryan Laursen