1
votes

I've been working on learning how to encode and decode GIF image files. Here is my test image:

Here is my test image:

Code with problem replaced with return, stops before error:

from enum import Enum

class Codes(Enum):
    CLEAR = 0
    EOI = 1

def decode(code_bytes: bytes):
    """Return decoded gif from incoming bytes, currently only works with entire bytes object"""
    lzw_min, leng, *_ = code_bytes
    total = 2 + leng

    # Skip lzw_min and sub-block size indicator bytes
    skips = {0, 1, total}
    while leng != 1:
        leng = code_bytes[total] + 1
        total += leng
        skips |= {total}

    def _stream(skips):
        """Return least significant bit still available from current byte"""
        for i, byte in enumerate(code_bytes):
            if i in skips:
                continue
            for _ in range(8):
                yield byte & 1 and 1
                byte >>= 1

    code_stream = _stream(skips)

    def get_code(bits, x=0, s=0):
        """Retrieve bits and assemble in proper order"""
        for n in range(s, bits):
            x |= next(code_stream) << n
        return x

    code_table = {
            **{i: [i] for i in range(2 ** lzw_min)},
            2 ** lzw_min: [Codes.CLEAR], 
            2 ** lzw_min + 1: [Codes.EOI]
    }
    bits = lzw_min + 1
    assert Codes.CLEAR in code_table[get_code(bits)]  # First code is always CLEAR
    table_index = init_table = len(code_table) - 1
    last = get_code(bits)
    code = get_code(bits)
    yield last
    while Codes.EOI not in code_table.get(code, []):

        if code <= table_index:
            for i in code_table[code]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [code_table[code][0]]

        else:
            k = code_table[last][0]
            for i in code_table[last] + [k]:
                yield i
            table_index += 1
            code_table[table_index] = code_table[last] + [k]

        if table_index == 2**bits-1:
            bits += 1

        # Problem replaced here
        if bits == 13:
            return

        last, code = code, get_code(bits)

Output from above with the second frame of GIF:

enter image description here

Currently, output from the same input with problem included:

test image output, only the top half

So here is where the problem lies:

        if bits == 13:
            last, code = code, get_code(bits)
            bits = lzw_min + 1
            table_index = init_table
            assert Codes.CLEAR in code_table[code]

While reading the image, when table_index reaches one under the max bitlength and increments bits past the max bitlength, this code executes. It apparently isn't functioning properly.

After reaching max bit size, a CLEAR code is read next, as expected, no AssertionError is raised.

From what I've read, a CLEAR code means reset bitlength to lzw_min + 1 and re-initialize the code table. I believe that's what my code does.

Originally, code_table was a list and was reset after the CLEAR and bitlength reset, but codes much larger than the current table index kept being output from the stream. making code table a dict allows currently un-overwritten codes to be accessed after re-initializing, but this just produces noise. Apparently, some of the noise eventually registers as EOI and parsing ends.

What is wrong with what this algorithm does after encountering a maximum size code followed by a CLEAR?


Recreating my issue (requires pillow):

Here are the second frame of the test gif as a bytes literal you may use for function input and the frame's palette as a list literal for building the img:

bytes: (removed from pastebin)

palette:

[(230, 226, 225), (54, 28, 25), (99, 25, 28), (117, 22, 28), (65, 39, 33), (79, 45, 38), (92, 39, 36), (81, 45, 39), (79, 48, 40), (88, 50, 43), (100, 42, 38), (104, 60, 50), (111, 66, 55), (119, 68, 57), (138, 11, 23), (134, 14, 24), (139, 14, 25), (151, 9, 23), (148, 13, 26), (156, 12, 26), (132, 18, 27), (141, 18, 29), (149, 19, 30), (166, 7, 23), (173, 7, 24), (164, 11, 26), (172, 11, 27), (184, 4, 23), (181, 7, 25), (190, 6, 25), (180, 10, 27), (191, 9, 28), (166, 16, 30), (142, 22, 33), (142, 25, 35), (151, 23, 35), (146, 27, 37), (147, 30, 41), (189, 14, 33), (169, 23, 37), (180, 20, 37), (180, 23, 39), (188, 18, 36), (188, 26, 43), (149, 35, 45), (155, 36, 42), (151, 41, 51), (153, 43, 53), (134, 63, 56), (151, 48, 57), (156, 51, 60), (166, 40, 52), (173, 42, 51), (184, 39, 54), (167, 53, 55), (192, 7, 26), (192, 10, 29), (193, 14, 33), (194, 19, 37), (196, 26, 39), (195, 22, 40), (195, 27, 44), (196, 31, 48), (199, 34, 45), (197, 36, 52), (201, 42, 53), (198, 42, 58), (200, 45, 60), (202, 51, 60), (131, 75, 63), (175, 67, 61), (159, 55, 64), (157, 59, 67), (161, 59, 68), (170, 58, 69), (185, 55, 68), (202, 52, 67), (199, 56, 70), (204, 60, 75), (208, 62, 64), (148, 81, 71), (164, 68, 76), (171, 69, 76), (183, 70, 73), (189, 86, 75), (165, 75, 83), (169, 76, 84), (184, 75, 85), (170, 90, 83), (168, 84, 91), (172, 84, 91), (169, 99, 82), (177, 104, 86), (182, 106, 88), (172, 90, 97), (181, 92, 99), (173, 102, 108), (181, 100, 107), (183, 109, 114), (186, 118, 123), (198, 67, 72), (205, 65, 78), (206, 77, 73), (206, 69, 82), (202, 74, 87), (207, 73, 86), (208, 74, 87), (208, 77, 90), (199, 85, 90), (215, 91, 82), (209, 81, 94), (199, 104, 87), (218, 106, 92), (195, 114, 93), (203, 114, 92), (212, 112, 95), (201, 90, 100), (210, 85, 97), (211, 90, 101), (212, 94, 105), (199, 99, 108), (213, 98, 109), (202, 119, 99), (212, 122, 99), (215, 121, 100), (217, 125, 102), (201, 105, 114), (214, 102, 113), (215, 106, 117), (215, 109, 119), (216, 111, 120), (199, 116, 123), (212, 115, 124), (217, 115, 124), (225, 121, 103), (221, 130, 107), (226, 133, 110), (228, 135, 112), (230, 136, 113), (232, 138, 114), (224, 140, 121), (184, 125, 129), (198, 124, 130), (215, 122, 130), (219, 123, 132), (185, 133, 137), (189, 143, 146), (190, 149, 152), (191, 160, 162), (198, 132, 137), (215, 135, 141), (221, 131, 139), (201, 141, 145), (212, 141, 146), (222, 139, 146), (203, 150, 154), (214, 150, 154), (226, 153, 135), (224, 142, 149), (224, 144, 151), (225, 148, 155), (226, 153, 159), (224, 169, 156), (200, 158, 161), (214, 158, 162), (227, 156, 162), (200, 167, 169), (210, 162, 165), (216, 166, 169), (213, 170, 173), (201, 175, 177), (216, 174, 176), (205, 184, 185), (218, 179, 180), (215, 182, 184), (221, 187, 188), (228, 161, 166), (229, 165, 170), (230, 170, 174), (231, 173, 177), (232, 174, 178), (232, 178, 181), (227, 183, 185), (233, 182, 185), (234, 186, 189), (212, 191, 192), (228, 191, 193), (235, 190, 193), (207, 203, 202), (211, 195, 196), (212, 205, 204), (218, 201, 201), (212, 208, 207), (214, 210, 210), (216, 212, 212), (219, 216, 215), (222, 218, 217), (226, 195, 196), (236, 195, 197), (228, 199, 200), (237, 199, 200), (229, 203, 203), (238, 203, 204), (238, 207, 208), (240, 207, 208), (230, 213, 213), (235, 212, 212), (226, 221, 220), (237, 220, 219), (241, 211, 212), (241, 215, 216), (242, 219, 219), (228, 224, 223), (239, 224, 223), (242, 224, 223), (230, 226, 224), (234, 229, 228), (237, 232, 231), (239, 234, 232), (244, 228, 227), (245, 232, 231), (244, 237, 235), (248, 239, 237), (246, 241, 239), (248, 241, 239), (247, 242, 240), (248, 242, 240), (250, 248, 246), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]

Here is the code needed to produce the same outputs as me with these literals:

from PIL import Image
w, h = 380, 407
gifo = Image.new('RGB', (w, h))
gif = gifo.load()
pix = decode(FRAME_2)  # bytes literal as input
palette = PALETTE      # palette literal as input
try:
    for y in range(h):
        for x in range(w):
            t = next(pix)
            while t in {Codes.EOI, Codes.CLEAR}: # Only needed to handle buggy noise with codes mixed into it
                    t = next(pix)
            gif[x, y] = palette[t]
except StopIteration:
    ...
gifo.show()

Resources I've used in this project:

http://giflib.sourceforge.net/whatsinagif/lzw_image_data.html

https://www.daubnet.com/en/file-format-gif

https://www.w3.org/Graphics/GIF/spec-gif89a.txt

https://www.eecis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html

Regarding deferred CLEAR codes, though this gif does not contain these and I did not include logic for them in this question:

https://halicery.com/Image%20Decoders/GIF/GIF-LZW%20Deferred%20Clear%20Code.html

1
My decoder actually works fine for the "corrupted" gif in that question, gifs like that defer clear codes even after code table index 0xFFF has been used, but if you either stop adding to the code table or keep adding stuff it's fine, the codes will never actually be more then 0xFFF even if the code table is bigger than that. I actually linked something about this topic at the end of my question in the first place, as well as that first link. I have also read that some gifs, instead of deferring clear codes, assume you will reset the table at 0xFFF, I will handle that if I encounter it. - Ryan Laursen

1 Answers

2
votes

I figured it out. Among other things, I was pulling a 13 bit CLEAR code instead of 12 bit and I didn't cycle to the next data when initializing a table properly. This version works great.

BYTE = 8
MAX_BITLEN = 12

def decode(code_bytes):
    """Yield decoded gif-style LZW from incoming bytes."""
    bytes_iter = iter(code_bytes)
    lzw_min = next(bytes_iter)
    get_code = _get_code_getter(bytes_iter)
    code = clear = get_code(lzw_min + 1)
    eoi = clear + 1
    while code != eoi:
        if code == clear:
            bitlength = lzw_min + 1
            code_table = [[i] for i in range(eoi + 1)]
            last, code = get_code(bitlength), get_code(bitlength)
            yield last
        if code < len(code_table):
            output = code_table[code]
            to_add = code_table[last] + [code_table[code][0]]
        else:
            to_add = output = code_table[last] + [code_table[last][0]]
        for i in output:
            yield i
        code_table += [to_add]
        bitlength += len(code_table) == (bitlength < MAX_BITLEN) << bitlength
        last, code = code, get_code(bitlength)


def _get_code_getter(code_iter):
    def bit_stream():
        """Yield least significant bit remaining in current byte."""
        length = next(code_iter)
        for read, byte in enumerate(code_iter):
            if read == length:
                length += byte + 1
            else:
                for bit in (1 << b & byte and 1 for b in range(BYTE)):
                    yield bit

    def code_getter(bitlength):
        """Retrieve/structure bits, least significant first."""
        return sum(next(code_stream) << z for z in range(bitlength))

    code_stream = bit_stream()
    return code_getter