0
votes

I'm writing code to encode 10 MB of binary data with a Reed-Solomon code.

However, the module raises the following error about the message length:

ValueError: Message is too long (10032003 when max is 255)

I tried reading the library's source, but I couldn't work out what the code is meant to do.

Can you help me handle this problem?

This is the Reed-Solomon module I'm using for the following code.

Below is part of the code I've written.

import time
import reedsolo as rs

def encoding(per, msg, n, nsym, gen):
    # Accumulate into "elapsed": naming it "time" would shadow the time module.
    elapsed = 0
    count = 0

    rs.init_tables(0x11d)
    while elapsed < per:
        start = time.time()
        rs.rs_encode_msg(msg, nsym, gen=gen[nsym])

        elapsed += time.time() - start
        count += 1

def main():
    data = b"<SOME TEXT>"*5500 #This data size is 10MB
    n = 8
    nsym = 3 # I wanted RS(8,3)
    period = 10


    gen = rs._rs_generator_poly_all(n)
    encoding(period, data, 8, 3, gen)
1
Have you read this guide: github.com/tomerfiliba/… ? "By default, most Reed-Solomon codecs are limited to characters that can be encoded in 256 bits and with a length of maximum 256 characters. But this codec is universal, you can reduce or increase the length and maximum character value by increasing the Galois Field:" - Morten Jensen
Yes, you're right. But if I increase the Galois field, the execution time increases drastically. In my case, encoding 10 MB of data has to finish in under 0.5 to 1 second, so extending the Galois field won't work for me. - JIN
You don't normally encode 10MB in one go. It is more usual to frame/packetize the data into blocks of, say, 256 bytes or something smaller than a kilobyte. Do you have a working understanding of Galois fields? I have forgotten many details, but I think 2^~10MB is a very large finite field! - Morten Jensen
It is common to send data in a number of packets, check each packet for errors, maybe fix a bit error or two, and then reassemble the data from the packets on the receiving side. - Morten Jensen
@MortenJensen - for the more common Reed-Solomon BCH view, the maximum length for GF(2^8) is 255. - rcgldr

1 Answer

2
votes

I would suggest a method similar to jerasure, as used for cloud storage. Treat the data as a matrix with 5 rows of data and 3 rows of ECC (RS(8,5): 8 total, 5 data, 3 parity), where each row has ncol = 10MB/5 columns (ncol is the number of columns). Apply the RS code to each column of data independently. You might want to consider more rows, such as 16 rows of data and 4 rows of ECC (RS(20,16)).

The error detection and correction process will need to gather each column into an array, call the RS library to encode or decode it, and then write the results back to the columns.
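As a rough illustration of the column-wise layout, here is a minimal pure-Python sketch. It assumes GF(2^8) with the primitive polynomial 0x11d (the one used in the question) and generator roots alpha^0 through alpha^(nsym-1). The helper names (`generator_poly`, `rs_parity`, `encode_columns`, `poly_eval`) are made up for this example and are not part of any library. It only shows the data layout; it will be far too slow for 10MB.

```python
# GF(2^8) exp/log tables for primitive polynomial 0x11d (assumption: same field
# as the question's init_tables(0x11d)).
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11d
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def generator_poly(nsym):
    """Product of (x + alpha^i) for i in 0..nsym-1, highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = g + [0]                      # g * x
        for j, c in enumerate(g):          # plus g * alpha^i
            nxt[j + 1] ^= gf_mul(c, GF_EXP[i])
        g = nxt
    return g

def rs_parity(symbols, gen):
    """Parity bytes of one codeword: remainder of symbols * x^nsym divided by gen."""
    nsym = len(gen) - 1
    rem = list(symbols) + [0] * nsym
    for i in range(len(symbols)):
        coef = rem[i]
        if coef:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return rem[len(symbols):]

def encode_columns(data, k=5, nsym=3):
    """Split data into k rows and compute nsym parity rows, one column at a time."""
    ncol = -(-len(data) // k)              # ceiling division
    padded = data.ljust(k * ncol, b"\0")   # zero-pad so the rows are equal length
    rows = [padded[r * ncol:(r + 1) * ncol] for r in range(k)]
    gen = generator_poly(nsym)
    parity = [bytearray(ncol) for _ in range(nsym)]
    for c in range(ncol):
        column = [row[c] for row in rows]  # one 5-symbol message per column
        for p, value in enumerate(rs_parity(column, gen)):
            parity[p][c] = value
    return rows, [bytes(p) for p in parity]

def poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) at x with Horner's rule."""
    y = 0
    for c in poly:
        y = gf_mul(y, x) ^ c
    return y

rows, parity = encode_columns(bytes(range(23)), k=5, nsym=3)
```

Each column, read top to bottom through the 5 data rows and then the 3 parity rows, is one RS(8,5) codeword, so evaluating it at alpha^0, alpha^1 and alpha^2 should give zero.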

You'll need to use a compiled library rather than a Python-based library for the code to run fast enough. A compiled library for x86 can be sped up with an assembly module that uses the PSHUFB instruction (xmm registers) to process 16 bytes at a time in parallel.
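The usual way PSHUFB is applied here, as far as I know, is to split each byte into two 4-bit nibbles and look each one up in a 16-entry table, which is exactly what the instruction does for 16 bytes at once. The Python sketch below only models that arithmetic one byte at a time, so it is not fast. The function names are invented for the example, and the field polynomial 0x11d is assumed from the question.

```python
def gf_mul(a, b, prim=0x11d):
    """Bitwise multiply in GF(2^8), reducing by the primitive polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= prim
        b >>= 1
    return r

def nibble_tables(c):
    """Two 16-entry tables: c times each low nibble, and c times each high nibble."""
    lo = bytes(gf_mul(c, n) for n in range(16))
    hi = bytes(gf_mul(c, n << 4) for n in range(16))
    return lo, hi

def mul_region(c, data):
    """Multiply every byte of data by the constant c using the nibble tables.

    Multiplication is linear over XOR, so c * b == lo[b & 15] ^ hi[b >> 4].
    A SIMD version would do both lookups with one PSHUFB each, 16 bytes at a time.
    """
    lo, hi = nibble_tables(c)
    return bytes(lo[b & 0x0F] ^ hi[b >> 4] for b in data)

all_bytes = bytes(range(256))
scaled = mul_region(0x53, all_bytes)
```

The point of the split is that a 256-entry multiplication table does not fit in a 16-byte register, while two 16-entry tables do.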

The GitHub library linked in the question includes a C source file that is meant to be compiled, but I don't know what tools are required to build a compiled C library for Python.