3 votes

I'm storing many files of various lengths on a block-oriented medium (fixed-size blocks, e.g. 1024 bytes). When reading a file back, each block is either missing or correct (no bit errors or the like). The missing blocks are random, with no particular pattern to which blocks are lost. I'd like to be able to reassemble the entire file as long as the number of missing blocks is below some threshold, which presumably varies with the encoding scheme.

Most of the literature I've seen deals with sequences of bit errors in a stream of data, so that wouldn't seem to apply.

A simple approach is to take N blocks at a time and store an extra block containing the XOR of those N blocks. If one of the N blocks is missing but the check block is not, the missing block can be reconstructed by XORing the check block with the surviving blocks.
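For concreteness, here's a minimal sketch of that XOR scheme in Python (the names and the group size are illustrative, not any particular library's API):

```python
BLOCK_SIZE = 1024

def xor_blocks(blocks):
    """XOR equal-sized blocks together, byte by byte."""
    acc = bytearray(BLOCK_SIZE)
    for block in blocks:
        for i, b in enumerate(block):
            acc[i] ^= b
    return bytes(acc)

# Encoding: store a parity block alongside each group of N data blocks.
data = [bytes([j]) * BLOCK_SIZE for j in range(4)]   # N = 4 toy blocks
parity = xor_blocks(data)

# Recovery: if exactly one block of the group is lost, XORing the parity
# block with the survivors reproduces it.
survivors = data[:2] + data[3:]                      # block 2 went missing
assert xor_blocks(survivors + [parity]) == data[2]
```

Note this only survives one loss per group; losing two blocks from the same group is unrecoverable.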

Are there error-correcting schemes which are well suited to this problem? Links to literature or code are appreciated.


3 Answers

1 vote

The best place to begin your search is the Parchive Parity Volume Set spec. The biggest issue you will have is the metadata overhead needed in each block. Also, that spec is oriented toward compressed archive files.

Another good link is the Parchive docs on the 2.0 format (based on, but more block-oriented than, Parchive 1.0). See QuickPar for a good breakdown of how PAR 2.0 improved on PAR 1.0.
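To give a feel for what PAR-style recovery buys you over plain XOR: with Reed-Solomon erasure coding, m recovery blocks let you survive *any* m lost blocks out of k + m. Below is a toy pure-Python sketch using Lagrange interpolation over GF(256); all names are mine, and real PAR2 uses GF(2^16) and a far more efficient construction:

```python
# Build GF(256) log/antilog tables (primitive polynomial 0x11D).
GF_EXP, GF_LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    GF_EXP[i], GF_LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return GF_EXP[GF_LOG[a] + GF_LOG[b]] if a and b else 0

def gf_div(a, b):  # b must be nonzero
    return GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255] if a else 0

def interpolate(xs, ys, x):
    """Evaluate the unique degree < len(xs) polynomial through the points
    (xs[j], ys[j]) at x, over GF(256). In GF(2^8), + and - are both XOR."""
    total = 0
    for j, xj in enumerate(xs):
        num = den = 1
        for xl in xs:
            if xl != xj:
                num = gf_mul(num, x ^ xl)
                den = gf_mul(den, xj ^ xl)
        total ^= gf_mul(ys[j], gf_div(num, den))
    return total

def encode(data_blocks, m):
    """Return m recovery blocks for k data blocks (k + m <= 255).
    Byte p of each block is one symbol of one codeword."""
    k, size = len(data_blocks), len(data_blocks[0])
    xs = list(range(1, k + 1))          # data blocks live at points 1..k
    recovery = [bytearray(size) for _ in range(m)]
    for p in range(size):
        ys = [blk[p] for blk in data_blocks]
        for t in range(m):
            recovery[t][p] = interpolate(xs, ys, k + 1 + t)
    return [bytes(r) for r in recovery]

def decode(present, k, size):
    """present maps block number (1..k data, k+1.. recovery) to its bytes;
    any k surviving blocks recover all k data blocks."""
    xs = sorted(present)[:k]
    out = []
    for j in range(1, k + 1):
        if j in present:
            out.append(present[j])
            continue
        blk = bytearray(size)
        for p in range(size):
            blk[p] = interpolate(xs, [present[x][p] for x in xs], j)
        out.append(bytes(blk))
    return out

# Smoke test: 4 data blocks, 2 recovery blocks, blocks 2 and 3 lost.
data = [bytes([j]) * 8 for j in range(4)]
rec = encode(data, 2)
assert decode({1: data[0], 4: data[3], 5: rec[0], 6: rec[1]}, 4, 8) == data
```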

0 votes

Check out Raptor codes (https://en.wikipedia.org/wiki/Raptor_code); they're the most advanced fountain codes out there at the moment.
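For intuition about how fountain codes work (Raptor is essentially an LT code plus a precode), here is a toy LT-style encoder and peeling decoder. Everything here is illustrative: real LT codes draw symbol degrees from the robust soliton distribution, not the uniform one below, and Raptor adds an outer code on top.

```python
import random

def lt_encode(blocks, n_symbols, seed=42):
    """Each output symbol is the XOR of a random subset of source blocks,
    tagged with the indices it covers."""
    rng = random.Random(seed)
    k, size = len(blocks), len(blocks[0])
    symbols = []
    for _ in range(n_symbols):
        degree = rng.randint(1, k)      # toy choice; real LT codes use
                                        # the robust soliton distribution
        idxs = frozenset(rng.sample(range(k), degree))
        acc = bytearray(size)
        for i in idxs:
            for p, b in enumerate(blocks[i]):
                acc[p] ^= b
        symbols.append((idxs, bytes(acc)))
    return symbols

def lt_decode(symbols, k):
    """Peeling decoder: repeatedly find a symbol covering exactly one
    still-unknown block, XOR out the known blocks, and recover it."""
    recovered = {}
    progress = True
    while len(recovered) < k and progress:
        progress = False
        for idxs, data in symbols:
            unknown = idxs - set(recovered)
            if len(unknown) != 1:
                continue
            blk = bytearray(data)
            for j in idxs & set(recovered):
                for p, b in enumerate(recovered[j]):
                    blk[p] ^= b
            recovered[next(iter(unknown))] = bytes(blk)
            progress = True
    return [recovered.get(i) for i in range(k)]  # None => unrecovered

# With enough symbols, decoding succeeds with high probability.
source = [bytes([j]) * 16 for j in range(8)]
decoded = lt_decode(lt_encode(source, 40), 8)
print(sum(b is not None for b in decoded), "of 8 blocks recovered")
```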