Q: Is there any benefit of storing the length of a large array within the array itself?
Explanation:
Let's say we compress some large binary serialized object using the GZipStream class from the System.IO.Compression namespace. The output is a Base64 string of the compressed byte array. At some later point the Base64 string gets converted back to a byte array and the data needs to be decompressed.
While compressing the data we create a new byte array with the size of the compressed byte array + 4. In the first 4 bytes we store the length of the original (uncompressed) data, and we then BlockCopy the length and the compressed data into the new array. This new array gets converted into a Base64 string.
While decompressing we convert the Base64 string back into a byte array. Now we can extract the stored length by using the BitConverter class, which reads an Int32 from the first 4 bytes. We then allocate a byte array of that length and let the stream write the decompressed bytes into it.
I can't imagine that something like this actually has any benefit at all. It adds more complexity to the code, more operations need to be executed, and readability is reduced too. The BlockCopy operations alone should consume so many resources that this just cannot have a benefit, right?
Compression example code:
byte[] buffer = new byte[0xffff]; // Some large binary serialized object
// Compress in-memory.
using (var mem = new MemoryStream())
{
// The actual compression takes place here.
using (var zipStream = new GZipStream(mem, CompressionMode.Compress, true)) {
zipStream.Write(buffer, 0, buffer.Length);
}
// Store compressed byte data here.
var compressedData = new byte[mem.Length];
mem.Position = 0;
mem.Read(compressedData, 0, compressedData.Length);
/* Increase the size by 4 to accommodate an Int32 that
** will store the length of the original (uncompressed) data. */
var zipBuffer = new byte[compressedData.Length + 4];
// Store the length of the original data in the first 4 bytes.
Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, zipBuffer, 0, 4);
// Store the compressedData array after the 4-byte length prefix.
Buffer.BlockCopy(compressedData, 0, zipBuffer, 4, compressedData.Length);
return Convert.ToBase64String(zipBuffer);
}
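For comparison, here is a minimal sketch (not from the original post, assuming the same buffer) that writes the 4-byte length prefix straight into the MemoryStream before compressing. It produces the same layout while avoiding the intermediate compressedData array, the unchecked mem.Read call and both BlockCopy calls:

byte[] buffer = new byte[0xffff]; // Some large binary serialized object
using (var mem = new MemoryStream())
{
    // Write the 4-byte prefix (length of the original data) first.
    mem.Write(BitConverter.GetBytes(buffer.Length), 0, 4);
    // GZipStream appends the compressed data after the prefix.
    using (var zipStream = new GZipStream(mem, CompressionMode.Compress, true))
    {
        zipStream.Write(buffer, 0, buffer.Length);
    }
    // ToArray copies everything written so far, prefix included.
    return Convert.ToBase64String(mem.ToArray());
}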
Decompression example code:
byte[] zipBuffer = Convert.FromBase64String("some base64 string");
using (var inStream = new MemoryStream())
{
// The length of the original (uncompressed) data, stored in the first 4 bytes.
int dataLength = BitConverter.ToInt32(zipBuffer, 0);
// Allocate the output array with that exact size.
byte[] buffer = new byte[dataLength];
// Write the compressed data (skipping the 4-byte length prefix) into the stream.
inStream.Write(zipBuffer, 4, zipBuffer.Length - 4);
inStream.Position = 0;
// Decompress data.
using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress)) {
zipStream.Read(buffer, 0, buffer.Length);
}
... code
... code
... code
}
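For contrast, a minimal sketch (not from the original post) of decompressing without a stored length, assuming the Base64 payload contains only the gzip data and no 4-byte prefix. Since the final size is not known up front, the output has to be buffered in a growing MemoryStream and copied out at the end:

byte[] compressed = Convert.FromBase64String("some base64 string");
using (var inStream = new MemoryStream(compressed))
using (var zipStream = new GZipStream(inStream, CompressionMode.Decompress))
using (var outStream = new MemoryStream())
{
    // CopyTo reads until the end of the gzip stream, letting outStream grow as needed.
    zipStream.CopyTo(outStream);
    // One extra full copy to end up with a right-sized array.
    byte[] buffer = outStream.ToArray();
    // ... code
}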
"I can't imagine that something like this actually has any benefit at all."
It can benefit when streams of data are involved. You read the start of the stream - it tells you how many bytes to read until the end of this "block" of data (e.g. a variable-length string), so you know when this "block" of data ends. I believe protobuf works like this, for example, when dealing with Length-delimited fields. – mjwills
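As an illustration of that pattern, a minimal sketch (hypothetical helpers, not from the comment) of length-prefixed framing on a stream with BinaryWriter/BinaryReader:

// Writes a 4-byte length prefix followed by the payload.
static void WriteBlock(BinaryWriter writer, byte[] block)
{
    writer.Write(block.Length);
    writer.Write(block);
}

// Reads the prefix first, so the reader knows exactly where this block ends.
static byte[] ReadBlock(BinaryReader reader)
{
    int length = reader.ReadInt32();
    // ReadBytes loops internally until 'length' bytes are read or the stream ends.
    return reader.ReadBytes(length);
}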
mem.Read(compressedData, 0, compressedData.Length);
This line of code worries me. It looks like you assume that if you ask it to read a certain number of bytes then it will do so. That is a dangerous (read: foolish) assumption when dealing with streams. You really should check the return value of that call. – mjwills
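For illustration, a minimal sketch (not from the comment) of a read loop that checks the return value, using the zipStream and buffer from the decompression example above:

int offset = 0;
while (offset < buffer.Length)
{
    int read = zipStream.Read(buffer, offset, buffer.Length - offset);
    if (read == 0)
        break; // end of stream reached before the buffer was filled
    offset += read;
}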