I'm saving breeze SparseVectors to HBase, using com.twitter.chill.KryoInjection to serialize them to byte arrays, and this seemed to work fine. But then I noticed that after reading the vectors back out of HBase, some values are different or missing. Now I'm wondering how HBase encodes data and at which step the data could be altered (saving, encoding, perhaps compression, or reading).
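A minimal, stdlib-only sketch for narrowing this down. It assumes you keep the Array[Byte] returned by KryoInjection right before the Put, and the Array[Byte] returned by Result.getValue after the Get, and compare the two byte for byte. HBase stores cell values as opaque byte arrays. If the two arrays are identical, the change must therefore happen in serialization or deserialization. If they differ, the offset shows where. The names ByteDiff, firstMismatch and window are made up for illustration.

```scala
object ByteDiff {
  /** Offset of the first byte at which `expected` and `actual` differ.
    * Returns None when both arrays are identical; if one array is a prefix
    * of the other, the offset is the length of the shorter one.
    */
  def firstMismatch(expected: Array[Byte], actual: Array[Byte]): Option[Int] = {
    val common = math.min(expected.length, actual.length)
    (0 until common)
      .find(i => expected(i) != actual(i))
      .orElse(if (expected.length != actual.length) Some(common) else None)
  }

  /** Hex dump of the bytes around `offset`, handy for eyeballing a difference. */
  def window(bytes: Array[Byte], offset: Int, radius: Int = 8): String =
    bytes
      .slice(math.max(0, offset - radius), offset + radius)
      .map(b => "%02X".format(b & 0xFF))
      .mkString(" ")
}
```

If you only need a yes/no answer, java.util.Arrays.equals(bytesBeforePut, bytesAfterGet) is enough (both variable names are hypothetical). firstMismatch and window only become useful once you know the arrays differ.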
I wanted to compare the vectors stored in HBase with the corresponding vectors right before they were saved, to see whether they are equal (if they are, reading is the likely culprit), but I don't know how to do this. In the HBase shell, a stored vector looks like this:
column=d:vector, timestamp=1431936909897, value=\x01\x00breeze.linalg.SparseVector$mcD$s\xF0\x01\x00\x01\x01breeze.collection.mutable.SparseArra\xF9\x01\x1A\x01\x02[\xC4\x01\x0E?\xF0\x00\x00\x00\x00\x00\x00?\xC5-\xF2\x15\x85Z:?\xD6,{ci\xA8\x08@\x06P\xE3\x85\xACy'?\xEB\xA2\x09\xAA\xA3\xAD\x19?\xE4M\xCB\x98\xB8\x00f?\xE8\x00\x00\x00\x00\x00\x00@"\xA4Z\x1C\xAC\x081?\xEB\xB0\xE3\xCD\x9AR&?\xE4\xB7\xF7K`\xDD)?\xEA\xD3\xC0\x06\x14\xEC\xF7?\xF3\x01]\xE8R46?\xC45\x03\x97\xE5\x0E\x8D\x0A\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0E\x02\x0A0~\xB2\x01\xCC\x01\xBA\x02\xD22\xE4a\xDA\xB6\x0A\xD0\x8B&\xC0\xC0)\xDA\xCC\x05\x01\xC0\x84=\x01\x03breeze.storage.Zero$DoubleZero\xA4\x01\x01\x03\x06
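The shell does not print the raw byte array. It prints an escaped rendering of it: printable ASCII characters appear as themselves, and every other byte appears as \xHH. As far as I know, this is the format produced by HBase's Bytes.toStringBinary, and Bytes.toBytesBinary converts it back. The sketch below is a stdlib-only stand-in for that decoding step, so the shell output can be turned back into bytes and compared with the bytes produced before saving. The object name ShellBytes is hypothetical.

```scala
import java.io.ByteArrayOutputStream

object ShellBytes {
  /** Turns the escaped text printed by the HBase shell back into raw bytes:
    * each backslash-x-HH sequence becomes the byte 0xHH, and every other
    * character is taken as a single byte (the shell prints only ASCII literally).
    */
  def decode(escaped: String): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    var i = 0
    while (i < escaped.length) {
      val isEscape = escaped.charAt(i) == '\\' &&
        i + 3 < escaped.length &&
        escaped.charAt(i + 1) == 'x'
      if (isEscape) {
        // The two characters after the escape prefix are hex digits.
        out.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16))
        i += 4
      } else {
        // A literal ASCII character maps to exactly one byte.
        out.write(escaped.charAt(i).toInt)
        i += 1
      }
    }
    out.toByteArray
  }
}
```

When you paste the shell value into Scala source, a triple-quoted string keeps the backslashes literal, so no extra escaping is needed.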
How can I compare this to the "normal" bytes I get when serializing a vector to a text file? Has anyone had a similar issue, or can anyone give advice?
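Going the other way is often easier: render the bytes you have in memory (or read from your file) in the same escaped form, then diff the two strings. In code that has the HBase client on the classpath, Bytes.toStringBinary does this directly. The sketch below is a self-contained stand-in. The set of characters it prints literally is an assumption modeled on that method, and the name ShellFormat is hypothetical.

```scala
object ShellFormat {
  // Characters printed as themselves; every other byte becomes \xHH.
  // Assumed to match the character set used by HBase's Bytes.toStringBinary.
  private val literal: Set[Char] =
    (('0' to '9') ++ ('A' to 'Z') ++ ('a' to 'z') ++
      " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?".toSeq).toSet

  /** Renders raw bytes the way the HBase shell shows a cell value. */
  def encode(bytes: Array[Byte]): String = {
    val sb = new StringBuilder
    for (b <- bytes) {
      val ch = (b & 0xFF).toChar
      if (literal.contains(ch)) sb.append(ch)
      else sb.append("\\x%02X".format(b & 0xFF))
    }
    sb.toString
  }
}
```

There is one caveat for the file comparison. If the "normal" bytes were written to the text file by first turning them into a String (for example through a Writer), that conversion can itself replace bytes that are not valid text. For a byte-level comparison it is safer to write them with a plain FileOutputStream or to compare them in memory.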