
I'm saving breeze SparseVectors to HBase, using com.twitter.chill.KryoInjection to serialize them to byte arrays, which seemed to work fine. But then I noticed that after reading the vectors back out of HBase, some values are different or missing. Now I'm wondering how HBase encodes data and at which step the data could get changed (saving, encoding, perhaps compression, or reading?).

I wanted to compare the vectors stored in HBase with the corresponding vectors right before saving, to see whether they are equal (in which case reading would likely be the problem), but I don't know how to do this. The representation of a vector in the HBase shell looks like this:

column=d:vector, timestamp=1431936909897, value=\x01\x00breeze.linalg.SparseVector$mcD$s\xF0\x01\x00\x01\x01breeze.collection.mutable.SparseArra\xF9\x01\x1A\x01\x02[\xC4\x01\x0E?\xF0\x00\x00\x00\x00\x00\x00?\xC5-\xF2\x15\x85Z:?\xD6,{ci\xA8\x08@\x06P\xE3\x85\xACy'?\xEB\xA2\x09\xAA\xA3\xAD\x19?\xE4M\xCB\x98\xB8\x00f?\xE8\x00\x00\x00\x00\x00\x00@"\xA4Z\x1C\xAC\x081?\xEB\xB0\xE3\xCD\x9AR&?\xE4\xB7\xF7K`\xDD)?\xEA\xD3\xC0\x06\x14\xEC\xF7?\xF3\x01]\xE8R46?\xC45\x03\x97\xE5\x0E\x8D\x0A\x00\x00\x00\x00\x00\x00\x00\x00\x01\x0E\x02\x0A0~\xB2\x01\xCC\x01\xBA\x02\xD22\xE4a\xDA\xB6\x0A\xD0\x8B&\xC0\xC0)\xDA\xCC\x05\x01\xC0\x84=\x01\x03breeze.storage.Zero$DoubleZero\xA4\x01\x01\x03\x06

How can I compare this to the "normal" serialized bytes I get when writing a vector to a text file? Has anyone had a similar issue and can give advice?


1 Answer


HBase just stores the array of bytes that you give it. It doesn't care whether those bytes were produced by Kryo or by any other technology, and it returns them unchanged when you read them. So the problem is likely in your code rather than in HBase.
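
To check this yourself, you could compare the byte array you pass to the Put with the byte array you get back from the Get, byte for byte, instead of eyeballing the shell output. Below is a minimal sketch using only the Java standard library. The class name ByteDiff and both helper methods are made up for illustration, and the escaping only approximates what the HBase shell prints (printable ASCII as-is, everything else as \xNN). If HBase is on your classpath, org.apache.hadoop.hbase.util.Bytes.toStringBinary and Bytes.equals should do the same job.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase, chill, or breeze.
public class ByteDiff {

    // Render bytes roughly the way the HBase shell shows them:
    // printable ASCII (except backslash) as-is, everything else as \xNN.
    public static String toEscaped(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Index of the first position where the arrays differ, or -1 if they
    // are identical. If one array is a prefix of the other, the length of
    // the shorter array is returned.
    public static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        // Stand-ins for the real data: "written" would be the Kryo bytes
        // passed to the Put, "readBack" the value returned by the Get.
        byte[] written  = {0x01, 0x00, 'b', 'r', (byte) 0xF0};
        byte[] readBack = {0x01, 0x00, 'b', 'r', (byte) 0xF1};

        System.out.println("written:   " + toEscaped(written));
        System.out.println("read back: " + toEscaped(readBack));
        System.out.println("equal:     " + Arrays.equals(written, readBack));
        System.out.println("first difference at index: "
                + firstDifference(written, readBack));
    }
}
```

If the two arrays turn out to be equal, the stored bytes are fine and the difference is introduced either before serialization or during deserialization, so the next thing to test would be a plain Kryo round trip in memory without HBase involved.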