I am new to Kafka and am trying to store messages with the least memory overhead, so I want to avoid repeating field names in my encoding (as JSON does). Consider a message with three variable-length String fields:
interface IMessage {
    String getA();
    String getB();
    String getC();
}
Since Kafka includes a default StringSerializer, the easiest way to encode would be to simply concatenate and delimit the fields. Something like:
String encoded = "FieldA|FieldB|FieldC";
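For context, the encoding step I have in mind is just the following sketch (it assumes field values never contain the `|` delimiter, otherwise decoding would be ambiguous):

```java
public class DelimitedEncoding {
    interface IMessage {
        String getA();
        String getB();
        String getC();
    }

    // Concatenate the fields with a '|' delimiter; no field names are stored.
    static String encode(IMessage m) {
        return String.join("|", m.getA(), m.getB(), m.getC());
    }

    public static void main(String[] args) {
        IMessage msg = new IMessage() {
            public String getA() { return "FieldA"; }
            public String getB() { return "FieldB"; }
            public String getC() { return "FieldC"; }
        };
        System.out.println(encode(msg)); // FieldA|FieldB|FieldC
    }
}
```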
Under the hood, Kafka will convert this to a byte array.
My question is: will Kafka encode this string as UTF-8, such that each ASCII character in my string takes up only one byte? In other words, will a 15-character ASCII string occupy 15 bytes in Kafka? Or is it more efficient for some reason to call getBytes() in Java myself and pass the byte array directly to the ByteArraySerializer?
byte[] encoded = "FieldA|FieldB|FieldC".getBytes(StandardCharsets.UTF_8);
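To check the JDK side of this assumption (this says nothing about what the broker does, only what UTF-8 encoding produces), I compared the character count against the encoded byte count:

```java
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    public static void main(String[] args) {
        String encoded = "FieldA|FieldB|FieldC";
        // UTF-8 encodes every ASCII character as exactly one byte,
        // so for pure-ASCII payloads byte length == character length.
        byte[] utf8 = encoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(encoded.length()); // 20
        System.out.println(utf8.length);      // 20
    }
}
```

So at least on the Java side, a pure-ASCII string costs one byte per character once UTF-8 encoded; my question is whether the StringSerializer path adds anything beyond that.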