
I am looking for support in Confluent's kafka-connect-hdfs connector for saving raw byte arrays and for field partitioning based on a FlatBuffers schema.

I am receiving data from Kafka as a byte array generated from a FlatBuffer. I need to save it in HDFS at a path like Field1/Field2/Field3, where all of these fields are extracted from the byte array using the FlatBuffers schema. The data itself must be stored in HDFS as raw bytes, with no conversion.

I checked both:

  1. FieldPartitioner: https://github.com/confluentinc/kafka-connect-storage-common/blob/master/partitioner/src/main/java/io/confluent/connect/storage/partitioner/FieldPartitioner.java
  2. Supported formats: JSON, Avro, Parquet. In https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/format/json/JsonRecordWriterProvider.java, though, I found that a byte array is only written out as-is when the data is not a Kafka Struct.

I couldn't find a way to use them for my purpose.

Is anyone aware of such built-in support? If not, please point me to resources (if any) for building custom support for both.


1 Answer


FlatBuffers is not (currently) a supported serialization format. The ByteArrayFormat is only available for S3 Connect, not HDFS, and it just dumps out the ByteArraySerializer format from Kafka as-is (which, after the converter, would indeed not be a Struct object).

As for partitioning, since the data is only bytes, the record values are not inspected in a way the field partitioners can use, so you would need to add a custom partitioner as well, and it would have to deserialize the message in order to inspect the fields (a rough sketch follows).
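For illustration, such a partitioner might look roughly like the following. This is a minimal, untested sketch: it extends DefaultPartitioner from kafka-connect-storage-common and assumes a flatc-generated class MyEvent with string accessors field1(), field2(), and field3(); all of those names are hypothetical stand-ins for your own schema.

    import java.nio.ByteBuffer;

    import org.apache.kafka.connect.sink.SinkRecord;

    import io.confluent.connect.storage.partitioner.DefaultPartitioner;

    // Hypothetical class generated by `flatc --java` from your FlatBuffers schema.
    import com.example.flatbuffers.MyEvent;

    public class FlatBufferFieldPartitioner<T> extends DefaultPartitioner<T> {

      @Override
      public String encodePartition(SinkRecord sinkRecord) {
        // With ByteArrayConverter, the record value is the raw byte[] from Kafka.
        byte[] bytes = (byte[]) sinkRecord.value();

        // Deserialize only enough to read the partition fields; the payload
        // itself is written out unchanged by the format.
        MyEvent event = MyEvent.getRootAsMyEvent(ByteBuffer.wrap(bytes));

        // Yields the Field1/Field2/Field3 directory layout from the question.
        return event.field1() + "/" + event.field2() + "/" + event.field3();
      }
    }

You would then point partitioner.class at this class in the connector configuration, with value.converter=org.apache.kafka.connect.converters.ByteArrayConverter so the value actually arrives as bytes.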

I'm not sure why you linked to the S3 Connect code, but if you want to add your own format, look at the PR that added StringFormat to HDFS Connect.
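To give a sense of the shape of a format, here is a rough sketch modeled on that StringFormat, but writing the raw bytes unmodified. It uses the Format, RecordWriterProvider, and RecordWriter interfaces from kafka-connect-storage-common; the constructor taking HdfsStorage and the storage.create() call mirror the StringFormat code, but details can vary by connector version, so treat this as a starting point rather than a definitive implementation.

    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.hadoop.fs.Path;
    import org.apache.kafka.connect.errors.ConnectException;
    import org.apache.kafka.connect.sink.SinkRecord;

    import io.confluent.connect.hdfs.HdfsSinkConnectorConfig;
    import io.confluent.connect.hdfs.storage.HdfsStorage;
    import io.confluent.connect.storage.format.Format;
    import io.confluent.connect.storage.format.RecordWriter;
    import io.confluent.connect.storage.format.RecordWriterProvider;
    import io.confluent.connect.storage.format.SchemaFileReader;
    import io.confluent.connect.storage.hive.HiveFactory;

    public class ByteArrayFormat implements Format<HdfsSinkConnectorConfig, Path> {
      private final HdfsStorage storage;

      // HDFS Connect constructs formats with the storage instance,
      // as in the StringFormat code.
      public ByteArrayFormat(HdfsStorage storage) {
        this.storage = storage;
      }

      @Override
      public RecordWriterProvider<HdfsSinkConnectorConfig> getRecordWriterProvider() {
        return new RecordWriterProvider<HdfsSinkConnectorConfig>() {
          @Override
          public String getExtension() {
            return ".bin";
          }

          @Override
          public RecordWriter getRecordWriter(HdfsSinkConnectorConfig conf, String filename) {
            final OutputStream out = storage.create(filename, true);
            return new RecordWriter() {
              @Override
              public void write(SinkRecord record) {
                try {
                  // With ByteArrayConverter the value is already raw bytes;
                  // write it out with no conversion.
                  out.write((byte[]) record.value());
                } catch (IOException e) {
                  throw new ConnectException(e);
                }
              }

              @Override
              public void commit() {}

              @Override
              public void close() {
                try {
                  out.close();
                } catch (IOException e) {
                  throw new ConnectException(e);
                }
              }
            };
          }
        };
      }

      @Override
      public SchemaFileReader<HdfsSinkConnectorConfig, Path> getSchemaFileReader() {
        throw new UnsupportedOperationException("Schema reading not supported for raw bytes");
      }

      @Override
      public HiveFactory getHiveFactory() {
        throw new UnsupportedOperationException("Hive integration not supported for raw bytes");
      }
    }

You would register it via format.class in the connector properties (e.g. format.class=ByteArrayFormat's fully-qualified class name).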


To build the project, look at the FAQ.