We need to export production data from a Kafka topic so we can use it for testing: the data is written in Avro, and the schema is stored in the Schema Registry.
We tried the following strategies:
- Using `kafka-console-consumer` with `StringDeserializer` or `BinaryDeserializer`. We were unable to obtain a file we could parse in Java: we always got exceptions when parsing it, suggesting the file was in the wrong format (see the sketch after this list).
- Using `kafka-avro-console-consumer`: it generates JSON that also contains raw bytes, for example when deserializing a BigDecimal. We didn't even know which parsing option to choose (it is not Avro, it is not JSON).
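For context, this is roughly the kind of Avro Java API call that throws on such a dump (a minimal sketch; the file name is a placeholder). It seems to fail because the dump is a concatenation of raw message values, not an Avro Object Container File with the magic header that the file API expects:

```java
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ParseDumpAttempt {
    public static void main(String[] args) throws Exception {
        // "dump.bin" is a placeholder for the file written by kafka-console-consumer.
        // This throws on open, because the dump is not an Avro Object Container File.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(new File("dump.bin"), new GenericDatumReader<>())) {
            reader.forEach(System.out::println);
        }
    }
}
```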
Other unsuitable strategies:
- Deploying a custom Kafka consumer: this would require us to package that code and deploy it on a production server, since we are talking about our production cluster. It simply takes too long. After all, isn't kafka-console-consumer already a consumer with configurable options?
Potentially suitable strategies:
- Using a Kafka Connect sink. We didn't find a simple way to reset the consumer offset, since the consumer group created by the connector apparently stays active even after we delete the sink (one possible offset-reset sketch is shown below).
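If rewinding the offsets is really needed, one option we could try (a sketch only, assuming the connector is stopped so the group has no active members, and that the group id follows the default connect-<connector name> pattern; broker address, topic, and connector name are placeholders) is the Kafka AdminClient API:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class ResetSinkGroupOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Sink connectors consume with the group id "connect-<connector name>" by default.
        String groupId = "connect-my-avro-sink";
        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets can only be altered while the group has no active members,
            // i.e. after the connector (or the Connect worker) has been stopped.
            Map<TopicPartition, OffsetAndMetadata> newOffsets =
                Collections.singletonMap(new TopicPartition("my-topic", 0), new OffsetAndMetadata(0L));
            admin.alterConsumerGroupOffsets(groupId, newOffsets).all().get();
        }
    }
}
```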
Isn't there a simple, easy way to dump the values (not the schema) of a Kafka topic containing Avro data to a file, so that it can be parsed afterwards? I expect this to be achievable with kafka-console-consumer and the right options, plus the correct Avro Java API.
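To make the expected outcome concrete, here is a minimal sketch of what we are after, assuming the messages follow the Confluent wire format (one magic byte plus a 4-byte schema id in front of the Avro binary payload) and that the value schema has been saved locally. Broker address, topic, group id, and file paths are placeholders, and a production-grade version would fetch the writer schema from the registry by id (or just use KafkaAvroDeserializer) instead of a local .avsc file:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.PrintWriter;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AvroTopicDumper {
    public static void main(String[] args) throws Exception {
        // Placeholders: topic name, schema file, broker address, output file.
        String topic = "my-topic";
        Schema schema = new Schema.Parser().parse(Files.newInputStream(Paths.get("value-schema.avsc")));

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "avro-dump-" + System.currentTimeMillis());
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(Paths.get("dump.jsonl")))) {
            consumer.subscribe(Collections.singletonList(topic));
            // Single poll for illustration; real code would poll in a loop until done.
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<byte[], byte[]> record : records) {
                ByteBuffer buf = ByteBuffer.wrap(record.value());
                buf.get();     // Confluent wire format: skip the magic byte...
                buf.getInt();  // ...and the 4-byte schema id (we use the local schema instead).
                byte[] payload = new byte[buf.remaining()];
                buf.get(payload);
                BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
                GenericRecord value = reader.read(null, decoder);
                out.println(value);  // GenericRecord#toString() gives a JSON-like rendering
            }
        }
    }
}
```

This writes one JSON-like line per message value, which is easy to inspect and to parse back in tests; a raw binary dump would instead need the same header-stripping logic applied at read time.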