6 votes

I would like to know what would be best for my use case: Kafka Streams, the Kafka Consumer API, or Kafka Connect?

I want to read data from a topic, do some processing, and write it to a database. I have written consumers for this, but I feel I could instead write a Kafka Streams application and use its stateful processors to perform the transformations, which would eliminate my consumer code and leave only the database-writing code.

The stores I want to insert my records into are: HDFS (raw JSON) and MSSQL (processed JSON).

Another option is Kafka Connect, but I have found that there is no JSON support as of now for the HDFS sink and JDBC sink connectors (I don't want to write Avro), and creating a schema is also a pain for complex nested messages.

Or should I write a custom Kafka Connect connector to do this?

So I need your opinion: should I write a Kafka consumer, a Kafka Streams application, or a Kafka Connect connector? And which will be better in terms of performance and have less overhead?


2 Answers

2 votes

You can use a combination of them all.

I have tried the HDFS sink for JSON but was not able to use org.apache.kafka.connect.json.JsonConverter

Not clear why not, but I would assume you forgot to set schemas.enable=false.
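
For reference, a minimal standalone HDFS sink configuration along those lines might look like the sketch below; the connector name, topic, and HDFS URL are placeholders, and it assumes a connector version that ships the JSON format class:

    # Hypothetical standalone worker config - names and URLs are placeholders
    name=hdfs-json-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=raw-topic
    hdfs.url=hdfs://namenode:8020
    flush.size=100
    # Write records out as JSON text rather than Avro
    format.class=io.confluent.connect.hdfs.json.JsonFormat
    # Read plain JSON values that carry no embedded schema envelope
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false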

When I set org.apache.kafka.connect.storage.StringConverter it works, but it writes the JSON object in string-escaped format. For example, {"name":"hello"} is written into HDFS as "{\"name\":\"hello\"}"

Yes, it will string-escape the JSON.

The processing I want to do is basic validation and a few field-value transformations

Kafka Streams or the Consumer API is capable of validation. Connect is capable of Simple Message Transforms (SMTs); see the sketch below.
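
For example, one of the built-in SMTs can do simple field-level rewrites; a hedged sketch in connector-properties form (the transform alias and field names are made up):

    # Rename a field on the record value as it passes through the sink
    transforms=rename
    transforms.rename.type=org.apache.kafka.connect.transforms.ReplaceField$Value
    transforms.rename.renames=old_field:new_field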


In some use cases you need to "duplicate" data within Kafka: read your "raw" topic with a consumer, process it, then produce the result back into a "cleaned" topic, from which you can use Kafka Connect to write to a database or filesystem.
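
A rough sketch of that consume-process-produce step, assuming String-serialized JSON; the topic names and the validateAndTransform helper are hypothetical stand-ins for your own logic:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RawToCleaned {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "raw-cleaner");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("raw"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Stand-in for your validation / field transformations
                        String cleaned = validateAndTransform(record.value());
                        if (cleaned != null) {
                            producer.send(new ProducerRecord<>("cleaned", record.key(), cleaned));
                        }
                    }
                }
            }
        }

        // Placeholder: return cleaned JSON, or null to drop an invalid record
        private static String validateAndTransform(String rawJson) {
            return (rawJson != null && rawJson.startsWith("{")) ? rawJson : null;
        }
    }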

-2 votes

Welcome to Stack Overflow! Please take the tour: https://stackoverflow.com/tour

Please write posts with a precise question rather than asking for opinions - that keeps the site clearer, and opinions are not answers (they are subject to each person's preferences). Asking "How to use Kafka Connect with JSON" or similar would fit this site.

Also, please show some research.


The least overhead would be a plain Kafka consumer - Kafka Streams and Kafka Connect are built on the Kafka consumer, so you will always be able to achieve less overhead with a bare consumer, but you will also lose all their benefits (fault tolerance, ease of use, support, etc.).

First, it depends on what your processing is. Aggregation? Counting? Validation? If so, you can use Kafka Streams to do the processing and write the result to a new topic, in the format you want.
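
A minimal Kafka Streams topology sketch along those lines, assuming String-serialized JSON and hypothetical topic names raw and cleaned (the filter and transformation are stand-ins for your own logic):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class CleaningTopology {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "json-cleaner");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("raw")
                   .filter((key, value) -> value != null && !value.isEmpty()) // basic validation
                   .mapValues(String::trim)                                   // stand-in transformation
                   .to("cleaned");

            new KafkaStreams(builder.build(), props).start();
        }
    }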

Then you can use Kafka Connect to send the data to your database. You are not forced to use Avro; you can use another format for keys/values - see these questions, and the configuration sketch after them:

Kafka Connect HDFS Sink for JSON format using JsonConverter

Kafka Connect not outputting JSON
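
As a hedged sketch of the JDBC side (connection details are placeholders): the JDBC sink needs schema information to build SQL statements, so with JsonConverter you would keep schemas.enable=true, and each message value must then carry a {"schema": ..., "payload": ...} envelope.

    # Hypothetical JDBC sink config - connection details are placeholders
    name=mssql-json-sink
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=cleaned
    connection.url=jdbc:sqlserver://mssql-host:1433;databaseName=mydb
    auto.create=true
    value.converter=org.apache.kafka.connect.json.JsonConverter
    # JDBC sink needs a schema to map fields to columns, so keep the envelope
    value.converter.schemas.enable=true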