
Is there a best practice for Spark to process a Kafka stream that is serialized in Avro with a Schema Registry, especially with Spark Structured Streaming?

I found an example at https://github.com/ScalaConsultants/spark-kafka-avro/blob/master/src/main/scala/io/scalac/spark/AvroConsumer.scala, but I failed to load the AvroConverter class: I cannot find an artifact named io.confluent:kafka-avro-serializer on mvnrepository.com.


1 Answer


You need to add the Confluent repository to your build.sbt:

// Confluent artifacts are hosted on Confluent's own Maven repository,
// not Maven Central, so sbt needs an extra resolver.
val repositories = Seq(
  "confluent" at "https://packages.confluent.io/maven/",
  Resolver.sonatypeRepo("public")
)

resolvers ++= repositories
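
With the resolver in place, you can declare the dependency itself. The version below is only an illustrative placeholder; pick the one that matches your Confluent Platform release:

libraryDependencies += "io.confluent" % "kafka-avro-serializer" % "3.3.0"

This artifact is published only to Confluent's repository, which is why it does not show up in a plain Maven Central search.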

See: https://github.com/ScalaConsultants/spark-kafka-avro/blob/master/build.sbt
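
For the Structured Streaming part of the question, one common pattern is to read the raw bytes from the Kafka source and deserialize them per partition with Confluent's KafkaAvroDeserializer. The sketch below is only illustrative and not taken from the linked repo; the bootstrap servers, topic name, and Schema Registry URL are placeholder assumptions:

import java.util.Collections

import io.confluent.kafka.serializers.KafkaAvroDeserializer
import org.apache.spark.sql.SparkSession

object AvroStructuredConsumer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-consumer").getOrCreate()
    import spark.implicits._

    // Read the raw Kafka records; the value column holds the
    // Confluent-framed Avro bytes (magic byte + schema id + payload).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "my-topic")                     // placeholder
      .load()

    // Deserialize per partition so each task creates (and reuses)
    // a single KafkaAvroDeserializer, which is not itself serializable.
    val decoded = raw.select($"value").as[Array[Byte]].mapPartitions { records =>
      val deserializer = new KafkaAvroDeserializer()
      deserializer.configure(
        Collections.singletonMap("schema.registry.url", "http://localhost:8081"), // placeholder
        false) // false = configure as a value (not key) deserializer
      records.map(bytes => deserializer.deserialize("my-topic", bytes).toString)
    }

    decoded.writeStream.format("console").start().awaitTermination()
  }
}

Each micro-batch then yields the decoded records as JSON-style strings; in a real job you would typically map the resulting GenericRecord into a case class or Row instead of calling toString.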