I have installed Kafka locally (no cluster or Schema Registry for now) and am trying to produce Avro messages to a topic. Below is the schema associated with that topic.
{
  "type" : "record",
  "name" : "Customer",
  "namespace" : "com.example.Customer",
  "doc" : "Class: Customer",
  "fields" : [ {
    "name" : "name",
    "type" : "string",
    "doc" : "Variable: Customer Name"
  }, {
    "name" : "salary",
    "type" : "double",
    "doc" : "Variable: Customer Salary"
  } ]
}
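For reference, a Scala case class mirroring this schema could look like the sketch below (my own naming, not generated from the schema):

// Mirrors the Avro record: name (string) and salary (double)
case class Customer(name: String, salary: Double)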
I would like to write a simple Spark producer application that creates some data based on the above schema and publishes it to Kafka.
I am thinking of creating sample data, converting it to a DataFrame, serializing it to Avro, and then publishing it.
val df = spark.createDataFrame(<<data>>)
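For example, a few hard-coded rows could be turned into a DataFrame along these lines (a rough sketch, assuming an existing SparkSession named spark):

import spark.implicits._

// A couple of sample rows matching the Customer schema
val df = Seq(
  ("John", 55000.0),
  ("Jane", 72000.0)
).toDF("name", "salary")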
And then, something like below:
// df must expose a binary (or string) 'value' column for the Kafka sink
df.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "customer_avro_topic")
  .save()
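The missing piece is turning the rows into that Avro-encoded value column. My understanding is that the spark-avro package's to_avro function can do this, so the whole batch job would look roughly like the sketch below (assuming Spark 3.x, where to_avro lives in org.apache.spark.sql.avro.functions; I have not verified this end to end):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.avro.functions.to_avro

val spark = SparkSession.builder()
  .appName("CustomerAvroProducer")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows matching the Customer schema
val df = Seq(("John", 55000.0), ("Jane", 72000.0)).toDF("name", "salary")

// Pack all columns into a struct and Avro-encode it into the single
// binary 'value' column that the Kafka sink expects
val avroDf = df.select(to_avro(struct($"name", $"salary")).alias("value"))

avroDf.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "customer_avro_topic")
  .save()

From the docs it also looks like to_avro can take the Avro schema JSON as a second argument, which might be the way to keep the record name/namespace/doc from the schema above, but I have not tried that.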
Attaching the schema to this Avro topic can be done manually for now.
Can this be done using only Apache Spark APIs, without the plain Java/Kafka producer APIs? This is for batch processing rather than streaming.
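My assumption is that this only needs the Kafka and Avro data source packages on the classpath, roughly as below (artifact names as published by Apache Spark; the version is a placeholder and would have to match the local Spark install):

// build.sbt fragment; sparkVersion is a placeholder
val sparkVersion = "3.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion,
  "org.apache.spark" %% "spark-avro"           % sparkVersion
)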