0
votes

Is there a way to generate schema-less Avro from Apache Spark? I can see how to generate it through Java/Scala using the Apache Avro library and through Confluent Avro. When I write Avro from Spark as below, it creates Avro files with the schema embedded. I want to create them without the schema to reduce the size of the final dataset.

df.write.format("avro").save("person.avro")
So is the answer incorrect? – thebluephantom
Thanks for your answer, and I agree with your point. However, I need to publish these Avro records to Kafka, and I wanted to publish binary Avro the way Confluent Kafka does, so I ended up writing a serializer that converts the records to binary Avro. – Explorer
Not sure I get it, as there is always an Avro schema associated with each Avro file; Avro is a binary format. You can read with a custom schema. You cannot avoid the schema, but you can do the Confluent thing, as I have seen in the past. – thebluephantom
I think you are confusing the Q&A with your extra insights. But no issue, success. – thebluephantom
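The serializer approach mentioned in the comments can be sketched roughly as follows. This is a minimal illustration of the Confluent wire format (one magic byte, a 4-byte big-endian schema ID, then the schema-less Avro binary payload); the schema ID and payload below are hypothetical placeholders, not values from the question.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format version marker


def to_confluent_frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend Confluent framing: magic byte + 4-byte big-endian schema ID.

    The Avro payload itself carries no schema; consumers look the schema
    up in the registry using the embedded ID.
    """
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


# Hypothetical example: schema ID 42 with an already-encoded Avro body.
frame = to_confluent_frame(42, b"\x0aAlice")
```

In a real pipeline the schema ID comes from registering the schema with the Schema Registry, and the payload is the record serialized with an Avro `BinaryEncoder` (no file container, no embedded schema).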

1 Answer

2
votes

You need not worry, and you cannot avoid the schema.

Avro always stores both the data and the schema.

This differs from JSON, where the field names are repeated in every record, so the schema effectively resides in the data itself.

With Avro, the schema is stored once per file, so there is little overhead to consider.
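To illustrate why the per-record overhead is small: the Avro binary encoding of a record is just the concatenated field values, with no field names or types attached. A minimal sketch of two encoding rules from the Avro specification (zig-zag varints for integers, length-prefixed UTF-8 for strings), assuming a record with one string field and one long field:

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as an Avro zig-zag varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes -> small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # set continuation bit
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: zig-zag length prefix followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# A record like {"name": "Alice", "age": 30} encodes to just the values:
payload = encode_string("Alice") + zigzag_varint(30)  # 7 bytes total
```

The field names ("name", "age") never appear in the payload; they live only in the schema, which the file container writes a single time in its header.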