I have a set of Avro files saved in AWS S3 with a known schema defined in a .avsc file. Is there a way to create a Dataset of typed objects in Spark using that schema?
The schema looks like this:
{
  "type" : "record",
  "name" : "NameRecord",
  "namespace" : "com.XXX.avro",
  "doc" : "XXXXX",
  "fields" : [ {
    "name" : "Metadata",
    "type" : [ "null", {
      "type" : "record",
      "name" : "MetaNameRecord",
      "doc" : "XXXX",
      "fields" : [ {
        "name" : "id",
        "type" : "int"
      }, {
        "name" : "name",
        "type" : [ "null", "string" ],
        "default" : null
      } ]
    } ]
  } ]
}
I would like to end up with a typed dataset of NameRecord objects, i.e. Dataset[NameRecord].
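Something along these lines is the shape I'm after — a sketch assuming the spark-avro module (org.apache.spark:spark-avro) is on the classpath, that case classes matching the .avsc exist (they could be hand-written as below, or generated by a tool such as avrohugger), and that the bucket path and NameRecord.avsc file name are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case classes mirroring the .avsc above;
// nullable unions map naturally to Option fields.
case class MetaNameRecord(id: Int, name: Option[String])
case class NameRecord(Metadata: Option[MetaNameRecord])

object ReadAvroFromS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-dataset")
      .getOrCreate()
    import spark.implicits._ // brings the Encoder for .as[NameRecord]

    // spark-avro reads the writer schema embedded in the files by default;
    // the .avsc can be supplied explicitly via the "avroSchema" option
    // to force a particular reader schema.
    val schemaJson = scala.io.Source.fromFile("NameRecord.avsc").mkString

    val ds = spark.read
      .format("avro")
      .option("avroSchema", schemaJson)
      .load("s3a://my-bucket/path/to/records/") // placeholder path
      .as[NameRecord]

    ds.show()
    spark.stop()
  }
}
```

The `.as[NameRecord]` step converts the untyped DataFrame into the typed Dataset[NameRecord], provided the case class field names and types line up with the schema.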