I am trying to read an Avro file using the Python avro library (Python 2). When I use the following code:
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter, BinaryDecoder
reader = DataFileReader(open("filename.avro", "rb"), DatumReader())
schema = reader.meta
It reads every column correctly except for one, which comes back as raw bytes rather than the expected decimal values.
How can I convert this column to the expected decimal values? I notice that the file's metadata identifies the column as 'type': 'bytes', but with 'logicalType': 'decimal'.
Below I post the metadata for this column, as well as the byte values (the expected values are all multiples of 1,000 and less than 25,000). The file was created using Kafka.
Metadata:
{
"name": "amount",
"type": {
"type": "bytes",
"scale": 8,
"precision": 20,
"connect.version": 1,
"connect.parameters": {
"scale": "8",
"connect.decimal.precision": "20"
},
"connect.name": "org.apache.kafka.connect.data.Decimal",
"logicalType": "decimal"
}
}
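For reference, the scale needed to interpret the bytes can be pulled out of this field definition. The following is a minimal sketch, assuming the field schema is available as a JSON string; the variable names (`field_json`, `is_decimal`, `scale`) are hypothetical and not part of the avro library:

```python
import json

# The field definition copied from the file's metadata (see above).
field_json = '''
{
  "name": "amount",
  "type": {
    "type": "bytes",
    "scale": 8,
    "precision": 20,
    "connect.version": 1,
    "connect.parameters": {"scale": "8", "connect.decimal.precision": "20"},
    "connect.name": "org.apache.kafka.connect.data.Decimal",
    "logicalType": "decimal"
  }
}
'''

field = json.loads(field_json)
field_type = field["type"]

# A decimal column is a "bytes" type annotated with logicalType "decimal".
is_decimal = isinstance(field_type, dict) and field_type.get("logicalType") == "decimal"

# In the Avro specification "scale" is optional and defaults to 0.
scale = int(field_type.get("scale", 0)) if is_decimal else None
```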
Byte values:
'E\xd9d\xb8\x00'
'\x00\xe8\xd4\xa5\x10\x00'
'\x01\x17e\x92\xe0\x00'
'\x01\x17e\x92\xe0\x00'
Expected values:
3,000.00
10,000.00
12,000.00
5,000.00
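For context, the Avro specification describes a decimal logical type stored as bytes as the two's-complement, big-endian representation of the unscaled integer, with the scale taken from the schema. A minimal pure-Python sketch of decoding under that assumption follows; the function name `decode_avro_decimal` is hypothetical and not part of the avro library:

```python
from decimal import Decimal


def decode_avro_decimal(raw, scale):
    """Decode big-endian two's-complement bytes into a Decimal.

    `scale` is the number of digits to the right of the decimal point.
    """
    data = bytearray(raw)  # yields ints on both Python 2 and Python 3
    unscaled = 0
    for byte in data:
        unscaled = (unscaled << 8) | byte
    # A set high bit in the first byte marks a negative number.
    if data and data[0] & 0x80:
        unscaled -= 1 << (8 * len(data))
    # Build the Decimal from (sign, digits, exponent) so no rounding occurs.
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


# The first three byte values listed above, with the scale of 8 from the metadata.
samples = [b'E\xd9d\xb8\x00', b'\x00\xe8\xd4\xa5\x10\x00', b'\x01\x17e\x92\xe0\x00']
decoded = [decode_avro_decimal(raw, 8) for raw in samples]
```

With a scale of 8, these three byte strings decode to 3000, 10000 and 12000 respectively, which matches the first three expected values.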
I need to use this within a Lambda function deployed on AWS, so I cannot use fastavro or other libraries that rely on C rather than pure Python.
See links below: https://pypi.org/project/fastavro/ https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html