I have created a parquet file with a decimal column of type pa.decimal128(12, 4) using pyarrow. After reading the file and accessing its metadata, I get the following output:
<pyarrow._parquet.ColumnChunkMetaData object at 0x7f4752644310>
file_offset: 26077
file_path:
physical_type: FIXED_LEN_BYTE_ARRAY
num_values: 3061
path_in_schema: Price
is_stats_set: True
statistics:
<pyarrow._parquet.Statistics object at 0x7f4752644360>
has_min_max: True
min: b'\x00\x00\x00\x00\x9b\xdc'
max: b'\x00\x00w5\x93\x9c'
null_count: 0
distinct_count: 0
num_values: 3061
physical_type: FIXED_LEN_BYTE_ARRAY
logical_type: Decimal(precision=12, scale=4)
converted_type (legacy): DECIMAL
compression: SNAPPY
encodings: ('PLAIN_DICTIONARY', 'PLAIN', 'RLE')
has_dictionary_page: True
dictionary_page_offset: 22555
data_page_offset: 23225
total_compressed_size: 3522
total_uncompressed_size: 3980
As you can see, the min/max values are raw bytes objects. How would I decode these into actual decimal values?
I tried casting with
pc.cast(statistics.max, pa.decimal128(12, 4))
but got the following error message instead:
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from binary to decimal using function cast_decimal
I also tried decoding the bytes manually with
int.from_bytes(b'\x00\x00\x00\x00\x9b\xdc', 'big') / 1_000
but I'm not sure that handles the scale (or negative values) correctly.
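For what it's worth, since the logical type is Decimal(precision=12, scale=4), one plausible approach (a sketch, not a pyarrow API) is to interpret the FIXED_LEN_BYTE_ARRAY statistics as a big-endian two's-complement unscaled integer and shift it by the scale:

```python
from decimal import Decimal

def decode_decimal_stat(raw: bytes, scale: int) -> Decimal:
    # Parquet stores decimals backed by FIXED_LEN_BYTE_ARRAY as a
    # big-endian two's-complement unscaled integer; the value is
    # unscaled * 10**-scale.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# The min/max byte strings from the statistics above, with scale=4:
print(decode_decimal_stat(b"\x00\x00\x00\x00\x9b\xdc", 4))  # 3.9900
print(decode_decimal_stat(b"\x00\x00w5\x93\x9c", 4))        # 199999.9900
```

Note the `signed=True`: a leading byte >= 0x80 would denote a negative decimal, which a plain unsigned `int.from_bytes` would misread.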