Hi, I need a Lambda function that reads and writes Parquet files and saves them to S3. I tried to build a deployment package with the libraries I need (pyarrow), but I am getting an initialization error for the cffi library:
module initialization error: [Errno 2] No such file or directory: '/var/task/__pycache__/_cffi__x762f05ffx6bf5342b.c'
Can I even create Parquet files with AWS Lambda? Has anyone had a similar problem?
I would like to do something like this:
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
df = pd.DataFrame([data])  # data is a dictionary
table = pa.Table.from_pandas(df)
# /tmp is the only writable directory in the Lambda environment
pq.write_table(table, '/tmp/test.parquet', compression='snappy')
table = pq.read_table('/tmp/test.parquet')
df = table.to_pandas()
print(df)
Or by some other method; I just need to be able to read and write Parquet files compressed with Snappy.