
I am using the Python 3.6 interpreter in my PyCharm venv, and I am trying to convert a CSV file to Parquet.

import pandas as pd    
df = pd.read_csv('/parquet/drivers.csv')
df.to_parquet('output.parquet')

Error 1: ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. pyarrow or fastparquet is required for parquet support

Solution 1: installed fastparquet 0.2.1

Error 2:

File "/Users/python parquet/venv/lib/python3.6/site-packages/fastparquet/compression.py", line 131, in compress_data
    (algorithm, sorted(compressions)))
RuntimeError: Compression 'snappy' not available. Options: ['GZIP', 'UNCOMPRESSED']

I installed python-snappy 0.5.3 but I am still getting the same error. Do I need to install any other library?

If I use the PyArrow 0.12.0 engine, I don't experience the issue.

1 Answer


In fastparquet, snappy compression is an optional feature.

To quickly check a conversion from CSV to Parquet, you can run the following script (it only requires pandas and fastparquet):

import pandas as pd
from fastparquet import ParquetFile

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# df.head()  # Inspect the initial value
df.to_csv("/tmp/test_csv", index=False)

df_csv = pd.read_csv("/tmp/test_csv")
df_csv.head()  # Inspect the intermediate value
df_csv.to_parquet("/tmp/test_parquet", engine="fastparquet", compression="GZIP")

df_parquet = ParquetFile("/tmp/test_parquet").to_pandas()
df_parquet.head()  # Inspect the final value

However, if you need to read or write with snappy compression, you might follow this answer about installing the snappy library on Ubuntu.
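As a rough sketch of the usual route on Ubuntu (package names assumed from a typical setup; see the linked answer for details): install the system snappy headers first, then the Python bindings, so fastparquet picks up the codec on its next import:

```shell
# Assumed typical Ubuntu setup; exact package names may vary by release
sudo apt-get install libsnappy-dev
pip install python-snappy
```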