I am new to Snowflake and Python, and I am trying to figure out which would be faster and more efficient:
- Read data from Snowflake using fetch_pandas_all() or fetch_pandas_batches(), or
- Unload data from Snowflake into Parquet files and then read them into a dataframe (rough sketches of both approaches are below).
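To make the two options concrete, here is roughly what I mean by each. The connection details, the SIMULATION_RESULTS table, the @MY_STAGE stage, and the local paths are placeholders for my real ones:

```python
import pathlib

import pandas as pd
import snowflake.connector

# Placeholder connection details -- substitute your own account/warehouse/etc.
conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# Option 1: fetch the result set straight into pandas.
cur.execute("SELECT * FROM SIMULATION_RESULTS")  # hypothetical table name
df = cur.fetch_pandas_all()  # loads the whole result set into memory at once
# ...or iterate in chunks for large result sets:
# for batch_df in cur.fetch_pandas_batches():
#     process(batch_df)

# Option 2: unload to Parquet in a stage, download the files, read them locally.
cur.execute(
    "COPY INTO @MY_STAGE/sim_results FROM SIMULATION_RESULTS "
    "FILE_FORMAT = (TYPE = PARQUET) HEADER = TRUE"  # HEADER = TRUE keeps column names
)
cur.execute("GET @MY_STAGE/sim_results file:///tmp/sim_results/")
files = pathlib.Path("/tmp/sim_results").glob("*.parquet")
df = pd.concat(pd.read_parquet(f) for f in files)
```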
CONTEXT: I am working on a data-layer regression testing tool that has to verify and validate datasets produced by different versions of the system.
Typically a simulation run produces around 40-50 million rows, each having 18 columns.
I don't know much about pandas or Python yet, but I am learning (I used to be a front-end developer).
Any help appreciated.
LATEST UPDATE (09/11/2020): I used fetch_pandas_batches() to pull the data down into manageable dataframes and then wrote them to a SQLite database. Thanks.
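For anyone who lands here later, the approach was roughly the sketch below; the table name, SQLite file name, and connection details are placeholders for my actual ones:

```python
import sqlite3

import snowflake.connector

# Placeholder connection details.
conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()
cur.execute("SELECT * FROM SIMULATION_RESULTS")  # hypothetical table name

sqlite_conn = sqlite3.connect("regression_baseline.db")  # hypothetical output file

# Each batch arrives as a pandas DataFrame, so the 40-50 million rows never
# have to fit in memory at once; append each batch to the same SQLite table.
for batch_df in cur.fetch_pandas_batches():
    batch_df.to_sql("simulation_results", sqlite_conn, if_exists="append", index=False)

sqlite_conn.close()
cur.close()
conn.close()
```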