I am trying to export a large dataset from SQL Server to my local hard disk for some data analysis. The file is around 30 GB, with over 6 million rows and about 10 columns.
The data will then be fed into Python pandas or Tableau for consumption. I suspect the sheer size of the file will give me poor performance during my analysis.
Any best practices to be shared for analyzing big-ish data on a local machine?
I am running an i7-4570 with 8 GB of RAM. I am hoping to be less reliant on SQL queries and to be able to run heavy analysis offline.
Due to the nature of the database, a fresh extract is needed each time and the process will have to repeat itself, so there will not be much appending of new data.
I have explored HDFStores and Tableau Data Extracts, but I am still curious whether I could get better performance by reading whole CSV files. A rough sketch of what I have been trying with HDFStore is below.
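This is only a sketch of my current approach, not a finished pipeline: the connection string, table name, and chunk size are placeholders for my actual setup, and it assumes pyodbc and PyTables are installed.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string for my actual SQL Server instance
engine = create_engine(
    "mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server"
)

# Pull the extract in chunks so it never has to fit in 8 GB of RAM at once,
# appending each chunk to a blosc-compressed HDF5 table
with pd.HDFStore("extract.h5", mode="w", complevel=9, complib="blosc") as store:
    for chunk in pd.read_sql("SELECT * FROM my_table", engine, chunksize=500_000):
        # data_columns=True so I can filter on columns when reading back
        store.append("data", chunk, index=False, data_columns=True)
```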
Is there a compression method of sorts that I might be missing out on? Again, the objective is to run the analysis without constantly querying the server. The source itself (which I am optimizing) refreshes every morning, so when I get into the office I can just focus on getting coffee and some blazing-fast analytics done.
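For example, if I stick with the HDF5 route above, I am assuming I could read back only the columns and rows needed for a given analysis instead of loading the whole extract (the column names and filter here are made up):

```python
import pandas as pd

# Pull just a slice of the extract; requires the table to have been written
# with data_columns enabled, as in the sketch above
subset = pd.read_hdf(
    "extract.h5",
    "data",
    columns=["col_a", "col_b"],
    where="col_a > 100",
)
```

Would an approach like this beat re-reading the full CSV each time, or is there a better-suited format or compression scheme for this workflow?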