0 votes

I have a 100 GB CSV file (200 million rows × 60 columns) which I'm using to create dashboards in Tableau Desktop via an extract. I've been facing a performance issue: it takes about 2 minutes to refresh the dashboard whenever I select or deselect any of the dimension filters. Can using Tableau Server solve this? If yes, how many nodes should I configure, and with what configuration for each of them?

Could it also be an issue with the cache settings?

I was also thinking of loading this data into a columnar database like Redshift and then using a live connection, so that Redshift's query engine does the work instead of Tableau. Is that overkill for this kind of small dataset?

This isn't a small dataset, especially if you are working from local storage. Have you tried Google's BigQuery? It has very low startup costs and can be very responsive for this sort of dataset. – matt_black

1 Answer

4 votes

Redshift or Athena would work well for this.

Redshift would be easy to set up and costs about $250 per month for a single node, but you would need to tune your Redshift table (distribution and sort keys) to get reasonable performance.
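As a rough illustration only, here is a minimal Python sketch of creating a tuned table over psycopg2. The cluster endpoint, credentials, table, and column names are all placeholders, and the right DISTKEY/SORTKEY choices depend on how your dashboard actually filters and joins:

```python
# Minimal sketch: create a Redshift table with distribution and sort keys.
# Endpoint, credentials, table and column names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="analytics",
    user="admin",
    password="REPLACE_ME",
)

ddl = """
CREATE TABLE events (
    event_date   DATE,
    region       VARCHAR(32),
    category     VARCHAR(64),
    amount       DECIMAL(12, 2)
    -- remaining columns go here
)
DISTSTYLE KEY
DISTKEY (region)                          -- a column commonly used in joins / GROUP BY
COMPOUND SORTKEY (event_date, category);  -- columns commonly used in filters
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)
```

Loading the CSV itself would then typically be done with a COPY command from S3 rather than row-by-row inserts.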

AWS Athena may be a good way to get good performance at a reasonable price.

Simple Athena solution (a rough boto3 sketch of steps 2 and 3 follows the list):

  1. Gzip your CSV file and split it into chunks of about 10 MB each
  2. Upload the chunks to an S3 bucket
  3. Run an AWS Glue crawler against that bucket
  4. Point Tableau Desktop at the Athena table the crawler creates
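Here is a sketch of steps 2 and 3 using boto3. The bucket, prefix, IAM role ARN, and database name are placeholders you would replace with your own:

```python
# Sketch of steps 2 and 3: upload gzipped CSV chunks, then crawl them with Glue.
# Bucket, prefix, role ARN, and database name are placeholders.
import glob
import os

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

bucket = "my-tableau-data"   # placeholder bucket
prefix = "csv-gz/"           # keep all chunks under one prefix so they form one table

# Step 2: upload the gzipped chunks produced in step 1.
for path in glob.glob("chunks/*.csv.gz"):
    s3.upload_file(path, bucket, prefix + os.path.basename(path))

# Step 3: create and run a Glue crawler over that prefix.
glue.create_crawler(
    Name="tableau-csv-crawler",
    Role="arn:aws:iam::123456789012:role/MyGlueRole",  # placeholder IAM role
    DatabaseName="tableau_db",
    Targets={"S3Targets": [{"Path": f"s3://{bucket}/{prefix}"}]},
)
glue.start_crawler(Name="tableau-csv-crawler")
```

Once the crawler finishes, the table it registers in the Glue catalog is what you point Tableau's Athena connector at in step 4.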

Let's say your gzipped data totals 25 GB; each full-table query would then cost you about $0.13 (Athena charges roughly $5 per TB of data scanned), with very little other cost.

If that is too much (because you want to run a lot of queries), then you can reduce costs and improve performance by:

  • partitioning your data into S3 folders, so queries only scan the partitions they need
  • converting your data to Parquet format, so queries only scan the columns they need (an Athena CTAS statement, sketched below, can do both in one step)
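As a hedged sketch rather than a drop-in solution: assuming the Glue crawler registered the CSV data as tableau_db.events_csv, a CTAS query like the one below rewrites it as partitioned Parquet. All table, column, and bucket names here are made up, and partition columns must come last in the SELECT list:

```python
# Sketch of a one-off Athena CTAS that rewrites the crawled CSV table as
# partitioned Parquet. Table, column, and bucket names are placeholders.
import boto3

athena = boto3.client("athena")

ctas = """
CREATE TABLE tableau_db.events_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://my-tableau-data/parquet/',
    partitioned_by = ARRAY['event_date']
) AS
SELECT region, category, amount, event_date   -- partition column must be last
FROM tableau_db.events_csv
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "tableau_db"},
    ResultConfiguration={"OutputLocation": "s3://my-tableau-data/athena-results/"},
)
```

Once the Parquet table exists, point Tableau at it instead of the CSV table; each query then scans only the columns and partitions it actually needs, which is where the cost and speed improvement comes from.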