1
votes

I am looking a best way to analyse 4B records (1TB data) stored in Vertica using Tableau. I tried using extract of 1M records which works perfectly. but dont know how to manage 4B records, because its taking too long to query on 4B records.

I have following dataset :

timestamp id url domain keyword nor_word cat_1 cat_2 cat_3

So here I need to create descending list of Top 10 ID's, Top 10 url, Top 10 domain, Top 10 keyword, Top 10 nor_word, Top 10 cat_1, Top 10 cat_2, Top 10 cat_3 depending count of each field value in separate worksheet and combine all worksheet in one dashboard.

There is no primary key. This dataset of 1 month so I want to make global filter start date and end date to reduce the query size. But don't know how to create global date filter and display on dashboard ?

1

1 Answers

2
votes

You have two questions, one about Vertica and one about Tableau. You should split these up.

Regarding Vertica, you need to know that Vertica stores data in ascending sort order in physical storage. This means that an additional step will always be required anytime you want to get a descending sort order.

I would suggest creating a partition on the date, and subsequently running Database Designer (DBD) in incremental mode and using your queries as samples. By partitioning the data, Vertica can eliminate the partitions during optimization.

Running the DBD will generate some better optimized projections. You should consider the trade-off between how often you will need this data and whether it's worth creating these additional projections as it will impact your load performance.