0
votes

I am trying to understand the uses and limitation of outer joins in tableau (tableau online in this case). I have found the beavaiour of tableau to be not what I have expected. I have provided as detailed a description to my problems below, to avoid any ambiguotity and since I don't know where to start anymore. I hope I have not gone overboard(edits welcome).

Specifics of my use case

I am creating a join between two .csv files that have logged natural data at specific time intervals. One set has hourly time intervals, the other at intervals of minutes (which is variable due to various factors).

'Rain' data set(1):
Date and Time | Rain
01/01/2018 00:00 | 0
01/01/2018 01:00 | 0.4
01/01/2018 02:00 | 1.4
01/01/2018 03:00 | 0.4

'Fill' data set (2):
Date and Time | Fill
24/04/2018 06:04 | 78
24/04/2018 12:44 | 104
24/04/2018 18:51 | 96
25/04/2018 00:20 | 84

Unsurprisingly, I have many nulls in the data (which is not a problem to me) as:

  • 'Rain' has a longer time series
  • In either data set, the majority of date times do not have an exact equivalent in the other

screenshot of data join here

What I am trying to achieve

I am trying to graph the two data sets in such a way that that I can compare the full data sets against each other, in all of the following ways:

  • Monthly or Yearly aggregation (average)
  • Hourly aggregation (average)
  • Exact times

Problems (and my limited assumptions)

  1. Once graphed in tableau some values had 'null' DateTime values*.
  2. Once graphed in tableau it appears as if many points are simply missing** Graphing using 'Fill' time series
    Graphing using 'Rain' time series
  3. I had assumed (giving the full outer join of 'Date and Time(s)') tableau would join the data sets in chronological order with a common date time series

* I had assumed it impossible for the join conditions to have 'null' values without throwing an error. Also, the data is clean and uniform
** And this is when aggregating monthly, which I assumed would not be affected by any (if any) hourly/minute mismatches

So, finally the question #

In my reading of the online help documentation I am struggling to find a functionality that is native to tableau that can help me achieve these specific goals. I am reaching the worrying conclusion that tableau was not built for this type of 'visual analytics'.
Is there a functionality native to tableau that will allow me to combine the data in the way I described above?

Approaches I have considered

Since I have two .csv files I could combine both set so that I have the full, granular 'Date and Time' fields in one tall list.
However, I would like to find a method that is natural to tableau (online) because in future, at least some of the data wil come from a database (postgres) connection but others will likely have to remain as upload as a .csv or excel files.

Again I ask

What am I overlooking in regards how (and why) to use tableau? I am not looking for a complete solution, but what tools could I use to achieve this?

Many thanks for any help

2
Join is a database concept to "merge" two datasets using some fields that have common values in them. If you wanted to know the rain and fill values at 2018-03-01 3.30pm and both tables had values at that time then join would work. What you have is date time fields that don't have common values. What you should do it to "union" the tables (generating a table containing one date time field and two value fields with all your values in it (but many blanks). Tables will then handle the aggregates and give aggregate values correctly.matt_black
@ matt_black I had understood a union to be a concatenation of two or more files. Would it 'weave' the data together? Or is this step handeled when it comes to putting the data into a viz?goldfishalpha
The "weaving" can be done when putting the data into a viz. Tableau will aggregate the time field (say into days) and will aggregate the two data fields for that day (even if none of the specific time fields match).matt_black

2 Answers

0
votes

Your databases more specifically datasources are in a different level of granularity one is in hours(Higher Level of granularity) and other is in minutes (Lower level of granularity) but your requirmenet is different

Year/Month -- High aggregation

Hourly -- Medium agregation

Exact -- Lower aggregation

When you join two data sources on dates and times (Which would never match) you will get these kind of weird results.

Possible Solution:

  1. Their is a tableau prep tool, use the tool and make both data sources at same level of aggregation, in you case dataset 2 will be aggregated to hour level and the join both the tables, In this case you need to check last requirement (Exact times) as I assume you are looking for the charts at minutes level

  2. Other solution is use blending where primary datasource will be dataset 1 and secondary datasource will be dataset 2, in this case you will get the required data where tableau manages the aggregation and granularity.

Let me know how it goes

0
votes

So it appears as if various solutions are available.
I want to post this now but will re-edit when I get a bit more time

Option 1

One work-around/solution I found was to create a calculated field as mentioned here and then graph everything against this time series.
This worked well for me even after having created 20+ sheets and numberous dashboards.
As mentioned below, other uses may not provide this flexibility.
Calculation:

IFNULL([Date and Time (Fill.csv)],[Date and Time (Rain.csv)]))

Option 2

This is as mentioned by matt_black a join of the data performs the job quite well. It seems less hacky and is perfect when starting from a clean slate.
I had difficulty creating a join on data sources already in use (will do more poking around on this)

Option 3 ?

As in the answer provided by Siva, blending maybe an option.
I have not confirmed this as of yet.