3
votes

I’d be grateful for some help with automating Tableau server data extract refreshes. This seems straightforward if your data source is a database server but I want to refresh from a csv file on the Tableau server.

Our setup is that we have a Unix data server which produces two csv files each week (1GB total size) which we zip, transfer to Tableau desktop and use to create tableau data extracts (tde files). In Tableau desktop, we then use “Edit data source” to point our workbook at the new tde files, and we publish the workbook to the Tableau server (running Windows) as a twbx. (And as a further complication, we also use one of the tde files from the previous week as a third data source).

We want to automate this process, so ideally I would run a script on the Tableau server to copy the zip file from the data server to the Tableau server, unzip, extract and point the workbook on the server to the new tdes. This doesn’t seem to be possible, so what is my best option?

Here are some approaches I’ve considered:

  1. Run scripts on Tableau desktop to recreate what we do manually (using the tableau data extract command line utility to do the extracts and tabcmd to republish the workbook). This requires someone to turn on a laptop at the right time each week, so is not really automation.

  2. Run scripts on the Tableau server to copy the files over then run Tableau data extract command line utility on Tableau server to produce the tde – doesn’t work because that utility is part of Tableau desktop, not Tableau server.

  3. As 2 but use tabcmd to produce the new tdes on Tableau server – doesn’t work because tabcmd doesn’t have the “—original-file” option that Tableau desktop does.

  4. Write a program that uses the Tableau data extract API to produce the tde, and run this on the Tableau server. I’m not sure how much effort that would be but more importantly, the tde doesn’t appear to be available on the server as a tde (it is stored in another form).

  5. Restructure how the data is treated on our data server: we have MySQL running on that server, and could put the csv data into that db, then run a conventional extract refresh with tabcmd on the Tableau server. I think this would work, but would presumably mean we are transferring 1GB data (or more?) through the SQL queries, rather than the 50 MB zip. Also we’d have to create a "last week" snapshot of the data for our 3rd data source). The data doesn’t lend itself to incremental updates - old records can change as well as new records being added. This is all more of a workaround than I would like to do.

  6. Run a database server on the Tableau server, and use scripts to put the csvs in there each week, then use that database as the data source. Even more of a crazy work around, just because Tableau doesn’t want to play with csvs.

Am I missing something? Is there an easier way? I’ve ruled out 2,3 and 4 above, but would 1, 5 or 6 actually work?

We're using Tableau server 8.2.

Many thanks to anyone who can help.

2

2 Answers

4
votes

In the end the solution for us (suggested by Tableau support) was to install Tableau Desktop on the same machine that we run Tableau Server on. That means that we have the tableau data extract utility available but also, crucially, that the file paths seen by Tableau Desktop and Tableau Server are identical.

Previously we created the workbook and data extracts in Desktop on a different machine. Even with the same file paths to the csv files (on different machines) the data extract utility was not able to refresh the extracts - it couldn't find the csv data source files. But creating the workbook and its data sources (extracts of csv files on the server machine) with Desktop on the server machine means the data extract utility can refresh the data sources and Tableau Server sees those refreshes.

The other way to have done this would have been to put the original data source files on a shared location and used UNC paths to address them. We didn't want to do this as it requires Active Directory to be running on the server machine and to install that apparently meant we had to unintall and reinstall tableau server.

1
votes

in version 9, you can use the tableau data extract command line utility to create, refresh or append to an extract. See the documentation. In version 8, you could write your own program to do that using the Tableau Data Extract API.

In version 9, you can publish the extract using the REST API. In version 8, you could publish by using the tabcmd utility. No need to edit or republish the workbooks if they use the published data source - that is publish the data source separately and then connect to the published data source from the workbooks. That way you can update each independently as needed.

You don't have to execute this on the Tableau server if you prefer another machine. Currently, some of these options require windows OS though