We are using the HDP Hadoop distribution v2.3.2 and working with Hive external tables that are queried on a daily basis.

A few days after the process started, the data directories contain a lot of staging directories named in the format `.hive-staging_hive_date-time_`. Each staging directory corresponds to one query run against the Hive table, so they keep piling up.

What can I do to keep these staging directories from piling up in my data directories?

The answer I posted at https://stackoverflow.com/a/35583367/14186 may help you here. You can configure Hive, via the `hive.exec.stagingdir` property, to create those staging directories somewhere else (by default they are created as subdirectories of the final destination directory).

In the example from that answer, I have Hive put them in directories under /tmp, and we run a daily cron job that deletes any leftover staging directories older than one week, to keep things tidy in case Hive doesn't remove them itself.
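A minimal sketch of what such a cleanup job could look like, written in Python. It assumes the staging directories land directly under a hypothetical root `/tmp/hive` and that the `hdfs` CLI is on the cron user's PATH. It lists the root with `hdfs dfs -ls`, keeps directory entries whose names start with `.hive-staging` and whose modification time is more than seven days old, and removes them with `hdfs dfs -rm -r -skipTrash`. The root path, the prefix check, and the helper names are assumptions for illustration, not taken from the linked answer.

```python
import subprocess
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumptions (hypothetical, adjust to your hive.exec.stagingdir setting):
STAGING_ROOT = "/tmp/hive"        # directory that holds the staging dirs
STAGING_PREFIX = ".hive-staging"  # name prefix of Hive's staging dirs
MAX_AGE = timedelta(days=7)       # delete anything older than one week


def parse_ls_line(line: str) -> Optional[Tuple[datetime, str]]:
    """Parse one `hdfs dfs -ls` line into (modification time, path).

    Returns None for the "Found N items" header, for plain files,
    and for anything that does not look like a directory entry.
    """
    # Columns: perms, replication, owner, group, size, date, time, path
    parts = line.split(None, 7)
    if len(parts) != 8 or not parts[0].startswith("d"):
        return None
    try:
        mtime = datetime.strptime(f"{parts[5]} {parts[6]}", "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    return mtime, parts[7]


def stale_staging_dirs(lines: Iterable[str], now: datetime,
                       max_age: timedelta = MAX_AGE) -> List[str]:
    """Return staging-dir paths from an ls listing that are older than max_age."""
    stale = []
    for line in lines:
        entry = parse_ls_line(line)
        if entry is None:
            continue
        mtime, path = entry
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if name.startswith(STAGING_PREFIX) and now - mtime > max_age:
            stale.append(path)
    return stale


def main() -> None:
    # List the staging root; check=True aborts the job if the listing fails.
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", STAGING_ROOT],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    for path in stale_staging_dirs(listing, datetime.now()):
        subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path], check=True)


if __name__ == "__main__":
    main()
```

The script can then be scheduled daily with a crontab entry such as `0 3 * * * python3 /path/to/clean_hive_staging.py` (path hypothetical). Dropping `-skipTrash` sends the directories to the HDFS trash instead of deleting them outright, which leaves a way back if the one-week cutoff ever catches something still in use.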