1
votes

Question 1:

Is there a way to list and delete empty folders on Azure Data Lake Store Gen 1?

Scenario:

We need to periodically run a job that recursively deletes all empty folders under a root folder in our data lake storage.

Folder paths cannot be hard-coded, as there can be hundreds of empty folders.

Question 2:

Can we use Data Factory or Databricks to perform this operation?

Thanks.

2 Answers

1
votes

Rinks, I implemented your requirement with a Get Metadata activity, an If Condition activity, a ForEach activity and a Delete activity. Please see my detailed steps:

Step 1: I created two empty folders and one folder containing a CSV file in the root path.


Step 2: Create a Get Metadata activity in the ADF pipeline and have it output childItems.


Step 3: Loop over the output with a ForEach activity, setting Items to @activity('Get Metadata1').output.childItems


At this point the overall structure is the Get Metadata activity feeding into the ForEach activity.

Step 4: Inside the ForEach activity, add another Get Metadata activity and an If Condition activity.

Set the directory of the second Get Metadata activity to @item().name


Set the condition expression to @empty(activity('Get Metadata2').output.childItems), which is true when the folder has no child items.

Inside the ForEach activity, the overall structure is the second Get Metadata activity followed by the If Condition activity.

Step 5: Set the Delete activity as the True activity of the If Condition activity, so that it runs only when the folder is empty. Set @item().name as the directory of the Delete activity's dataset.

Test result: the test2 and test3 folders were deleted.

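For readers who find the control flow easier to follow as code, the pipeline logic can be sketched in Python roughly as follows. This is only an illustration run against a local directory, not ADF code: the function name `delete_empty_child_folders` is hypothetical, and `test1` as the name of the non-empty folder is an assumption. Note that, like the pipeline as described, it only inspects folders directly under the root and does not recurse into them.

```python
import os
import tempfile


def delete_empty_child_folders(root):
    """Delete empty folders directly under root; return the deleted names."""
    deleted = []
    # Get Metadata1 + ForEach: iterate over the root's child items.
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # files at the root level are left alone
        # Get Metadata2 + If Condition: is the child's item list empty?
        if not os.listdir(path):
            os.rmdir(path)  # Delete activity
            deleted.append(name)
    return deleted


# Small demo mirroring Step 1: two empty folders and one holding a CSV file.
with tempfile.TemporaryDirectory() as root:
    for name in ("test1", "test2", "test3"):
        os.mkdir(os.path.join(root, name))
    with open(os.path.join(root, "test1", "data.csv"), "w") as f:
        f.write("a,b\n1,2\n")
    print(delete_empty_child_folders(root))  # ['test2', 'test3']
```

A folder that contains only empty subfolders is not treated as empty by this single-level check, so nested empty folders would probably need either nested pipelines or a recursive approach.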

If you have any concerns, please let me know.

0
votes

Answered at https://social.msdn.microsoft.com/Forums/en-US/526006aa-f378-4766-9aba-532223a44814/how-to-list-and-delete-empty-folders-on-azure-data-lake-store-gen1?forum=AzureDataLake

After mounting in Databricks and getting through any permissions issues, one potential (Python 3) solution:

    def recur(item):
        good_to_delete_me = True
        contents = dbutils.fs.ls(item)
        for i in contents:
            if not i.isDir():
                good_to_delete_me = False
            else:
                can_delete_child = recur(i.path)
                good_to_delete_me = good_to_delete_me and can_delete_child
                if can_delete_child:
                    test = i.path
                    dbutils.fs.rm(test)
        return good_to_delete_me
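
Since `dbutils` is only available inside a Databricks notebook, the same bottom-up idea can be tried out locally first. The sketch below is a stand-in using only the Python standard library; the name `remove_empty_dirs` and the `dry_run` flag are hypothetical additions, not part of the answer above. As in `recur`, the starting folder itself is never removed; the return value only reports whether it ended up empty.

```python
import os
import tempfile


def remove_empty_dirs(path, dry_run=False):
    """Return True if path contains no files at any depth.

    Empty subdirectories are removed bottom-up, or only reported when
    dry_run is True. The starting path itself is never removed.
    """
    is_empty = True
    # Materialize the listing first so deletions do not disturb iteration.
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            child_empty = remove_empty_dirs(entry.path, dry_run)
            if child_empty:
                if dry_run:
                    print("would remove", entry.path)
                else:
                    os.rmdir(entry.path)
            is_empty = is_empty and child_empty
        else:
            # Any file (or symlink) means this folder must be kept.
            is_empty = False
    return is_empty


# Demo: a/b is an empty nested chain, c holds a file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    os.makedirs(os.path.join(root, "c"))
    with open(os.path.join(root, "c", "data.csv"), "w") as f:
        f.write("x\n")
    remove_empty_dirs(root)
    print(sorted(os.listdir(root)))  # ['c']
```

Running with `dry_run=True` first gives a list of what would be removed, which may be worth doing before pointing a scheduled job at production storage.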