1 vote

I have a Structured Streaming job that reads from Event Hub and writes to a Delta Lake table at /mytablepath, stored on Azure Blob Storage. Over the last 2 months of running in production it has created ~1000 small files in storage, each containing only 2-3 rows.

I tried running the OPTIMIZE command on my Delta Lake table (path), but even after that the number of files in blob storage has not decreased, and when I run any query on the table in a notebook, it continues to show the warning "query is on a delta table with many small files, run optimize to improve performance".

Thanks


1 Answer

0 votes

OPTIMIZE compacts the small files into larger ones, but it does not delete the old files from storage: Delta Lake keeps them so that time travel and concurrent readers still work. You need to run VACUUM after OPTIMIZE to physically remove the obsolete small files.
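A minimal sketch of the two commands, assuming the table lives at /mytablepath as in the question. Note that by default VACUUM only deletes files older than the 7-day retention threshold, so freshly compacted files may linger until that period passes unless you shorten the retention:

```sql
-- Compact the many small files into larger ones. This rewrites the
-- data but leaves the old files on storage for time travel.
OPTIMIZE delta.`/mytablepath`;

-- Physically delete files that are no longer referenced by the table
-- and are older than the retention threshold (default: 7 days).
VACUUM delta.`/mytablepath`;

-- To reclaim space sooner, you can lower the retention, but this
-- requires disabling Delta's safety check first and sacrifices the
-- ability to time-travel past the vacuum point:
-- SET spark.databricks.delta.retentionDurationCheck.enabled = false;
-- VACUUM delta.`/mytablepath` RETAIN 0 HOURS;
```

Once VACUUM has run and the log checkpoints catch up, the file count on blob storage should drop and the small-files warning should stop appearing.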