I have .txt and .csv files in my storage account. I want to delete only the .txt files. How can I do that in Databricks using dbutils.fs.rm(), or by any other means?
1 Answer
I tend to use the *nix equivalent if it is a one-time thing:
%sh
rm -rf /dbfs/mnt/<your-path>/*delete_files*.txt
Prefix your existing /mnt paths with /dbfs so the shell can reach them through the host's local filesystem.
If you need to do this regularly, or as part of a job, you can use the function below:
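To make the /dbfs prefix rule concrete, here is a minimal sketch of two helpers that convert between the local form of a path (with /dbfs) and the form dbutils expects (without it). The names to_dbfs_path and to_local_path are made up for illustration and are not part of any Databricks API.

```python
DBFS_LOCAL_PREFIX = "/dbfs"  # where DBFS is exposed on the local filesystem


def to_dbfs_path(local_path):
    """Turn a local path such as /dbfs/mnt/x into the DBFS path /mnt/x."""
    if local_path == DBFS_LOCAL_PREFIX:
        return "/"
    if local_path.startswith(DBFS_LOCAL_PREFIX + "/"):
        return local_path[len(DBFS_LOCAL_PREFIX):]
    raise ValueError("not a /dbfs path: " + local_path)


def to_local_path(dbfs_path):
    """Turn a DBFS path (/mnt/x or dbfs:/mnt/x) into the local path /dbfs/mnt/x."""
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return DBFS_LOCAL_PREFIX + "/" + dbfs_path.lstrip("/")
```

Raising on a path that lacks the prefix, rather than slicing blindly, avoids passing a mangled path to a delete call.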
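import fnmatch
import os

def run_os_scandir(directory, pattern="*"):
    # Names are matched with shell-style wildcards (fnmatch), because a
    # pattern such as '*delete_files*.txt' is not a valid regular expression.
    matches = []
    for f in os.scandir(directory):
        # Keep "not f.is_dir()" to match standalone files only;
        # remove that condition to match directories as well.
        if not f.is_dir() and fnmatch.fnmatch(f.name, pattern):
            matches.append(f.path)
    return matches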
#### Usage. Note: the function works only if you add /dbfs to your mount path(s)
delete_file_lst = run_os_scandir('/dbfs/mnt/<your-path>/', '*delete_files*.txt')
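Since a deletion is hard to undo, it can help to check a wildcard pattern against a few file names first. Below is a minimal sketch using the standard-library fnmatch module; preview_matches and the sample names are made up for illustration.

```python
import fnmatch


def preview_matches(names, pattern):
    """Return the names that a shell-style wildcard pattern would select."""
    # fnmatchcase is always case-sensitive, regardless of the platform
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


sample_names = ["old_delete_files_1.txt", "old_delete_files_1.csv", "keep.txt"]
print(preview_matches(sample_names, "*delete_files*.txt"))
# prints ['old_delete_files_1.txt']
```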
Once you have the list of files, you can remove them with the standard os package or with dbutils:
dbutils:
[dbutils.fs.rm(f[5:]) for f in delete_file_lst]  # f[5:] strips the leading /dbfs from the file path
os:
[os.remove(f) for f in delete_file_lst]
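If you would rather stay entirely within dbutils, as the question asks, one option is to list the directory with dbutils.fs.ls and remove the entries whose names end in .txt. The sketch below assumes each listed entry exposes .name and .path, as the objects returned by dbutils.fs.ls do; the function name remove_files_by_suffix is made up, and the fs argument is there only so the function can be tried outside a notebook.

```python
def remove_files_by_suffix(fs, directory, suffix=".txt"):
    """Delete entries in `directory` whose names end with `suffix`.

    `fs` is expected to behave like dbutils.fs (it needs ls and rm).
    Returns the list of paths that were passed to rm.
    """
    removed = []
    for info in fs.ls(directory):
        # Only the file name is checked, so .csv files are left alone
        if info.name.endswith(suffix):
            fs.rm(info.path)
            removed.append(info.path)
    return removed


# In a Databricks notebook (not executed here):
# remove_files_by_suffix(dbutils.fs, "/mnt/<your-path>/")
```

This looks only at the top level of the directory. Subfolders are not descended into.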