
I have .txt and .csv files in my storage account. I want to delete only the .txt files. How can I do that in Databricks using dbutils.fs.rm(), or by any other means?


1 Answer


I tend to use the Unix equivalent if it is a one-time thing -

%sh

rm -rf /dbfs/mnt/<your-path>/*delete_files*.txt

Prefix your existing /mnt paths with /dbfs/ to access the same data through the driver's local filesystem.
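
For instance (a sketch, with /mnt/<your-path> standing in for your actual mount point), the same location is visible both through the DBFS API and through the local /dbfs mount:

#### DBFS API path, as used by dbutils
dbutils.fs.ls("/mnt/<your-path>/")

#### Same location via the local /dbfs mount, as seen by %sh and the os module
import os
os.listdir("/dbfs/mnt/<your-path>/")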

Otherwise, if you want to do it on a regular basis or as part of your execution, you can use the function below -

import os
import re
from fnmatch import translate

def run_os_scandir(directory, pattern=None):
    #### translate() turns a glob pattern such as *delete_files*.txt into a regex; no pattern means match everything
    pattern = re.compile(translate(pattern) if pattern else '.*')

    fu = []
    for f in os.scandir(directory):
        #### Keep plain files only; drop the "not f.is_dir()" check if you also want directories
        if not f.is_dir() and pattern.match(os.path.basename(f.path)):
            fu.append(f.path)

    return fu

#### Usage. Note: the function works only if you prefix your mount path(s) with /dbfs

delete_file_lst = run_os_scandir('/dbfs/mnt/<your-path>/','*delete_files*.txt')

Once you have the required files, you can remove them using either the standard os package or dbutils:

dbutils - [dbutils.fs.rm(f[5:]) for f in delete_file_lst]  ### f[5:] strips the leading /dbfs from the file path

os - [os.remove(f) for f in delete_file_lst]
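
If you prefer to stay entirely within dbutils (no /dbfs local paths), here is a minimal sketch of the same idea, assuming the .txt files sit directly under the mount and filtering only on the file extension:

#### List the mount via the DBFS API and remove matching .txt files
for f in dbutils.fs.ls('/mnt/<your-path>/'):
    if not f.isDir() and f.name.endswith('.txt'):
        dbutils.fs.rm(f.path)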