1
votes

I am trying to copy files into a folder using Databricks utilities (dbutils), selecting only .csv files whose names contain the current date.

I have created the following:

from datetime import datetime
now = datetime.now().strftime("%Y-%m-%d")  # date as a string, e.g. "2018-12-29"
today = datetime.today().date()  # same date as a datetime.date object

I have then tried the following:

dbutils.fs.cp('adl://mylake.azuredatalakestore.net/testfolder/*{today}.csv','adl://mylake.azuredatalakestore.net/testfolder/RAW/')

dbutils.fs.cp('adl://mylake.azuredatalakestore.net/testfolder/*{now}.csv','adl://mylake.azuredatalakestore.net/testfolder/RAW/'

However, I keep getting an invalid syntax error.

Any thoughts?

1
So if a file named LCMS_MRD_Delta_LoyaltyAccount_1166_2018-12-29 06-05-52.csv was in the folder, it would get copied over, but a file named LCMS_MRD_Delta_LoyaltyAccount_1166_2018-12-28 06-05-52.csv would not. - Carltonp
I'm trying to figure this out myself, and I think I'm nearly there. I have the following files in my folder: 2018-12-29.csv, LCMS_MRD_1166_2018-12-29 06-05-52.csv, and LCMS_MRD_1167_2018-12-29 06-06-49.csv. If I enter the command dbutils.fs.cp('adl://carlslake.azuredatalakestore.net/testfolder/RAW/%s.csv'% now,'adl://carlslake.azuredatalakestore.net/testfolder/'), the only file that gets copied over is 2018-12-29.csv. I just need to know where to place an * (or something similar) so that all files with the date 2018-12-29 are copied. - Carltonp
OK, the following would work: dbutils.fs.cp('adl://carlslake.azuredatalakestore.net/testfolder/RAW/LCMS_MRD_1166_%s '% now,'adl://carlslake.azuredatalakestore.net/testfolder/'), but because of the timestamp (i.e. 06-05-52) it fails with the error File/Folder does not exist: /testfolder/RAW/LCMS_MRD_1166_2018-12-29. When you think about it, the error is correct, because the path does not include the timestamp. So I need either a switch that will copy everything or a way to remove the timestamp. - Carltonp

1 Answer

2
votes

dbutils.fs.cp copies individual files and directories and does not perform wildcard expansion, so a * in the source path is treated as a literal character; see dbutils.fs.help("cp") for reference. Instead, you can list the contents of the source directory with dbutils.fs.ls, filter the results in Python, and then copy the matching files one by one.
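
A minimal sketch of that list, filter, and copy approach might look like the following. The helper names select_by_date and copy_by_date are made up for illustration and are not part of dbutils. The sketch assumes that dbutils.fs.ls returns entries exposing .name and .path, and that the date appears in each file name in YYYY-MM-DD form, as in the file names from the comments.

```python
from datetime import date


def select_by_date(names, day, suffix=".csv"):
    """Return the names that contain the ISO date and end with the suffix."""
    stamp = day.isoformat()  # e.g. date(2018, 12, 29) -> "2018-12-29"
    return [n for n in names if stamp in n and n.endswith(suffix)]


def copy_by_date(fs, src_dir, dst_dir, day):
    """Copy every matching file from src_dir to dst_dir, one cp call per file.

    `fs` is assumed to be dbutils.fs, or any object with compatible
    ls(path) and cp(source, destination) methods.
    """
    # Map each file name to its full path as reported by the listing.
    paths = {entry.name: entry.path for entry in fs.ls(src_dir)}
    matches = select_by_date(list(paths), day)
    for name in matches:
        fs.cp(paths[name], dst_dir.rstrip("/") + "/" + name)
    return matches


# In a Databricks notebook, where dbutils is predefined (not executed here):
# copy_by_date(
#     dbutils.fs,
#     "adl://mylake.azuredatalakestore.net/testfolder/",
#     "adl://mylake.azuredatalakestore.net/testfolder/RAW/",
#     date.today(),
# )
```

Passing the file-system object in as a parameter keeps the filtering logic testable outside Databricks. Because the match is a plain substring check on the date, file names that carry a timestamp after the date are still picked up.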