0
votes

I have the following problem. I need to copy from a source ADLS (Azure Data Lake Store) account to a sink ADLS account, but only the most recent file. Every hour a new .csv file arrives at the source, and that file has to be copied to the sink data lake. For instance:

Event: Hour 1, file_01.csv arrives at the source. Task: copy file_01.csv to the sink data lake.
Event: Hour 2, file_02.csv arrives at the source. Task: copy file_02.csv to the sink data lake.
And so on.

Is there any way to create an event-based trigger (fired by the arrival of a new file in the source)? That was my first thought.

Another way would be to create a job run by Azure Data Lake Analytics. In it I would get the system date and time (I don't know how to do this), choose the most recent file, and copy that file into the sink data lake. How can I declare a variable containing the date and time in U-SQL? How can I copy data into a data lake using U-SQL?

Summary: How can I make an incremental/updated copy between data lakes?

Thanks


2 Answers

1
votes

Unfortunately, ADLS does not currently have a way to trigger an event when a file arrives. That said, we are working on providing that support, and it should be available shortly.

To do an incremental copy, you could organize the files into folders whose names carry time information, and then use a tool such as Azure Data Factory to copy only the files that fall within the current time range.

Thanks,
Sachin Sheth
Program Manager, Azure Data Lake
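The folder-per-time-range layout suggested above can be sketched as follows. This is only an illustration of the naming idea, not an Azure Data Factory configuration: the `/landing` root, the `yyyy/MM/dd/HH` layout, and the function names `hourly_folder` and `folders_in_window` are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def hourly_folder(root: str, when: datetime) -> str:
    """Return the folder for the hour containing `when`, e.g. root/2017/03/05/14.

    The yyyy/MM/dd/HH layout is an assumption; any layout works as long as
    the writer and the copy job agree on it.
    """
    return f"{root.rstrip('/')}/{when:%Y/%m/%d/%H}"


def folders_in_window(root: str, start: datetime, end: datetime) -> list:
    """List the hourly folders that cover the half-open window [start, end)."""
    # Snap the start down to the top of its hour so partial hours are included.
    current = start.replace(minute=0, second=0, microsecond=0)
    folders = []
    while current < end:
        folders.append(hourly_folder(root, current))
        current += timedelta(hours=1)
    return folders
```

With this layout, an hourly copy job only needs the folder for the hour it is processing, so files copied in earlier runs are never touched again.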

0
votes

You can use DateTime.Now to get the compile time of the job. You can also extract the modified or created time of a file. For example:

@data = 
  EXTRACT 
    vehicle_id int
  , entry_id long
  , event_date DateTime
  , latitude float
  , longitude float
  , speed int
  , direction string
  , trip_id int?
  , modified_date = FILE.MODIFIED()
  , created_date = FILE.CREATED()
  FROM "/Samples/Data/AmbulanceData/vehicle{*}"
  USING Extractors.Csv();

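// Keep only rows from files created at least one day before the job was compiled.
@res =
  SELECT *
  FROM @data
  WHERE created_date <= DateTime.Now.AddDays(-1);

// A U-SQL script needs an output; the path below is a hypothetical placeholder.
OUTPUT @res TO "/output/vehicles_filtered.csv" USING Outputters.Csv();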

I asked the store team members to answer your question regarding file triggers.
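Once modified times are available, the "only the most recent file" selection from the question comes down to a small comparison. The sketch below shows that logic in plain Python, separate from U-SQL. The `FileInfo` type and the `newest_since` function are hypothetical names, and the file listing is assumed to come from whatever listing mechanism you use against the store.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class FileInfo(NamedTuple):
    """Hypothetical record for one file in the source folder."""
    path: str
    modified: datetime


def newest_since(files: Sequence[FileInfo],
                 last_copied: Optional[datetime]) -> Optional[FileInfo]:
    """Return the most recently modified file newer than `last_copied`.

    Returns None when nothing new has arrived. Passing None for
    `last_copied` means no file has been copied yet.
    """
    candidates = [
        f for f in files
        if last_copied is None or f.modified > last_copied
    ]
    return max(candidates, key=lambda f: f.modified, default=None)
```

Storing the modified time of the last file copied and passing it back in as `last_copied` on the next run keeps the copy incremental.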