There are several ways to do this. You will probably end up combining the parameterization and scheduling features to run scheduled jobs that pick up the new files every time.
Depending on your use case, you could for example do one of the following:
Importing a directory
If you set up a directory that only contains one Excel file (see picture below), you can use the + button to use the directory as the input dataset.
Every time you run a job, the files present in that directory will be processed.
You can then schedule the job, create an output destination, and you should be all set.
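If the file drop itself is automated, a sketch like the following could keep that single-file directory current between runs. The bucket and folder names are made up for illustration, and it assumes the google-cloud-storage Python client:

from google.cloud import storage

# Hypothetical bucket and folder names, for illustration only.
BUCKET = "my-dataprep-input"
FOLDER = "daily-import/"

client = storage.Client()
bucket = client.bucket(BUCKET)

# Remove whatever file is currently in the input directory...
for blob in client.list_blobs(BUCKET, prefix=FOLDER):
    blob.delete()

# ...and upload the new file so the next scheduled run picks it up.
bucket.blob(FOLDER + "latest.xlsx").upload_from_filename("latest.xlsx")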
Using datetime parameters
Let's assume you are in a situation where you add a new file every day with the date in the file name. For example, in Cloud Storage it would look like this:
You can use the Parameterize button in the Dataprep file browser and set up the following parameter:
This should select the file from the previous day:
You can then import the dataset and schedule the flow. If your schedule runs every day, it will pick up the new file each time.
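For this to work, the producer side has to write the date into the file name consistently. Purely as an illustration (the bucket and naming pattern below are assumptions, not something Dataprep requires), the daily upload could look like:

from datetime import date
from google.cloud import storage

# Assumed naming convention: one file per day, e.g. sales-2021-04-26.xlsx
file_name = "sales-{}.xlsx".format(date.today().isoformat())

client = storage.Client()
bucket = client.bucket("my-dataprep-input")  # assumed bucket name
bucket.blob("daily/" + file_name).upload_from_filename(file_name)

A datetime parameter that points one day back would then resolve to the previous day's file on each scheduled run.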
Using variables
Alternatively, you can define a variable in the file path of your dataset.
You can then use the JobGroup API to override that variable.
POST /v4/jobGroups
{
  "wrangledDataset": {
    "id": datasetId
  },
  "runParameters": {
    "overrides": {
      "data": [
        {
          "key": "folder-name",
          "value": "new folder name"
        }
      ]
    }
  }
}
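As a sketch, the same request could be sent from a script, for example with Python's requests library. The base URL, access token, and dataset id below are placeholders or assumptions you would replace with your own values:

import requests

BASE_URL = "https://api.clouddataprep.com"  # assumed endpoint; adjust to your environment
TOKEN = "<access-token>"                    # placeholder
DATASET_ID = 123                            # placeholder for your wrangled dataset id

payload = {
    "wrangledDataset": {"id": DATASET_ID},
    "runParameters": {
        "overrides": {
            "data": [
                {"key": "folder-name", "value": "new folder name"}
            ]
        }
    },
}

resp = requests.post(
    BASE_URL + "/v4/jobGroups",
    json=payload,
    headers={"Authorization": "Bearer " + TOKEN},
)
resp.raise_for_status()
print(resp.json())  # the response describes the job group that was started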
Note that for this to work, your files need to have the same structure. See https://cloud.google.com/dataprep/docs/html/Create-Dataset-with-Parameters_118228628#structuring-your-data for more details.
Using a wildcard parameter (for example, a pattern that matches every file in the input folder) should also be possible as an alternative to the first method.