
I am trying to create a flow in Apache NiFi to collect files from a third-party RESTful API, and I have set up my flow as follows:

InvokeHTTP - ExtractText - PutFile

I can collect the file I am after, since I have specified it in the Remote URL. However, when the flow retrieves the data, it writes multiple copies (hundreds) of the same file to my output directory.

Three things I need help with:

1: How do I get the flow to output the file as a readable .csv rather than a file with no extension?

2: How can I stop the processor once I have all of the data that I need?

3: The JSON file that I have been supplied with gives me the option to get files from a certain date range:

https://api.3rdParty.com/reports/v1/scheduledReports/877800/1553731200000

Or I can choose a specific file:

https://api.3rdParty.com/reports/v1/scheduledReports/download/877800/201904/CTDDaily/2019-04-02T01:50:00Z.csv

But how can I set up NiFi to check for newer files automatically? This process will run daily, and we will be downloading a new file each day.

If this is too broad, please help me by letting me know so I can edit this post.

Thanks.

Note: the third-party host name has been replaced for security reasons, so the links will not work directly. Thanks.

Your links are not working. - mle
Links won't work as I replaced the 3rd Party details with "3rdParty" - Donna

1 Answer


1) You can change the filename of the flow file to anything you want using the UpdateAttribute processor. To give it a ".csv" extension, add a property named "filename" with a value of "${filename}.csv" (without the quotes when you enter it). Place UpdateAttribute before PutFile, since PutFile names the output file from the "filename" attribute.

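2) By default, most processors use the timer-driven scheduling strategy with a run schedule of 0 seconds, which means they keep running as fast as possible. That is likely why you end up with hundreds of copies of the same file. Open the processor's configuration, go to the Scheduling tab, and set an appropriate schedule. It sounds like you probably want CRON-driven scheduling so that it runs once a day. NiFi uses Quartz-style cron expressions, so for example "0 0 6 * * ?" would run at 06:00 every day.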

3) You can use NiFi expression language statements to create dynamic time ranges. I don't fully understand the syntax for the API that you have to communicate with, but you could do something like this for the URL:

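https://api.3rdParty.com/reports/v1/scheduledReports/877800/${now():toNumber()}

Where now() returns the current date and time, and toNumber() converts it to milliseconds since the epoch, which matches the form of the timestamp in your example URL.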
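The trailing number in your date-range URL (1553731200000) looks like a Unix timestamp in milliseconds; read that way it is 2019-03-28T00:00:00Z. Here is a minimal Python sketch, run outside NiFi, for checking which value belongs in that position for a given date. The function names are made up for illustration, and treating the last path segment as an epoch-millisecond "since" value is an assumption about the third-party API.

```python
from datetime import datetime, timedelta, timezone

# Placeholder host and report id copied from the question.
BASE_URL = "https://api.3rdParty.com/reports/v1/scheduledReports/877800"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Return whole milliseconds since 1970-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def reports_since_url(moment: datetime) -> str:
    """Build the date-range URL (assumed format: <base>/<epoch millis>)."""
    return f"{BASE_URL}/{to_epoch_millis(moment)}"


if __name__ == "__main__":
    start = datetime(2019, 3, 28, tzinfo=timezone.utc)
    print(reports_since_url(start))
```

Comparing this output with what the Remote URL property evaluates to in NiFi is a quick way to confirm the expression produces the number the API expects.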

You can also format it to a date string if necessary:

${now():format('yyyy-MM-dd')}
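The format() patterns follow Java's date pattern letters (yyyy, MM, dd and so on). To see which pieces your specific-file URL would need, here is a rough Python sketch of the same formatting done outside NiFi. The function name is hypothetical, and the guess that "201904" is a year-month folder and that the file name is a UTC timestamp comes only from the example URL in the question, not from the API's documentation.

```python
from datetime import datetime, timezone


def download_url_parts(moment: datetime) -> dict:
    """Format the date pieces that appear in the example download URL."""
    utc = moment.astimezone(timezone.utc)
    return {
        # Roughly ${now():format('yyyy-MM-dd')} in NiFi
        "day": utc.strftime("%Y-%m-%d"),
        # Roughly ${now():format('yyyyMM')}; assumed to be the month folder
        "month_folder": utc.strftime("%Y%m"),
        # Timestamp in the shape used by the example file name
        "timestamp": utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    parts = download_url_parts(datetime(2019, 4, 2, 1, 50, tzinfo=timezone.utc))
    print(parts["month_folder"], parts["timestamp"])
```

Note that the exact time in the file name (01:50:00Z in your example) cannot be derived from the current date alone, so for the daily job the date-range URL is probably the easier one to automate.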

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html