0
votes

I am using HDP 2.5. I am trying to add a timestamp to the name of a file located in HDFS. For that I use GetHDFS->UpdateAttribute->PutHDFS.

First I get the file from HDFS through the GetHDFS processor, and then I change the filename in UpdateAttribute by adding the property "${filename}.${now():format("yyyy-MM-dd-HH:mm:ss.SSS'z'")}". Finally I put the file back in HDFS. At this stage I have one issue: if the destination folder (in HDFS) contains a file that already has a timestamp, then once I run the flow the same file ends up with two or more timestamps.

File that already contains a timestamp

enter image description here

After the NiFi flow runs, the file contains two timestamps

enter image description here

Can anyone tell me how to resolve this issue?

2 Answers

4
votes

If you don't want to change your current workflow, the best option is probably to use the "File filter" property in the GetHDFS processor to only get files whose names do not already contain the date (assuming your files follow some naming convention). Another option is to send the renamed files to another directory.
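As a rough sketch of the filter idea: assuming the timestamp suffix always has the shape produced by the format string in the question (for example `.2017-01-31-10:15:30.123z`), a regular expression with a negative lookahead matches only names that do not yet end in such a suffix. The snippet below checks the pattern in Python; the pattern name, helper function, and sample filenames are made up for illustration, and the expression should be verified against the processor's Java regex handling before relying on it.

```python
import re

# Hypothetical pattern: match a whole filename only if it does NOT already
# end in ".yyyy-MM-dd-HH:mm:ss.SSSz" (the suffix added in UpdateAttribute).
NOT_YET_STAMPED = re.compile(
    r"^(?!.*\.\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}z$).*$"
)

def should_pick_up(filename: str) -> bool:
    """Return True if the filename has no timestamp suffix yet."""
    return NOT_YET_STAMPED.fullmatch(filename) is not None

if __name__ == "__main__":
    # Sample names are illustrative only.
    for name in ["data.csv", "data.csv.2017-01-31-10:15:30.123z"]:
        print(name, should_pick_up(name))
```

If the real files use a different suffix, only the lookahead part needs to change.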

As a general comment, I'd recommend using the combination of ListHDFS and FetchHDFS processors as it is a more efficient pattern when working with a NiFi cluster. You could then use a RouteOnAttribute in the middle to do some more advanced filtering than the "File filter" option.

Another comment: your approach is not the most performant one, as you are downloading the data from HDFS and then uploading it back. A rename/move operation in HDFS would probably be cleaner (or naming the files correctly in the first place). You could use the WebHDFS interface to perform the renaming, using the InvokeHTTP processor in NiFi in combination with the ListHDFS processor.
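To make the rename idea more concrete, here is a minimal sketch of the URL such a request might use. WebHDFS exposes renaming as an HTTP PUT with `op=RENAME` and a `destination` parameter; the host name, port, and paths below are placeholders, the helper function is hypothetical, and authentication parameters are left out.

```python
from urllib.parse import quote, urlencode

def webhdfs_rename_url(host: str, port: int, src: str, dst: str) -> str:
    """Build the URL for a WebHDFS RENAME call (to be sent as an HTTP PUT)."""
    # The destination path travels as a query parameter, so it is URL-encoded.
    query = urlencode({"op": "RENAME", "destination": dst})
    # quote() keeps "/" intact but escapes characters such as ":" in the path.
    return f"http://{host}:{port}/webhdfs/v1{quote(src)}?{query}"

if __name__ == "__main__":
    # Placeholder NameNode address and file paths, for illustration only.
    print(webhdfs_rename_url(
        "namenode.example.com", 50070,
        "/data/in/data.csv",
        "/data/in/data.csv.2017-01-31-10:15:30.123z",
    ))
```

In a NiFi flow the same URL would be assembled from flow file attributes and passed to InvokeHTTP with the method set to PUT.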

0
votes

You can use Expression Language to delete the previous timestamp and then add the current timestamp. There are several string functions, such as substringBefore or substringAfter, that you can use depending on the logic of your file names.
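The logic can be sketched outside NiFi as follows. This is only an illustration in Python, assuming the old timestamp is always a trailing suffix in the format from the question; the function and constant names are hypothetical. In the flow itself, an Expression Language function such as `replaceAll` could play the role of the regex substitution.

```python
import re
from datetime import datetime

# Assumed suffix shape: ".yyyy-MM-dd-HH:mm:ss.SSSz" at the end of the name.
STAMP_SUFFIX = re.compile(r"\.\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{3}z$")

def restamp(filename: str, now: datetime) -> str:
    """Drop an existing timestamp suffix (if any), then append a fresh one."""
    base = STAMP_SUFFIX.sub("", filename)
    millis = now.microsecond // 1000  # strftime has no millisecond directive
    stamp = now.strftime("%Y-%m-%d-%H:%M:%S") + f".{millis:03d}z"
    return f"{base}.{stamp}"

if __name__ == "__main__":
    current = datetime.now()
    # Both calls yield a name with exactly one timestamp suffix.
    print(restamp("data.csv", current))
    print(restamp("data.csv.2017-01-31-10:15:30.123z", current))
```

Because the old suffix is removed first, running the flow repeatedly on the same file keeps a single timestamp.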
