I am trying to copy data from S3 to HDFS with Apache NiFi. I have observed a couple of issues and have a few questions.
Issues
Processor ConvertJSONToAvro - If the flowfile is not valid JSON, the processor gets stuck in an infinite loop with the following error (a minimal reproduction of the parse failure is sketched below):
ConvertJSONToAvro[id=c09f4c27-0160-1000-6c29-1a31afc5a8d4] ConvertJSONToAvro[id=c09f4c27-0160-1000-6c29-1a31afc5a8d4] failed to process session due to java.lang.RuntimeException: Unexpected character ('"' (code 34)): was expecting comma to separate OBJECT entries at [Source: org.apache.nifi.controller.repository.io.FlowFileAccessInputStream@2ad7d50d; line: 8, column: 14]: Unexpected character ('"' (code 34)): was expecting comma to separate OBJECT entries at [Source: org.apache.nifi.controller.repository.io.FlowFileAccessInputStream@2ad7d50d; line: 8, column: 14]
16:45:35 UTC WARNING c09f4c27-0160-1000-6c29-1a31afc5a8d4 ConvertJSONToAvro[id=c09f4c27-0160-1000-6c29-1a31afc5a8d4] Processor Administratively Yielded for 1 sec due to processing failure

Processor FetchS3Object - Irrespective of the value set for "Object Key", it always picks up the value of ${filename}. For example, if "Object Key" is set to "${Newfilename}", the configured value is ignored and only ${filename} is used.
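For reference, the exception above is the standard Jackson error for a missing comma between object fields. A minimal, hypothetical reproduction (the JSON string below is an invented example, not the actual flowfile content):

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class BadJsonDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical input: the comma between the two entries is missing,
        // which is the kind of malformed JSON that triggers the error above.
        String badJson = "{ \"id\": 1 \"name\": \"test\" }";

        // Throws JsonParseException: Unexpected character ('"' (code 34)):
        // was expecting comma to separate Object entries
        new ObjectMapper().readTree(badJson);
    }
}
```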
Questions
Is it possible to refer to a flowfile from previous processors? My use case is FetchS3Object(file1) -> EvaluateJsonPath -> FetchS3Object(file2) -> PutHDFS -> FetchS3Object(file1) -> PutHDFS. Instead of loading file1 multiple times, is it possible to store it once and refer to it throughout the flow?
In the flow above, file1 and file2 form one unit. Is there any option to copy both files together, or to fail both if either copy fails?
The ListS3 processor lists files based on their timestamp. If a file has been listed but fails in a later step, it needs to be listed again for reprocessing. One option is to update the file's timestamp so it becomes available to ListS3 on the next poll (a possible way to do this is sketched below). How do we update the timestamp of a file in S3? Or are there other options to handle use cases like this?
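S3 does not expose a way to modify an object's Last-Modified time directly; a common workaround is to copy the object onto itself with replaced metadata, which refreshes the timestamp. A rough sketch with the AWS SDK for Java v1 (the bucket name, key, and user-metadata key below are placeholders, not values from this flow):

```java
import java.time.Instant;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CopyObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class TouchS3Object {
    public static void main(String[] args) {
        String bucket = "my-bucket";   // placeholder bucket name
        String key = "path/to/file1";  // placeholder object key

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Reuse the existing metadata and add a marker entry so the copy
        // is a real change and is not rejected as a no-op self-copy.
        ObjectMetadata metadata = s3.getObjectMetadata(bucket, key);
        metadata.addUserMetadata("reprocess-requested-at", Instant.now().toString());

        // Copying the object onto itself with new metadata (REPLACE directive)
        // updates its Last-Modified timestamp, so ListS3 should see it again
        // on the next poll.
        CopyObjectRequest touch = new CopyObjectRequest(bucket, key, bucket, key)
                .withNewObjectMetadata(metadata);
        s3.copyObject(touch);
    }
}
```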