I'm getting data files from Kafka that are in XML or Avro format. Each message is wrapped in double quotes (e.g. "..."). I want to use NiFi to remove the double quotes surrounding the contents.
I cannot use the ReplaceText processor to remove all the double quotes because some of the tags use double quotes in their attributes.
I'm trying to use the ExtractText processor, but from my understanding the output of the regex is put into an attribute and does not replace the FlowFile contents. I'm also not sure what to write for the regex, because I think I would need the content length to remove the first and last characters. I cannot use the tag names in the regex either, because I need to do the same with other contents as well.
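To make the regex part concrete: a pattern anchored to the very start and very end of the content might avoid needing the content length at all. Below is a minimal sketch in plain Python (not NiFi configuration) of the behavior I'm after. `strip_outer_quotes` is a hypothetical helper name, and I'm assuming there is no trailing newline after the closing quote. NiFi uses Java regular expressions, so the anchors may need adjusting there.

```python
import re

# Matches a double quote only at the very start or the very end of the text.
# Quotes inside the content (e.g. XML attribute values) are never matched.
OUTER_QUOTES = re.compile(r'\A"|"\Z')


def strip_outer_quotes(text: str) -> str:
    """Remove one leading and one trailing double quote, if present."""
    return OUTER_QUOTES.sub("", text)


# Sample message in the same shape as the XML in this question.
sample = (
    '"<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<t1:Foo1><t2:Foo2 reportIndicator="...">...</t2:Foo2></t1:Foo1>"'
)
cleaned = strip_outer_quotes(sample)
print(cleaned)
```

The attribute quotes such as `version="1.0"` are left alone because neither alternative can match in the middle of the string.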
Here's an example of how the XML file is set up, with the surrounding double quotes and some of the tags using attributes with double quotes.
"<?xml version="1.0" encoding="UTF-8" standalone="yes"?><t1:Foo1><t2:Foo2 reportIndicator="...">...</t2:Foo2></t1:Foo1>"
The flow I expect is: a ConsumeKafka_0_10 processor (working fine) outputs a FlowFile whose XML content has the surrounding double quotes; another processor (ExtractText?) outputs a FlowFile with the same XML content but without the surrounding double quotes; and a PutFile processor (working fine) writes the result.
Open to other suggestions as well! I was also thinking about adding a processor to execute some code if that could edit the file. Seems messy though.
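If I did go the scripting route, the logic itself would be small. Here is a rough sketch in plain Python that works on raw bytes, so it would not depend on the content being text (which matters for the Avro case). `strip_outer_quote_bytes` is a hypothetical name, and wiring it into a NiFi scripting processor is not shown.

```python
def strip_outer_quote_bytes(payload: bytes) -> bytes:
    """Drop the first and last byte only when both are a double quote."""
    quote = b'"'
    # Require at least two bytes so a lone quote is not treated as a wrapped, empty message.
    if len(payload) >= 2 and payload.startswith(quote) and payload.endswith(quote):
        return payload[1:-1]
    # Anything not wrapped in quotes is passed through unchanged.
    return payload


wrapped = b'"<a attr="x">...</a>"'
print(strip_outer_quote_bytes(wrapped))
```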