We want to convert a set of different XML files to JSON. NiFi does not seem to understand the hierarchy of the XML, so we ended up stripping away the outer layers until only the inner segment and its children are left, and then using ConvertRecord to convert that into JSON.
This is an example of the XML we read:
'''
<?xml version="1.0" encoding="UTF-8"?>
<!-- =============================================================== -->
<!-- -->
<!-- DESCRIPTION: -->
<!-- -->
<!-- This is XML statistic file generated by the Statistics Export -->
<!-- Interface feature. This file contains statistics from -->
<!-- statistical server for adequate time intervals. -->
<!-- -->
<!-- =============================================================== -->
<!DOCTYPE channel_statistics SYSTEM "../DTD/channel.dtd">
<channel_statistics>
<header>
<version>1.6</version>
<creation_date_time>2020-02-16 01:25 UTC</creation_date_time>
<zone_id>1</zone_id>
</header>
<data status="complete">
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF992BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="5" site_id="17" site_alias ="OF989BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="2" site_id="34" site_alias ="GF969BD31" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>0</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
</interval>
</data>
</channel_statistics>
'''
When the file is stripped down to the interval segment shown below, ConvertRecord is able to read it.
'''
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="2" site_id="34" site_alias ="OF041BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>0</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
</interval>
'''
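For comparison, the transformation the ReplaceText step is meant to perform can also be expressed with an XML parser instead of a regex. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree` (a swapped-in technique, not part of the NiFi flow). It pulls the `data/interval` element out of a shortened copy of the sample file and serializes it on its own. `extract_interval` and `SAMPLE` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample file (one channel, fewer child elements).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE channel_statistics SYSTEM "../DTD/channel.dtd">
<channel_statistics>
<header>
<version>1.6</version>
<creation_date_time>2020-02-16 01:25 UTC</creation_date_time>
<zone_id>1</zone_id>
</header>
<data status="complete">
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF992BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<interval_length>900</interval_length>
</channel>
</interval>
</data>
</channel_statistics>
"""


def extract_interval(xml_text: str) -> str:
    """Return the first <interval> element of a channel_statistics document,
    serialized without the enclosing <channel_statistics>/<data> layers."""
    root = ET.fromstring(xml_text)  # root element is <channel_statistics>
    interval = root.find("data/interval")  # path taken from the sample file
    if interval is None:
        raise ValueError("document has no data/interval element")
    interval.tail = None  # drop the whitespace that followed </interval>
    return ET.tostring(interval, encoding="unicode")


if __name__ == "__main__":
    print(extract_interval(SAMPLE))
```

The printed fragment has the same shape as the stripped-down example above: the `<interval>` element with its `<channel>` children, and no header or outer wrappers.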
Our setup is ListFile->FetchFile->ReplaceText->ConvertRecord->...
The ReplaceText processor is configured as follows, but depending on which regex is used, it either passes the file unchanged to the success relationship or routes it to failure without any error message.
These are the regex configurations we tried:
<interval(.*)</interval>
/\<interval(.*)interval\>/s
\<interval((.|\n|\r)*)interval\>
(?<=<data status="complete">)(.*?)(?=<\/data>)
(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)
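ReplaceText uses Java regular expressions, and its Evaluation Mode property decides whether the pattern is applied to the entire text or to each line separately. The minimal Python sketch below mimics both modes on an abbreviated copy of the sample file to show how some of the patterns above behave. It assumes the Replacement Value is a back-reference to group 1. Python writes that as `\1`, where NiFi's Replacement Value would use `$1`. For these particular patterns, Python's `re` and Java's regex engine treat `(?s)`, lazy quantifiers, and fixed-width lookbehind the same way. The helper functions and the `P_KEEP` pattern are hypothetical illustrations, not NiFi settings.

```python
import re

# Abbreviated copy of the sample file (channel elements shortened).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<channel_statistics>
<header>
<version>1.6</version>
</header>
<data status="complete">
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF992BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
</channel>
</interval>
</data>
</channel_statistics>
"""


def replace_line_by_line(pattern: str, repl: str, text: str) -> str:
    """Apply the substitution to each line separately, so a pattern that
    needs to span several lines can never match."""
    return "".join(re.sub(pattern, repl, line)
                   for line in text.splitlines(keepends=True))


def replace_entire_text(pattern: str, repl: str, text: str) -> str:
    """Apply the substitution once to the whole content."""
    return re.sub(pattern, repl, text)


# First pattern from the question: without the s (DOTALL) flag, '.' does
# not match newlines, so it finds nothing in either mode.
P1 = r"<interval(.*)</interval>"

# Last pattern from the question: in entire-text mode it matches, but the
# match is replaced by group 1, i.e. by exactly the text it matched, and
# everything outside the match is left untouched.
P5 = r'(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)'

# Hypothetical alternative: consume the whole file, keep only group 1.
P_KEEP = r"(?s).*(<interval.*</interval>).*"

if __name__ == "__main__":
    for name, p in [("P1", P1), ("P5", P5), ("P_KEEP", P_KEEP)]:
        print(name, "line-by-line unchanged:",
              replace_line_by_line(p, r"\1", SAMPLE) == SAMPLE)
        print(name, "entire-text unchanged:",
              replace_entire_text(p, r"\1", SAMPLE) == SAMPLE)
    print(replace_entire_text(P_KEEP, r"\1", SAMPLE))
```

In this sketch, `P1` never matches because `.` stops at line breaks. `P5` matches in entire-text mode but writes back exactly what it matched, so the file is unchanged. Only a pattern that consumes the whole file and keeps a captured group actually strips the outer layers. Whether `P_KEEP` behaves the same inside ReplaceText with Evaluation Mode set to Entire text and Replacement Value `$1` is an assumption worth checking against the processor documentation.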
What are we doing wrong?