I have a requirement to interpret and parse a log file whose contents are comma-delimited. The contents look like the sample below. The file has no header row; I have added one here for convenience.
Event_TimeStamp, Target_IP, UserName, Source_IP, DatabaseName, Activity, QueryText, Flag
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Connect, , 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select * from table1', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select * from table2', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select colname1,colname2,colname3,colname4 from table1', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select colname1,\'some Stringliteral\' as colname2,colname3,colname4 from table2', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, DisConnect, , 0
I am using ConvertRecord, configured with a CSVReader to read/interpret the data and a CSVRecordSetWriter to write the data to the flow file. The schema for the CSVReader is as below.
{
  "type": "record",
  "namespace": "nifi",
  "name": "db",
  "fields": [
    { "name": "Event_Timestamp", "type": "string" },
    { "name": "SourceIP", "type": "string" },
    { "name": "UserName", "type": "string" },
    { "name": "TargetIP", "type": "string" },
    { "name": "DatabaseName", "type": "string" },
    { "name": "Activity", "type": "string" },
    { "name": "QueryText", "type": "string" },
    { "name": "Flag", "type": "int" }
  ]
}
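As a side note, the schema is supplied to the reader as plain JSON text (for example via the Schema Text property), so a quick sanity check before pasting it in is to run it through a JSON parser, which catches slips such as a trailing comma after the last field entry. A minimal Python sketch, assuming the schema above has been saved to a hypothetical file named db_schema.json:

import json

# Validate the Avro schema text as JSON before pasting it into the reader.
# "db_schema.json" is a hypothetical file holding the schema shown above.
schema_text = open("db_schema.json").read()
json.loads(schema_text)          # raises json.JSONDecodeError if malformed
print("schema text is valid JSON")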
The NiFi flow fails while reading the data because the delimiter configured on the CSVReader is "," (comma) and QueryText also contains commas within its text. If I put only the first 3 lines in a new log file and run the flow, it succeeds, since those QueryText values contain no commas.
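To see the failure concretely outside NiFi: splitting the sample rows on every comma gives the wrong field count as soon as QueryText itself contains commas. A minimal plain-Python sketch using two rows copied from the sample above:

# Each record should have 8 fields, but a naive split on "," treats the
# commas inside QueryText as delimiters too.
rows = [
    "202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select * from table1', 0",
    "202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, "
    "'select colname1,colname2,colname3,colname4 from table1', 0",
]

for row in rows:
    fields = row.split(",")
    print(len(fields))   # prints 8 for the first row, 11 for the second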
I do not know whether the CSVReader is the right reader to use here. Can anyone help me read a log file that is comma-separated but where one or a few fields contain commas within the text? Any help is highly appreciated.
Comments:

Your QueryText value is enclosed in single quotes ('), so configure your CSVReader property Quote Character = ' (the default is "). - Vikramsinh Shinde

Also set CSV Parser = Jackson CSV. - Vikramsinh Shinde

If the QueryText value were not enclosed in any quote character, the CSV parser could not tell whether a comma is a delimiter or part of the data. In that case you would have to write a custom cleaning script before the ConvertRecord processor to escape the commas that belong to the data, but then how would that script know whether a comma is a delimiter or part of the data? The source that generates the CSV should adhere to at least a minimum set of rules from the CSV specification. - Vikramsinh Shinde
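Following the comments above, the sample rows do parse cleanly once the parser is told that ' is the quote character and \ is the escape character (the rows escape embedded quotes as \'). Below is a sketch of that behaviour in plain Python, not the NiFi CSVReader itself; in NiFi the equivalent properties are Quote Character and Escape Character, and the exact behaviour also depends on the CSV Parser chosen.

import csv, io

# One of the problematic rows from the sample, with commas and escaped
# single quotes inside the QueryText field.
line = ("202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, "
        "'select colname1,\\'some Stringliteral\\' as colname2,colname3,colname4 from table2', 0\n")

# Quote character = ', escape character = \, same idea as the suggested
# CSVReader settings; skipinitialspace handles the blank after each comma.
reader = csv.reader(io.StringIO(line), delimiter=",", quotechar="'",
                    escapechar="\\", skipinitialspace=True)
for record in reader:
    print(len(record))   # 8 -> the commas inside the quoted QueryText stay in one field
    print(record[6])     # select colname1,'some Stringliteral' as colname2,...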