NiFi ValidateCsv schema example

Question

I am trying to use the ValidateCsv processor in NiFi, but I don't know how to define the schema. My output (flowfile) is as follows:

> PassCountId,CameraId,EventDate,Counter
> 
> 32340,4,2020-10-14 15:26:20.170,4
> 
> 32341,3,2020-10-14 15:26:51.747,4
> 
> 32342,3,2020-10-14 15:26:57.907,6

I tried the schema below, but it didn't work.

{
  "type": "record",
  "name": "NifiRecord",
  "fields" : [
    {"name": "PassCountId", "type": "bigint"},
    {"name": "CameraId", "type": "int"},
    {"name": "EventDate", "type": "datetime"},
    {"name": "Counter", "type": "int"}
  ]
}

What is the proper way to define a schema?

I have already checked the documentation, but unfortunately it didn't help: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.ValidateCsv/

Thanks.

Steven Matison · Accepted Answer · 2020-10-16T12:36:36

@Tyr, here is an example of a schema:

{
   "type" : "record",
   "namespace" : "nifi",
   "name" : "nifi",
   "fields" : [
      { "name" : "c1" , "type" : ["null", "string"] },
      { "name" : "c2" , "type" : ["null", "string"] },
      { "name" : "c3" , "type" : ["null", "string"] }
   ]
}

According to the documentation, the schema can use the following validation functions (the documentation calls them cell processors):

[ParseBigDecimal, ParseBool, ParseChar, ParseDate, ParseDouble, ParseInt, ParseLong, Optional, DMinMax, Equals, ForbidSubStr, LMinMax, NotNull, Null, RequireHashCode, RequireSubStr, Strlen, StrMinMax, StrNotNullOrEmpty, StrRegEx, Unique, UniqueHashCode, IsIncludedIn]

My recommendation is to start with a schema of strings, then experiment with, for example, ParseBigDecimal on your first column. Work in small, testable iterations until you have a fully built schema.
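To make the "small testable iterations" idea concrete, here is a rough Python sketch that dry-runs per-column checks against the sample rows from the question before touching the NiFi flow. It is not NiFi and does not use the ValidateCsv processor; the helper `invalid_rows`, the `COLUMN_CHECKS` list and the `CANDIDATE_SCHEMA` string are all my own assumptions for illustration. In particular, the candidate schema string is an untested guess at a comma-delimited list of cell processors for this data, so verify it against the processor documentation before relying on it.

```python
import csv
import io
from datetime import datetime

# Sample flowfile content copied from the question.
SAMPLE = """PassCountId,CameraId,EventDate,Counter
32340,4,2020-10-14 15:26:20.170,4
32341,3,2020-10-14 15:26:51.747,4
32342,3,2020-10-14 15:26:57.907,6
"""

# ASSUMPTION: a candidate value for the ValidateCsv "Schema" property,
# one cell processor per column. Untested against a real NiFi instance.
CANDIDATE_SCHEMA = 'ParseLong(),ParseInt(),ParseDate("yyyy-MM-dd HH:mm:ss.SSS"),ParseInt()'


def parse_date(value):
    """Local stand-in for a date check on the EventDate column."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


# One Python check per column, loosely mirroring the candidate schema.
# Python's int() is more lenient than Java's parsers, so this is only a rough guide.
COLUMN_CHECKS = [int, int, parse_date, int]


def invalid_rows(text, checks):
    """Return the 1-based line numbers of data rows that fail any column check."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    bad = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(checks):
            bad.append(line_no)
            continue
        try:
            for check, cell in zip(checks, row):
                check(cell)
        except ValueError:
            bad.append(line_no)
    return bad


print(invalid_rows(SAMPLE, COLUMN_CHECKS))
```

If a column fails here, swap its check back to `str` (the "schema of strings" starting point) and tighten one column at a time, then carry the same order of checks over to the processor's schema.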
