
I would like to convert the following CSV content, which contains a timestamp in microseconds, to an Avro record using the CSVRecordReader and the AvroRecordSetWriter:

timestamp,value
1551784149996000,1

I'm using the following Avro schema:

{
  "name": "TestRecord",
  "type": "record",
  "fields": [
    {
      "name": "timestamp",
      "type": {
        "type": "long",
        "logicalType": "timestamp-micros"
      }
    },
    {
      "name": "value",
      "type": "long"
    }
  ]
}

But the CSVRecordReader seems to interpret the microseconds as milliseconds, so the output of the AvroRecordSetWriter contains three extra zeros:

1551784750036000000

The CSVReader documentation confirms this behavior: "Timestamp fields will be assumed to be number of milliseconds since epoch (Midnight, Jan 1, 1970 GMT)" (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.9.0/org.apache.nifi.csv.CSVReader/index.html).
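As a plain-Java illustration of what appears to happen (this is not NiFi source code, just the arithmetic implied by the milliseconds assumption combined with the timestamp-micros logical type):

// Illustration only: the reader treats the CSV number as epoch milliseconds,
// and the timestamp-micros writer then converts those "milliseconds" to microseconds.
public class MicrosAsMillis {
    public static void main(String[] args) {
        long csvValue = 1551784149996000L;          // actually microseconds since epoch
        long assumedMillis = csvValue;              // reader's assumption: milliseconds
        long writtenMicros = assumedMillis * 1000L; // millis -> micros conversion
        System.out.println(writtenMicros);          // 1551784149996000000 -> three extra zeros
    }
}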

How can I parse microsecond timestamps in NiFi and convert them to Avro or Parquet using this schema?

Alternatively, is the following ISO-8601 instant format perhaps supported by any NiFi RecordReader?

2019-03-01T13:12:34.567123Z
Is there any way to prevent the * 1000 in getLongFromTimestamp? - Martin

1 Answer


The easiest way is probably to trim the last three digits from that field with an UpdateRecord processor, placed either after the ConvertRecord (if you need to retain microsecond precision) or before it (if millisecond precision is enough), as sketched below.
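A sketch of the first variant (the processor and controller-service names are the standard NiFi ones; the exact flow wiring is an assumption):

(source) -> ConvertRecord (CSVReader / AvroRecordSetWriter) -> UpdateRecord (AvroReader / AvroRecordSetWriter) -> (downstream)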

You can use the RecordPath function substringBeforeLast to trim it, e.g. substringBeforeLast(/timestamp, '000') (note the leading slash in the RecordPath field reference).
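A minimal UpdateRecord configuration for the retain-microseconds variant could look like this (the /timestamp path is an assumption based on the field name in your schema; the dynamic property maps a RecordPath to its new value):

Record Reader: AvroReader
Record Writer: AvroRecordSetWriter
Replacement Value Strategy: Record Path Value
/timestamp: substringBeforeLast( /timestamp, '000' )

With the value from the question, substringBeforeLast strips everything from the last occurrence of '000' onward:

input:  1551784750036000000
output: 1551784750036000

Since the inflated value always ends in three appended zeros, the last occurrence of '000' is always that suffix, so the original microsecond value is restored.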