I have a csv file with more than 70 columns and millions of records. But for processing I need around 20 columns. To extract them I have used modified groovy script in Split fields and NiFi API with Groovy.
My script :
import java.nio.charset.StandardCharsets
def flowFile = session.get()
if(!flowFile) return
flowFile = session.write(flowFile, {inputStream, outputStream ->
inputStream.eachLine { line ->
a = line.tokenize(',')
outputStream.write("${a[0]},${a[1]},${a[2]},${a[3]},${a[4]}\n".toString().getBytes(StandardCharsets.UTF_8))
}
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
Original csv :
date,id,name,age,addr,lang
19/12/2019,1,sachith,29,sac@email,go
,2,nalaka,29,nalaka@email,
,,muhandiram,,c
When this is being processed by ExecuteScript processor, output looks like :
date,id,name,age
19/12/2019,1,sachith,29
2,2,nalaka,29
null,null,muhandiram,null
If data field is empty, its replaced with id and other empty values are replaced with null.
Why script misbehaves?
My knowledge about groovy is almost nothing.
expected output :
date,id,name,age
19/12/2019,1,sachith,29
,2,nalaka,29
,,muhandiram,