I have a CSV with 70 columns. The 60th column contains a value that decides whether the record is valid or invalid. If the 60th column contains 0, 1, 6 or 7, the record is valid; if it contains any other value, the record is invalid.
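The validity rule above can be written as a small predicate. This is a minimal sketch in plain Python; the names `VALID_CODES` and `is_valid` are hypothetical, and treating a row with fewer than 60 columns as invalid (instead of raising an `IndexError`) is an assumption, not part of the original rule.

```python
# Codes in the 60th column that mark a record as valid
VALID_CODES = {"0", "1", "6", "7"}


def is_valid(row):
    """Return True if the 60th column of a parsed CSV row holds a valid code.

    Python lists are 0-based, so the 60th column is row[59].
    Assumption: rows with fewer than 60 columns are treated as invalid.
    """
    if len(row) < 60:
        return False
    return row[59] in VALID_CODES
```

For a 70-column row whose 60th value is `"6"`, `is_valid(row)` returns `True`; for `"2"` it returns `False`. Using a set keeps the membership check to a single expression instead of four chained `or` comparisons.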
I realised that this couldn't be done purely by changing the properties of processors in Apache NiFi. Therefore, I decided to use the ExecuteScript processor and added this Python code as the script body:
```python
import csv

valid = 0
invalid = 0
total = 0
file2 = open("invalid.csv", "w")
file1 = open("valid.csv", "w")
# csv.writer turns each parsed row (a list) back into a CSV line
valid_writer = csv.writer(file1)
invalid_writer = csv.writer(file2)
with open('/Users/himsaragallage/Desktop/redder/Regexo_2019101812750.dat.csv') as f:
    r = csv.reader(f)
    for row in r:  # iterate over the parsed rows, not the raw lines of f
        total += 1
        # the 60th column (index 59) decides whether the record is valid
        if row[59] in ("0", "1", "6", "7"):
            valid += 1
            valid_writer.writerow(row)
        else:
            invalid += 1
            invalid_writer.writerow(row)
file1.close()
file2.close()
print("Total : " + str(total))
print("Valid : " + str(valid))
print("Invalid : " + str(invalid))
```
I have no idea how to use a session and write code within the ExecuteScript processor as shown in this question, so I just wrote a simple Python script that directs the valid and invalid records to different files. This approach has several limitations:
- I want to be able to process CSVs with different filenames dynamically.
- The CSV that the invalid data is sent to must have the same filename as the input CSV.
- There will be around 20 CSVs in my `redder` folder, and all of them must be processed in one go.
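The three requirements above could be sketched in plain Python 3 as a single function that loops over a folder. This assumes the script runs as a standalone program rather than inside ExecuteScript, whose Python engine is Jython (Python 2.7), so `newline=""` and `os.makedirs(..., exist_ok=True)` would not work there unchanged. The function `process_folder` and its parameters are hypothetical. It also assumes the valid and invalid output folders are separate from the input folder, since each output file reuses the input file's name and would otherwise overwrite it.

```python
import csv
import glob
import os

# Codes in the 60th column that mark a record as valid
VALID_CODES = {"0", "1", "6", "7"}


def process_folder(input_dir, valid_dir, invalid_dir):
    """Split every *.csv in input_dir into valid/invalid files with the same name.

    Returns a dict mapping each file name to its (total, valid, invalid) counts.
    Assumption: valid_dir and invalid_dir are different from input_dir.
    """
    os.makedirs(valid_dir, exist_ok=True)
    os.makedirs(invalid_dir, exist_ok=True)
    results = {}
    for in_path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        name = os.path.basename(in_path)  # output files keep the input file name
        total = valid = invalid = 0
        # newline="" is what the csv module expects for Python 3 text-mode files
        with open(in_path, newline="") as src, \
                open(os.path.join(valid_dir, name), "w", newline="") as ok_out, \
                open(os.path.join(invalid_dir, name), "w", newline="") as bad_out:
            ok_writer = csv.writer(ok_out)
            bad_writer = csv.writer(bad_out)
            for row in csv.reader(src):
                total += 1
                # Column 60 is index 59; short rows count as invalid (assumption)
                if len(row) > 59 and row[59] in VALID_CODES:
                    valid += 1
                    ok_writer.writerow(row)
                else:
                    invalid += 1
                    bad_writer.writerow(row)
        results[name] = (total, valid, invalid)
    return results
```

For example, `process_folder("/Users/himsaragallage/Desktop/redder", "/Users/himsaragallage/Desktop/valid", "/Users/himsaragallage/Desktop/invalid")` would process the whole `redder` folder in one call; the `valid` and `invalid` folder paths are made up for illustration. The returned dict gives the same total/valid/invalid counts the original script printed, but per file.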
I hope you can suggest a method to achieve this. Feel free to provide a solution by editing the Python code I have used, or by using a completely different set of processors and dropping the ExecuteScript processor altogether.




