I'm working on a small university project using Apache NiFi and Apache Spark. I want to build a NiFi workflow that reads TSV files from HDFS, then use Spark Streaming to process those files and store the information I need in MySQL. I've already created the workflow in NiFi, and the storage part is working. The problem is that I can't parse the NiFi data packets into something I can use.
The files contain rows like this:
linea1File1 TheReceptionist 653 Entertainment 424 13021 4.34 1305 744 DjdA-5oKYFQ NxTDlnOuybo c-8VuICzXtU
Each space in the sample above is actually a tab ("\t").
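Parsing a single tab-separated row into an object can be sketched in plain Scala, independent of Spark. The field names in `Record` are hypothetical (the question only shows a sample row), and treating every column after the ninth as a list of IDs is an assumption based on that one sample.

```scala
import scala.util.Try

object RowParser {
  // Hypothetical field names: the question does not name the columns.
  final case class Record(
    id: String,
    uploader: String,
    age: Int,
    category: String,
    length: Int,
    views: Int,
    rate: Double,
    ratings: Int,
    comments: Int,
    relatedIds: List[String] // assumed: every column after the 9th
  )

  // Split on tab with limit -1 so empty trailing fields are kept,
  // and return None for rows that are too short or have non-numeric values.
  def parse(line: String): Option[Record] = {
    val f = line.split("\t", -1)
    if (f.length < 9) None
    else Try(Record(
      f(0), f(1), f(2).trim.toInt, f(3), f(4).trim.toInt, f(5).trim.toInt,
      f(6).trim.toDouble, f(7).trim.toInt, f(8).trim.toInt,
      f.drop(9).filter(_.nonEmpty).toList
    )).toOption
  }

  def main(args: Array[String]): Unit = {
    // The sample row from the question, rebuilt with real tab separators.
    val row = List("linea1File1", "TheReceptionist", "653", "Entertainment", "424",
      "13021", "4.34", "1305", "744", "DjdA-5oKYFQ", "NxTDlnOuybo", "c-8VuICzXtU").mkString("\t")
    println(parse(row))
  }
}
```

Returning `Option` lets malformed rows be dropped with `flatMap` instead of failing the whole batch.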
This is my Spark code in Scala:
val ssc = new StreamingContext(config, Seconds(10))
val packet = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))
val file = packet.map(dataPacket => new String(dataPacket.getContent, StandardCharsets.UTF_8))
Up to this point I get the entire file (7,000+ rows) as a single string, but I can't split that string into rows. I need the file as individual rows so that I can parse each one into an object, apply some operations to it, and store what I want.
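A minimal sketch of splitting one packet's content into rows, assuming each NiFi packet carries a whole file as UTF-8 text with newline-separated rows. The object and method names (`SplitRows`, `rows`) are hypothetical. The splitting logic is plain Scala; the comment shows how the same idea would apply to the `file` stream from the question via `DStream.flatMap`, which turns one element per file into one element per row.

```scala
object SplitRows {
  // Split the content of one packet (a whole file) into non-empty rows,
  // then split each row on tabs. "\\r?\\n" handles Unix and Windows line endings.
  def rows(content: String): Seq[Array[String]] =
    content
      .split("\\r?\\n")
      .toSeq
      .filter(_.trim.nonEmpty)
      .map(_.split("\t", -1))

  // In the streaming job, the same idea applied to the DStream[String] `file`:
  //   val lines  = file.flatMap(_.split("\\r?\\n"))
  //   val fields = lines.filter(_.trim.nonEmpty).map(_.split("\t", -1))

  def main(args: Array[String]): Unit = {
    val content = "a\tb\tc\nd\te\tf\r\n\n"
    rows(content).foreach(r => println(r.mkString(" | ")))
  }
}
```

Using `flatMap` rather than `map` matters here: `map(_.split(...))` would produce one array per file, while `flatMap` flattens those arrays so each row becomes its own stream element that can then be parsed and written out.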
Can anyone help me with this?