I assume your file looks like this:
key1.value1
key2.value2
And you want to print and save either the values or the pairs in some other format.
If you want to print and save just the values, you can transform splitRDD into an RDD that holds only the values:
val valRDD = splitRDD.map( _( 1 ) )
valRDD.foreach( println )
Note that saveAsTextFile writes a directory of part files (one per partition) rather than a single easy-to-use file, so you'll probably need a simple text writer (Java's PrintWriter will do just fine).
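As a minimal sketch of the PrintWriter approach, the helper below writes a sequence of lines to a local file and closes the writer even if a write fails. The name writeLines is hypothetical, and the values sequence is only a stand-in for what splitRDD.map( _( 1 ) ).collect() would return on the driver.

```scala
import java.io.{ File, PrintWriter }

// Hypothetical helper (not part of Spark): writes each line to `path`
// and always closes the writer, even if a write throws.
def writeLines( path: String, lines: Seq[String] ): Unit = {
  val writer = new PrintWriter( new File( path ) )
  try lines.foreach( line => writer.println( line ) )
  finally writer.close()
}

// Stand-in for the collected RDD values (assumption for illustration)
val values = Seq( "value1", "value2" )
writeLines( "emailMsgValues.txt", values )
```

This only makes sense when the collected data fits in the driver's memory, since collect() brings every element back to the driver.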
Example that prints and saves splitRDD in two different formats:
import org.apache.spark._
import java.io.{ PrintWriter, File, FileOutputStream }
...
// One writer per output format
val pwText = new PrintWriter(
  new File( "emailMsgValues.txt" )
)
val pwCSV = new PrintWriter(
  new File( "emailMsgPair.csv" )
)
val emailMsg = sc.textFile( "data/emailMsg.txt" )
// split with a Char argument splits on a literal dot
val splitRDD = emailMsg.map( line => line.split( '.' ) )
println( "Printing and writing values in text" )
// collect() brings the values to the driver so they can be written locally
val valRDD = splitRDD.map( _( 1 ) ).collect()
valRDD.foreach( value => {
  println( value )
  pwText.write( value + "\n" )
} )
println( "Printing and writing pairs in csv" )
splitRDD.collect().foreach( pair => {
  println( pair.mkString( "," ) )
  pwCSV.write( pair.mkString( "," ) + "\n" )
} )
// Close the writers so the output is flushed to disk
pwText.close()
pwCSV.close()
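The example relies on two details that are easy to trip over: split with a Char argument splits on a literal dot, while split with the String "." treats the dot as a regular expression, and an Array has to be joined with mkString to print its members. The sketch below illustrates both with plain Scala and no Spark; the variable names are made up for the illustration.

```scala
val line = "key1.value1"

// Char argument: splits on the literal '.' character
val byChar = line.split( '.' )

// String argument: "." is a regex matching any character, so every
// piece is empty and the result is an empty array
val byRegex = line.split( "." )

// Escaping the dot makes the String version split on a literal dot
val byEscaped = line.split( "\\." )

// Printing the array directly shows its JVM identity, not its members
println( byChar.toString )

// mkString joins the members into a readable line
println( byChar.mkString( "," ) )
```

If a line contains no dot at all, the array has a single element and accessing index 1 throws, so real input may need a length check first.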
The split method outputs an array, whose toString method will not actually print any member of the array itself. If you want to print only the second item (for example), you should do something like: splitRDD.foreach(row => row(1)). – stefanobaghino
line => line.split(".")? Can you give sample file input and your expected print output? – Gsquare