Hi, I am trying to split a column in a Spark RDD.
Data set sample:
Here I want to split the Month column into a month and a year. Example:
2019 10
2009 11
and then count all the tweets in a year. (I know how to use reduceByKey(_ + _) here.)
How do I split columns in a Spark RDD? I don't want to use DataFrames.
Use a map function: split the string by length (year is the first 4 chars, month is the next two) and return a tuple of (month, year). – Rayan Ral
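The suggestion in the comment could be sketched roughly as below. This is only a sketch under assumptions: the data set sample is not shown, so the code assumes each record is a pair of a six-character string such as "201910" (year, then month) and a tweet count. The object name SplitYearMonth, the helper splitYearMonth, and the sample rows are all hypothetical. The tuple is returned as (year, month) to match the example in the question; swap the order if (month, year) is preferred.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SplitYearMonth {
  // Assumption: the value is a 4-digit year followed by a 2-digit month, e.g. "201910".
  // splitAt(4) cuts the string after the fourth character.
  def splitYearMonth(yyyymm: String): (String, String) =
    yyyymm.trim.splitAt(4)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("split-year-month").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical rows standing in for the real data: (month column, tweet count).
    val rows = sc.parallelize(Seq(("201910", 5), ("201911", 3), ("200911", 7)))

    // Split the column with map; this keeps everything in the RDD API.
    val split = rows.map { case (yyyymm, count) =>
      val (year, month) = splitYearMonth(yyyymm)
      (year, month, count)
    }

    // Key by year and sum the counts to get tweets per year.
    val tweetsPerYear = split
      .map { case (year, _, count) => (year, count) }
      .reduceByKey(_ + _)

    tweetsPerYear.collect().foreach(println)
    sc.stop()
  }
}
```

If the real column uses a separator (for example "2019-10"), splitting on that separator instead of by length would be the equivalent step; the by-length version here only holds for the fixed-width format assumed above.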