Firstly: I am completely new to Scala and Spark, although I am a bit familiar with PySpark. I am working with an external JSON file that is pretty huge, and I am not allowed to convert it into a Dataset or DataFrame; I have to perform the operations on a pure RDD.
I want to know how to get the value of a specific key. I read my JSON file with `sc.textFile("information.json")`.

Normally in Python I would do something like this:
```python
x = sc.textFile("information.json") \
    .map(lambda x: json.loads(x)) \
    .map(lambda x: (x['name'], x['roll_no'])) \
    .collect()
```
Is there an equivalent of the above code in Scala (extracting the values of specific keys) on an RDD, without converting to a DataFrame or Dataset? A sketch of what I have in mind is below the sample data. This is essentially the same question as Equivalent pyspark's json.loads function for spark-shell, but I am hoping for a more concrete, beginner-friendly answer. Thank you.
JSON data:

```json
{"name":"ABC", "roll_no":"12", "Major":"CS"}
```
spark.read.json? Then you don't need to do any custom parsing. - abiratsis
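If I understand that suggestion correctly, it means something like the sketch below, but it produces a DataFrame rather than the plain RDD I am restricted to:

```scala
// Let Spark infer the schema and parse the JSON itself.
// This yields a DataFrame, which I am not allowed to use.
val df = spark.read.json("information.json")
df.select("name", "roll_no").show()
```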