I store a huge list of JSON strings in my MongoDB collection. For simplicity, I have extracted a sample document into the text file businessResource.json:
{
    "data" : {
        "someBusinessData" : {
            "capacity" : {
                "fuelCapacity" : NumberLong(282)
            },
            "someField" : NumberLong(16),
            "anotherField" : {
                "isImportant" : true,
                "lastDateAndTime" : "2008-01-01T11:11",
                "specialFlag" : "YMA"
            },
            ...
        }
    }
}
My problem: how can I convert "someBusinessData" into a JSON object using Spark/Scala?
Once I have it as a JSON object (for example via json4s or lift-json), I hope to perform basic operations on it, such as checking two documents for equality.
Bear in mind that this is a rather large JSON object. Creating a case class is not worth it in my case: the only operations I will perform are filtering on two fields and comparing documents for equality, after which I will export them again.
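For reference, this is the kind of equality check I am after. A minimal sketch using json4s with made-up field values; as far as I understand, json4s compares the fields of a JObject as a set, so key order should not matter:

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Two JSON strings with the same fields in a different order
val a: JValue = parse("""{"someField": 16, "specialFlag": "YMA"}""")
val b: JValue = parse("""{"specialFlag": "YMA", "someField": 16}""")

// JValue equality is structural, so this prints true
println(a == b)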
This is how I fetch the data:
val df: DataFrame = (someSparkSession).sqlContext.read.json("src/test/resources/businessResource.json")
val myData: DataFrame = df.select("data.someBusinessData")
myData.printSchema
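A side note on the read itself: my understanding is that spark.read.json expects one JSON document per line by default, so if the file is pretty-printed like the sample above, it would need the multiLine option (and the NumberLong(...) shell notation replaced with plain numbers, since Spark's JSON reader only accepts standard JSON). A sketch:

import org.apache.spark.sql.DataFrame

// Read a pretty-printed (multi-line) JSON file; by default the reader
// expects one JSON document per line
val multiLineDf: DataFrame = someSparkSession.read
  .option("multiLine", true)
  .json("src/test/resources/businessResource.json")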
The schema shows:
root
|-- someBusinessData: struct (nullable = true)
| |-- capacity: struct (nullable = true)
Since "someBusinessData" is a structure, I cannot get it as String. When I try to print using
myData.first.getStruct(0)
, I get a string that contains the values but not the keys: [[[282],16,[true,2008-01-01T11:11,YMA]
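Is something along the following lines the right way to get the keys back? My understanding is that Spark's to_json function serializes a struct column into a JSON string that keeps the field names. A sketch, untested on my full document:

import org.apache.spark.sql.functions.{col, to_json}

// Serialize the struct column back into a JSON string, keys included
val asJson = myData.select(to_json(col("someBusinessData")).alias("json"))

// Prints e.g. {"capacity":{"fuelCapacity":282},"someField":16,...}
println(asJson.first.getString(0))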
Thanks for your help!