Hope this helps!
import json
#sample data - convert JSON to dataframe
js = {"attr1" : "attrValue1",
"attr2" : "attrValue2",
"properties": {
"prop1" : "propValue1",
"prop2" : "propValue2"
}
}
df = sqlContext.read.json(sc.parallelize([json.dumps(js)]))
df.show()
#convert above dataframe to desired format
#wide format
df = df.select("*", "properties.*").drop("properties")
df.show()
#long format
df = df.selectExpr("attr1", "attr2", "stack(2, 'prop1', prop1, 'prop2', prop2) as (propertyKey, propertyValue)")
df.show()
Sample data:
+----------+----------+--------------------+
| attr1| attr2| properties|
+----------+----------+--------------------+
|attrValue1|attrValue2|[propValue1,propV...|
+----------+----------+--------------------+
Data in wide format:
+----------+----------+----------+----------+
| attr1| attr2| prop1| prop2|
+----------+----------+----------+----------+
|attrValue1|attrValue2|propValue1|propValue2|
+----------+----------+----------+----------+
Output data (in long format):
+----------+----------+-----------+-------------+
| attr1| attr2|propertyKey|propertyValue|
+----------+----------+-----------+-------------+
|attrValue1|attrValue2| prop1| propValue1|
|attrValue1|attrValue2| prop2| propValue2|
+----------+----------+-----------+-------------+