I'm trying to create StructType Schema for below JSON
{
"countries":{
"country":[
{
"area":9596960,
"cities":{
},
"name":"China",
"population":1210004992
},
{
"area":3287590,
"cities":{
},
"name":"India",
"population":952107712
},
{
"area":9372610,
"cities":{
"city":[
{
"name":"New York",
"population":7380906
},
{
"name":"Los Angeles",
"population":3553638
},
{
"name":"Chicago",
"population":2721547
},
{
"name":"Detroit",
"population":1000272
}
]
},
"name":"United States",
"population":266476272
},
{
"area":1919440,
"cities":{
"city":[
{
"name":"Jakarta",
"population":8259266
},
{
"name":"Surabaya",
"population":2483871
},
{
"name":"Bandung",
"population":2058649
},
{
"name":"Medan",
"population":1730752
},
{
"name":"Semarang",
"population":1250971
},
{
"name":"Palembang",
"population":1144279
}
]
},
"name":"Indonesia",
"population":206611600
}
]
}
}
I'm doing below code to get all countries name
DataTypes.createStructField("countries", (new StructType()).add(DataTypes.createStructField("country",
(new StructType()).add(DataTypes.createStructField("name", DataTypes.StringType, true)), true)), true)
But when am running below to get all country names
Dataset<Row> namesDF = spark.sql("SELECT countries FROM country");
namesDF.show();
I'm getting nulls
, May I know please how to parse Json fields to get values using StructType ..?
The am doing, Is it the correct way of doing it ..? I'm trying to get country names from above JSON
Update:
Code:
static final StructType SCHEMA = new StructType(new StructField[] {
DataTypes.createStructField("countries",
new StructType().add(DataTypes.createStructField("country",
new ArrayType(new StructType()
.add(DataTypes.createStructField("name", DataTypes.StringType, true)), true),
true)),
true) }
);
}
Entry Point
Dataset<Row> ds = spark.read().schema(Jsonreadystructure.SCHEMA)
.json(context.getProperty(GlobalConstants.ReadyJsonFile));
ds.printSchema();
ds.createOrReplaceTempView("country_data");
ds.sqlContext().sql("SELECT country.name FROM country_data lateral view explode(countries.country) t as country").show(false);
Output
root
|-- countries: struct (nullable = true)
| |-- country: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- name: string (nullable = true)
+----+
|name|
+----+
+----+
Why Its showing empty name..? I'm using spark 2.4.4
Schema Discovery
root
|-- countries: struct (nullable = true)
| |-- country: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- area: double (nullable = true)
| | | |-- cities: struct (nullable = true)
| | | | |-- city: string (nullable = true)
| | | |-- name: string (nullable = true)
| | | |-- population: long (nullable = true)