I'm trying to write some test cases using JSON files for DataFrames (production would use Parquet). I'm using the spark-testing-base framework, and I'm running into a snag when asserting that DataFrames equal each other, because the schemas mismatch: the schema inferred from JSON always has nullable = true.
I'd like to be able to apply a schema with nullable = false to the JSON read.
I've written a small test case:
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalatest.FunSuite

class TestJSON extends FunSuite with DataFrameSuiteBase {

  val expectedSchema = StructType(
    List(
      StructField("a", IntegerType, nullable = false),
      StructField("b", IntegerType, nullable = true)
    )
  )

  test("testJSON") {
    val readJson =
      spark.read.schema(expectedSchema).json("src/test/resources/test.json")
    assert(readJson.schema == expectedSchema)
  }
}
And have a small test.json file of:
{"a": 1, "b": 2}
{"a": 1}
This returns an assertion failure of:

StructType(StructField(a,IntegerType,true), StructField(b,IntegerType,true)) did not equal StructType(StructField(a,IntegerType,false), StructField(b,IntegerType,true))
ScalaTestFailureLocation: TestJSON$$anonfun$1 at (TestJSON.scala:15)
Expected :StructType(StructField(a,IntegerType,false), StructField(b,IntegerType,true))
Actual   :StructType(StructField(a,IntegerType,true), StructField(b,IntegerType,true))
Am I applying the schema the correct way? I'm using Spark 2.2 and Scala 2.11.8.
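For context, one workaround I've been experimenting with (I'm not sure it's idiomatic) is to round-trip the DataFrame through its RDD and re-attach the exact StructType with createDataFrame, since that path keeps the nullability flags as given. A minimal sketch, assuming the same test.json and schema as above (the helper name withSchema is mine):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object NullableWorkaround {
  // Re-apply a desired schema (including nullability) to an existing DataFrame.
  // spark.read.json marks every field nullable, but going through the RDD lets
  // createDataFrame attach the StructType verbatim. Note this does NOT validate
  // the data: a null in a nullable = false column would only fail later at
  // evaluation time.
  def withSchema(df: DataFrame, schema: StructType): DataFrame =
    df.sparkSession.createDataFrame(df.rdd, schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-workaround")
      .getOrCreate()

    val expectedSchema = StructType(List(
      StructField("a", IntegerType, nullable = false),
      StructField("b", IntegerType, nullable = true)
    ))

    val readJson =
      spark.read.schema(expectedSchema).json("src/test/resources/test.json")
    val fixed = withSchema(readJson, expectedSchema)

    // The re-applied schema now compares equal, nullability included.
    assert(fixed.schema == expectedSchema)
    spark.stop()
  }
}
```

That makes the schema assertion pass, but it feels like extra ceremony for every read, which is why I'm wondering whether schema(...) on the reader is supposed to handle this.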