0
votes

There are probably at least 10 questions very similar to this one, but I still have not found a clear answer.

How can I add a nullable string column to a DataFrame using Scala? I was able to add a column with null values, but its data type shows as null:

val testDF = myDF.withColumn("newcolumn", when(col("UID") =!= "not", null).otherwise(null))

However, the schema shows

root
 |-- UID: string (nullable = true)
 |-- IsPartnerInd: string (nullable = true)
 |-- newcolumn: null (nullable = true)

I want the new column to be a string:

 |-- newcolumn: string (nullable = true)

Please don't mark this as a duplicate unless it's really the same question and in Scala.

2
Try myDF.withColumn("newcolumn", lit(null).cast("string")). - Leo C

2 Answers

1
votes

Just explicitly cast the null literal to StringType.

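scala> import org.apache.spark.sql.functions.{col, lit, when}

scala> import org.apache.spark.sql.types.StringType

scala> val testDF = myDF.withColumn("newcolumn", when(col("UID") =!= "not", lit(null).cast(StringType)).otherwise(lit(null).cast(StringType)))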

scala> testDF.printSchema

root
 |-- UID: string (nullable = true)
 |-- newcolumn: string (nullable = true)
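
For context, a bare lit(null) carries Spark's NullType, which is why the schema in the question prints null; the cast only changes the declared type, and the values stay null. Below is a minimal self-contained sketch of that difference, assuming a local SparkSession and made-up sample rows. The NullColumnDemo object and the withNullStringColumn helper are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{NullType, StringType}

object NullColumnDemo {
  // Hypothetical helper: adds an all-null column whose declared type is string.
  def withNullStringColumn(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, lit(null).cast(StringType))

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-column-demo")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for myDF from the question.
    val myDF = Seq(("a1", "Y"), ("not", "N")).toDF("UID", "IsPartnerInd")

    // Without a cast, the new column is typed as NullType.
    val untyped = myDF.withColumn("newcolumn", lit(null))
    assert(untyped.schema("newcolumn").dataType == NullType)

    // With the cast, it is a nullable string column.
    val typed = withNullStringColumn(myDF, "newcolumn")
    assert(typed.schema("newcolumn").dataType == StringType)
    assert(typed.schema("newcolumn").nullable)

    typed.printSchema()
    spark.stop()
  }
}
```

Since the when/otherwise branches in the question both produce null, the helper drops them and applies the cast directly, as suggested in the comment above.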
1
votes

Why do you want a column which is always null? There are several ways to do this; I would prefer the solution with typedLit:

myDF.withColumn("newcolumn", typedLit[String](null))

or, for older Spark versions (typedLit was added in Spark 2.2):

myDF.withColumn("newcolumn", lit(null).cast(StringType))