2
votes

I am using Spark 1.6 and I want to add a column to a dataframe. The new column holds the same constant sequence in every row: Seq("-0", "-1", "-2", "-3")

Here is my original dataframe:

scala> df.printSchema()

root
|-- user_name: string (nullable = true)
|-- test_name: string (nullable = true)

scala> df.show()

+---------+---------+
|user_name|test_name|
+---------+---------+
|    user1|      SAT|
|    user9|      GRE|
|    user7|     MCAT|
+---------+---------+

I want to add this extra column (attempt) so that the new dataframe becomes:

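+---------+---------+------------------------+
|user_name|test_name|                 attempt|
+---------+---------+------------------------+
|    user1|      SAT|Seq("-0","-1","-2","-3")|
|    user9|      GRE|Seq("-0","-1","-2","-3")|
|    user7|     MCAT|Seq("-0","-1","-2","-3")|
+---------+---------+------------------------+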

How do I do that?

2
By Seq("0", "-1", "-2", "-3"), do you mean ["0", "-1", "-2", "-3"]? - himanshuIIITian

2 Answers

2
votes

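You can use the withColumn function. On Spark 1.6, lit does not accept an Array (it fails with an "Unsupported literal type" error), so wrap each element in lit and combine them with array:

 import org.apache.spark.sql.functions._
 // Build an array<string> column from one literal per element
 df.withColumn("attempt", array(Seq("-0", "-1", "-2", "-3").map(lit(_)): _*))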
1
votes

You can add the column using typedLit, which is available in Spark 2.2 and later (so not in Spark 1.6).

import org.apache.spark.sql.functions.typedLit
df.withColumn("attempt", typedLit(Seq("-0", "-1", "-2", "-3")))
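
For anyone who wants to try this end to end, here is a minimal sketch of the typedLit approach as a standalone program. It assumes Spark 2.2 or later on the classpath and a local master; the object name TypedLitExample and the app name are made up for illustration, and the sample rows are the ones from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.typedLit

// Hypothetical wrapper object, only here to make the snippet runnable.
object TypedLitExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.2+ session is acceptable for a quick test.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typedLit-example")
      .getOrCreate()
    import spark.implicits._

    // Recreate the dataframe from the question.
    val df = Seq(("user1", "SAT"), ("user9", "GRE"), ("user7", "MCAT"))
      .toDF("user_name", "test_name")

    // typedLit keeps the Scala type, so the Seq becomes an array column.
    val withAttempt = df.withColumn("attempt", typedLit(Seq("-0", "-1", "-2", "-3")))

    withAttempt.printSchema()
    withAttempt.show(false) // false = do not truncate the array column

    spark.stop()
  }
}
```

The schema printed by this should show attempt as an array of strings, with the same four values on every row.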