
I have a DataFrame with one column. Each row of that column holds an array of string values:

Values in my Spark 2.2 DataFrame:

["123", "abc", "2017", "ABC"]
["456", "def", "2001", "ABC"]
["789", "ghi", "2017", "DEF"]

org.apache.spark.sql.DataFrame = [col: array<string>]

root
 |-- col: array (nullable = true)
 |    |-- element: string (containsNull = true)

What is the best way to access elements in the array? For example, I would like to extract the distinct values of the fourth element for the rows whose year (the third element) is 2017 (answer: "ABC", "DEF").


4 Answers

30
votes

Since Spark 2.4.0, there is a function element_at(column, index). Note that its index is 1-based, and a negative index counts from the end of the array.

See Spark docs
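A minimal sketch of the question's query with element_at, assuming Spark 2.4 or later. The local SparkSession setup (master and app name) is illustrative only; in spark-shell, getOrCreate() simply returns the existing session, and the chained calls should be entered with :paste. Because element_at is 1-based, the year is at index 3 and the wanted value at index 4:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.element_at

// Illustrative local session; reuses the existing one inside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("element-at-demo").getOrCreate()
import spark.implicits._

val df = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// element_at is 1-based: index 3 holds the year, index 4 the fourth element.
val codes2017 = df
  .where(element_at($"col", 3) === "2017")
  .select(element_at($"col", 4).as("code"))
  .distinct()
  .as[String]
  .collect()
  .toSet
```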

9
votes

What is the best way to access elements in the array?

Elements of an array column are accessed with the getItem operator, which uses 0-based positions.

From the Column API docs, getItem(key: Any): Column is "an expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType."
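To illustrate both halves of that description, here is a small sketch in which the DataFrame and its column names (arr, m) are made up for the example: getItem takes a 0-based position on the array column and a key on the map column. The local SparkSession setup is likewise an assumption:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; reuses the existing one inside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("getitem-demo").getOrCreate()
import spark.implicits._

// Hypothetical data: one array column and one map column.
val df = Seq(
  (Array("123", "abc", "2017", "ABC"), Map("year" -> "2017")),
  (Array("456", "def", "2001", "ABC"), Map("year" -> "2001"))
).toDF("arr", "m")

// On an ArrayType, getItem takes a 0-based ordinal; on a MapType, a key.
val rows = df.select(
  $"arr".getItem(0).as("first"),
  $"m".getItem("year").as("year")
).as[(String, String)].collect()
```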

You can also use the apply syntax, col(ordinal), to access the element at a given 0-based position.

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")
scala> ds.printSchema
root
 |-- col: array (nullable = true)
 |    |-- element: string (containsNull = true)
scala> ds.select($"col"(2)).show
+------+
|col[2]|
+------+
|  2017|
|  2001|
|  2017|
+------+

Whether you use getItem or simply (ordinal) is a matter of personal taste; both build the same extraction expression.

In your case, where (or filter) followed by select and distinct gives the answer you want (as @Will did).
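A sketch of that where/select/distinct combination with getItem, assuming an illustrative local SparkSession (in spark-shell, getOrCreate() returns the existing one; enter the chain with :paste). Since getItem is 0-based, position 2 is the year and position 3 the fourth element; filter could replace where with the same result:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; reuses the existing one inside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("getitem-filter").getOrCreate()
import spark.implicits._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// Keep rows whose year (position 2) is 2017, then take the distinct fourth elements.
val answer = ds
  .where($"col".getItem(2) === "2017")
  .select($"col".getItem(3))
  .distinct()
  .as[String]
  .collect()
  .toSet
```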

1
votes

You can do something like the following with element_at (Spark 2.4+). Note that element_at uses 1-based indexes:

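import org.apache.spark.sql.functions._
import spark.implicits._ // toDF and the 'col symbol syntax; already in scope in spark-shell

val ds = Seq(
 Array("123", "abc", "2017", "ABC"),
 Array("456", "def", "2001", "ABC"),
 Array("789", "ghi", "2017", "DEF")).toDF("col")

// element_at is 1-based, so indexes 1 to 4 cover the whole array.
// In spark-shell, enter this chain with :paste: the Scala 2 REPL does not
// treat a line starting with "." as a continuation of the previous one.
ds.withColumn("col1", element_at('col, 1))
  .withColumn("col2", element_at('col, 2))
  .withColumn("col3", element_at('col, 3))
  .withColumn("col4", element_at('col, 4))
  .drop('col)
  .show()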

+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
| 123| abc|2017| ABC|
| 456| def|2001| ABC|
| 789| ghi|2017| DEF|
+----+----+----+----+
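If the arrays have a known, fixed length, a variation (a sketch, not part of the original answer) builds all the columns in a single select instead of chaining withColumn calls, each of which adds a projection to the plan. The length n = 4 is hard-coded here as an assumption about the data, and the local SparkSession setup is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.element_at

// Illustrative local session; reuses the existing one inside spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("split-array").getOrCreate()
import spark.implicits._

val ds = Seq(
  Array("123", "abc", "2017", "ABC"),
  Array("456", "def", "2001", "ABC"),
  Array("789", "ghi", "2017", "DEF")).toDF("col")

// Assumed fixed array length; element_at indexes run from 1 to n.
val n = 4
val cols = (1 to n).map(i => element_at($"col", i).as(s"col$i"))
val split = ds.select(cols: _*)
split.show()
```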