
What is the difference between referencing a column by just its name and putting a "$" sign in front of it, as shown below?

df.select("name").show() and df.select($"name").show()

I read on the following page that it actually creates a free column reference with no association to a dataset.

https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Column.html

What does "$" signify in this case? What does it do internally? I tried getting information from the Spark documentation, but it does not say much.

Any help to understand this is appreciated. Thank you for your help.


1 Answer


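As mentioned on the page you linked, the dollar sign converts a column name into a Column object. It is a string interpolator that comes into scope with the implicits import (import spark.implicits._, or the class SQLContext.implicits$ in older code), so $"name" returns a ColumnName, which is a subclass of Column and is not tied to any particular dataset.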
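To make the mechanism concrete, here is a minimal sketch that needs no Spark at all. Col, ToyImplicits, StringToCol and Demo are made-up names for illustration, not Spark's classes; the only point is to show how an implicit class on StringContext gives the compiler something to call when it sees $"name".

```scala
// Toy stand-in for Spark's Column/ColumnName (hypothetical, not Spark's class).
case class Col(name: String)

object ToyImplicits {
  // Adds a `$` method to StringContext. The compiler rewrites $"name"
  // into StringContext("name").$(), which lands here and returns a Col.
  implicit class StringToCol(val sc: StringContext) {
    def $(args: Any*): Col = Col(sc.s(args: _*))
  }
}

object Demo {
  import ToyImplicits._ // without this import, $"name" does not compile

  // A free-standing column reference: no dataset is involved in creating it.
  val c: Col = $"name"

  def main(args: Array[String]): Unit = println(c)
}
```

Spark's real implicit follows the same pattern, which is why $"name" only compiles after the implicits import.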

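When it is used inside select for an existing column of the DataFrame (without constructing an expression), df.select($"name") and df.select("name") are equivalent, because select is overloaded for both String and Column arguments. The $ form becomes necessary once you build an expression, for example $"age" + 1 on a hypothetical age column, since a plain string has no such operators.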
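A small sketch of both forms side by side, assuming a local SparkSession and a made-up two-column DataFrame (the age column and the sample rows are assumptions for illustration, not from the question):

```scala
import org.apache.spark.sql.SparkSession

object DollarColumnExample {
  def main(args: Array[String]): Unit = {
    // Local session just for the demo; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dollar-column-example")
      .getOrCreate()

    // Brings the $"..." interpolator (and toDF) into scope.
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")

    // Equivalent: select is overloaded for String and for Column arguments.
    df.select("name").show()
    df.select($"name").show()

    // Expressions need Column objects, so the $ form is required here.
    df.select($"name", $"age" + 1).show()
    df.filter($"age" > 26).show()

    spark.stop()
  }
}
```

Note that the two styles cannot be mixed in a single call: df.select("name", $"age") does not compile, because the overloads take either all strings or all columns.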