How to process pyspark dataframe columns

Question

I have a PySpark DataFrame with more than 4,000 columns and no labels/headers. Based on the column values, I need to apply specific operations to each column.

I did the same thing using pandas, but I don't want to use pandas; I would like to apply the column-wise transformations directly on the Spark DataFrame. Any idea how I can apply column-wise transformations when the DataFrame has more than 4,000 columns and no labels? Also, I don't want to apply the transformations by specific column index.

Elliott Addi · Accepted Answer · 2017-02-08T08:50:27

According to the Spark documentation, a DataFrame does have headers (column names), contrary to what you said, much like a database table.

In any case, a simple for loop should do the trick:

for column in spark_dataframe.columns:
    pass  # do whatever you want to do with this column
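
To make the loop more concrete, here is a minimal sketch of one way to do it. It assumes the operation can be chosen from each column's type as reported by `df.dtypes`; the question chooses it from the column values, so treat the type check as a stand-in for your own rule. The names `choose_operation` and `transform_all_columns`, and the three example operations, are hypothetical and not from the question. It builds all expressions first and applies them in one `select`, because the PySpark documentation for `withColumn` warns that calling it repeatedly in a loop can produce very large query plans.

```python
def choose_operation(dtype):
    """Map a Spark dtype string (as found in df.dtypes) to an operation name.

    The rule here is a hypothetical placeholder; substitute your own logic.
    """
    if dtype in ("int", "bigint", "float", "double"):
        return "double_it"
    if dtype == "string":
        return "upper"
    return "keep"


def transform_all_columns(df):
    """Apply one operation per column of a Spark DataFrame in a single select."""
    # Imported here so choose_operation can be used without Spark installed.
    from pyspark.sql import functions as F

    # Hypothetical example operations, keyed by the names returned above.
    operations = {
        "double_it": lambda name: F.col(name) * 2,
        "upper": lambda name: F.upper(F.col(name)),
        "keep": lambda name: F.col(name),
    }

    # df.dtypes is a list of (column_name, dtype_string) pairs, so no column
    # has to be addressed by a hard-coded index or label.
    exprs = [
        operations[choose_operation(dtype)](name).alias(name)
        for name, dtype in df.dtypes
    ]
    return df.select(*exprs)
```

Usage would be `result = transform_all_columns(spark_dataframe)`. If the file was read as CSV without a header row, Spark will have named the columns `_c0`, `_c1`, and so on, and those names are what `df.dtypes` returns.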
