
I am trying to read only selected columns while reading a CSV file. Suppose the CSV file has 10 columns but I want to read only 5 of them. Is there any way to do this?

In Pandas we can use usecols, but is there a similar option available in PySpark?

Pandas :

df=pd.read_csv(file_path,usecols=[1,2],index_col=0)
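For reference, usecols also accepts column names, not just positions. A minimal runnable sketch (the column names and inline CSV are made up for illustration):

```python
import io

import pandas as pd

# Illustrative CSV with five columns; we load only two of them.
csv_data = io.StringIO("id,name,age,city,score\n1,Ann,30,Oslo,9\n2,Bo,25,Rome,7\n")

# usecols restricts parsing to the listed columns:
df = pd.read_csv(csv_data, usecols=["name", "score"])
print(list(df.columns))  # ['name', 'score']
```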

Pyspark :

?
Comments:

- Does this answer your question? How to read specific column in pyspark? (blackbishop)
- But how to read them directly? (Shivika Patel)

1 Answer


Spark's CSV reader has no direct equivalent of usecols, but you can read the whole CSV file and then select the columns you want. For example, to keep the first 5 columns:

df = spark.read.csv(file_path, header=True)
df2 = df.select(df.columns[:5])