I need to dynamically rename columns in a Spark DataFrame.
Basically, I need to loop through the column list and, if a column name already exists elsewhere in the list, rename that column to its name plus its index.
My attempted code was something like this:
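def dup_cols(df):
    # Compare every column name against every other column name
    for i, icol in enumerate(df.columns):
        for x, xcol in enumerate(df.columns):
            # Same name at a different position means a duplicate
            if icol == xcol and i != x:
                df = df.withColumnRenamed(xcol, xcol + '_' + str(x))
    return df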
But this renames by name (xcol here), so it can't single out one of the duplicate columns, and it doesn't solve my issue.
Can I change this to rename the column in the dataframe by its index? I have searched around for quite a while and found nothing.
I also cannot convert to a Pandas DataFrame, so I need a Spark/PySpark solution for renaming a specific column by its index only.
Thank you!
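
For reference, here is a rough sketch of the kind of index-based rename I am after. It builds the complete list of new names by position and then applies them all at once with toDF, which assigns names positionally, so nothing is looked up by name. The helper names dedupe_names and rename_dup_cols are made up for illustration, and I am assuming the first occurrence of a name should stay unchanged while later occurrences get '_' plus their index. It also does not check whether a generated name collides with an existing column.

```python
def dedupe_names(columns):
    """Return a new list of names where each repeated name gets '_<index>' appended.

    Assumption: the first occurrence keeps its name; only later ones are renamed.
    """
    seen = set()
    new_names = []
    for i, name in enumerate(columns):
        if name in seen:
            # Duplicate: suffix it with its position in the column list
            new_names.append(f"{name}_{i}")
        else:
            seen.add(name)
            new_names.append(name)
    return new_names


def rename_dup_cols(df):
    # toDF takes the new names in order, so columns are matched by position,
    # not by their (possibly duplicated) current names
    return df.toDF(*dedupe_names(df.columns))
```

With columns a, b, a, c, b this should produce a, b, a_2, c, b_4.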