I would like to read n CSV files using PySpark. The files share the same schema but have different column names.
While reading them, I would like to add a column 'pipeline' that contains a substring of each file's first column name.
How can I implement this?
df = spark.read.format("csv") \
    .option("header", True) \
    .load(path + "*.csv") \
    .withColumn("pipeline", ...)  # not sure what goes here
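One difficulty with a single `*.csv` load is that Spark only exposes one set of column names for the whole DataFrame, so the per-file first column name is lost. A sketch of an alternative, assuming the files all sit directly under `path` and the columns appear in the same order in every file; the substring rule (text before the first underscore) and the function names `pipeline_from_column` / `read_with_pipeline` are my own placeholders, not anything from your code:

```python
import glob
from functools import reduce


def pipeline_from_column(col_name: str) -> str:
    """Placeholder substring rule: keep the text before the first underscore."""
    return col_name.split("_")[0]


def read_with_pipeline(spark, path):
    """Read each CSV under `path` separately, tag its rows with a 'pipeline'
    column derived from that file's first column name, then union them."""
    from pyspark.sql import functions as F  # imported lazily; only needed with Spark

    dfs = []
    common_names = None
    for f in sorted(glob.glob(path + "*.csv")):
        df = spark.read.format("csv").option("header", True).load(f)
        if common_names is None:
            common_names = df.columns  # adopt the first file's names for all files
        dfs.append(
            df.toDF(*common_names)  # rename so unionByName lines up
              .withColumn("pipeline", F.lit(pipeline_from_column(df.columns[0])))
        )
    return reduce(lambda a, b: a.unionByName(b), dfs)
```

Because each file is read on its own, `df.columns[0]` really is that file's header, and `F.lit` stamps the derived value onto every row before the union. Adjust `pipeline_from_column` to whatever substring you actually need.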