I have an AWS Glue job that reads from a data source like so:
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev-data", table_name = "contacts", transformation_ctx = "datasource0")
But when I call .toDF() on the dynamic frame, the headers are 'col0', 'col1', 'col2' etc. and my actual headers are in the first row of the dataframe.
Note: I can't set them manually because the columns in the data source vary, and iterating over the columns in a loop to rename them results in an error, because you'd have to reassign the same DataFrame variable multiple times, which Glue can't handle.
How might I capture the headers while reading from the data source?
The DataFrame has generic column names, so your catalog entry probably has them too. Did you use a crawler to populate the Catalog? – botchniaque
You could compare datasource0.printSchema() and datasource0.toDF().printSchema(), but I doubt they would have different schemas. – botchniaque
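If the catalog schema can't be fixed, one possible workaround is to promote the first row to column names after calling .toDF(). The sketch below is only an illustration under stated assumptions: header_names and promote_first_row are hypothetical helper names (not part of Glue or PySpark), the header is assumed to be the first row Spark returns (typical for a single small file, but not guaranteed for distributed data), and exceptAll requires Spark 2.4 or later. It renames every column in a single toDF(*names) call, so no loop reassigns the DataFrame.

```python
def header_names(first_row):
    """Turn the values of the first row into unique, non-empty column names."""
    names = []
    seen = set()
    for i, value in enumerate(first_row):
        name = "" if value is None else str(value).strip()
        if not name:
            # Fall back to a positional name when the header cell is empty
            name = f"col{i}"
        candidate, n = name, 1
        while candidate in seen:
            # Disambiguate duplicate headers with a numeric suffix
            candidate = f"{name}_{n}"
            n += 1
        seen.add(candidate)
        names.append(candidate)
    return names


def promote_first_row(df):
    """Rename all columns from the first row, then drop one copy of that row.

    `df` is assumed to be a PySpark DataFrame, e.g. datasource0.toDF().
    """
    header = df.first()
    if header is None:
        # Empty DataFrame: nothing to promote
        return df
    # Rename every column in one call instead of looping over them
    renamed = df.toDF(*header_names(header))
    # Assumption: limit(1) yields the same row that first() returned
    return renamed.exceptAll(renamed.limit(1))
```

In the job this might be called as df = promote_first_row(datasource0.toDF()). If a DynamicFrame is needed afterwards, DynamicFrame.fromDF(df, glueContext, "renamed") from awsglue.dynamicframe converts it back. All columns stay typed as they were read (likely strings), so casts may still be needed.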