
I have a Parquet file with the following structure:

column_name_old
First
Second

I am crawling this file into a table in AWS Glue. However, I want the table schema in the catalog to look like the following, without changing anything in the Parquet files:

column_name_new
First
Second

I tried updating the table structure using boto3:

# js is the response of glue.get_table(); rename the column in the table's schema
col_list = js['Table']['StorageDescriptor']['Columns']
for x in col_list:
    if isinstance(x, dict):
        x.update({'Name': x['Name'].replace('column_name_old', 'column_name_new')})

This works, in the sense that I can see the updated table structure in the Glue catalog. But when I query the table using the new column name, I get no data; it seems the mapping between the table structure and the partition files is lost.

Is this approach possible at all, or must I change the Parquet files themselves? If it is possible, what am I doing wrong?


1 Answer


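You can create a view that exposes the column under the new name. I believe changing the column name in the catalog breaks the mapping to the data: query engines such as Athena resolve Parquet columns by name by default, so a query for column_name_new looks for a column with that name in the files, finds none, and returns empty (NULL) values.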
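A minimal sketch of the view approach, assuming the table is queried through Amazon Athena. The names `my_db`, `my_table` and `my_table_v`, the S3 output path, and the helpers `build_rename_view_sql` and `run_in_athena` are all hypothetical placeholders, not from the question. The helper only builds a `CREATE OR REPLACE VIEW ... AS SELECT column_name_old AS column_name_new ...` statement; the Glue table and the Parquet files stay unchanged, and you query the view instead of the table.

```python
import re

# Plain SQL identifiers only; anything else is rejected rather than quoted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    # Guard against injecting arbitrary SQL through a column or table name.
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_rename_view_sql(view, table, renames, keep=()):
    """Hypothetical helper: return a CREATE OR REPLACE VIEW statement.

    renames: dict mapping physical (Parquet) column -> new logical name
    keep:    columns passed through unchanged (e.g. partition columns)
    """
    cols = [f"{_check(old)} AS {_check(new)}" for old, new in renames.items()]
    cols += [_check(c) for c in keep]
    return (
        f"CREATE OR REPLACE VIEW {_check(view)} AS\n"
        f"SELECT {', '.join(cols)}\n"
        f"FROM {_check(table)}"
    )


def run_in_athena(sql, database, output_location):
    # Submits the statement via boto3; needs AWS credentials and an S3
    # location where Athena can write query results.
    import boto3

    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return resp["QueryExecutionId"]
```

For example, `build_rename_view_sql("my_table_v", "my_table", {"column_name_old": "column_name_new"})` produces a view selecting `column_name_old AS column_name_new`, which you could pass to `run_in_athena(sql, "my_db", "s3://your-bucket/athena-results/")`. If the table is partitioned, list the partition columns in `keep` so they remain visible through the view.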