I'm having a very weird problem with AWS Glue. I'm using it to run some ETL on data I'm moving from MySQL RDS to Redshift, with the same code I used on another table, where it worked fine and copied all the data as it should have.
However, on the second table it doesn't copy the data in the id column from MySQL for some reason. The id column on Redshift is completely blank.
query_df = (spark.read.format("jdbc")
    .option("url", args['RDSURL'])
    .option("driver", args['RDSDRIVER'])
    .option("dbtable", args['RDSQUERY'])
    .option("user", args['RDSUSER'])
    .option("password", args['RDSPASS'])
    .load())
datasource0 = DynamicFrame.fromDF(query_df, glueContext, "datasource0")
logging.info(datasource0.show())
applymapping1 = ApplyMapping.apply(
    frame = datasource0,
    mappings = [("id", "int", "id", "int"), ...],
    transformation_ctx = "applymapping1")
logging.info(applymapping1.show())
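A hunch on my part (an assumption, not something the logs confirm): MySQL's JDBC driver may report an `INT UNSIGNED` column as a 64-bit type, so Spark would read id as `long`, and an ApplyMapping entry declaring the source type as `"int"` would then not match the actual type and could yield nulls. Here is a minimal pure-Python sketch of that matching idea; `apply_mapping` below is a mock I wrote to illustrate it, not Glue's actual implementation:

```python
def apply_mapping(record, record_types, mappings):
    """Mock of the suspected ApplyMapping behaviour: a field whose
    declared source type doesn't match the actual type comes out None."""
    out = {}
    for src, src_type, dst, dst_type in mappings:
        if record_types.get(src) == src_type:
            out[dst] = record.get(src)
        else:
            out[dst] = None  # silent type mismatch -> blank column
    return out

# Suppose id was read from MySQL INT UNSIGNED and Spark sees it as "long"
record = {"id": 4000000000}
record_types = {"id": "long"}

wrong = apply_mapping(record, record_types, [("id", "int", "id", "int")])
right = apply_mapping(record, record_types, [("id", "long", "id", "long")])
print(wrong)  # {'id': None}
print(right)  # {'id': 4000000000}
```

If that's what is going on, declaring the source type as `"long"` in the mapping tuple would be the thing to try.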
From the logs printed above I can see that the DynamicFrame still contains the id field, even after ApplyMapping.
datasink2 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = applymapping1,
    catalog_connection = args['RSCLUSTER'],
    connection_options = {"dbtable": args['RSTABLE'], "database": args['RSDB']},
    redshift_tmp_dir = args["TempDir"],
    transformation_ctx = "datasink2")
The problem seems to be happening here, I think? After this the job completes, but on checking Redshift the id column is completely empty.
I'm very puzzled by this behaviour. The exact same code worked fine on another table; the only difference between the id columns in the two tables is that this table has id as int(11) unsigned, while the table where the code worked has id as int(10) signed.
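For what it's worth, the value ranges alone suggest why a driver might promote the type: the (11) vs. (10) part is only a display width, but the unsigned flag means the column can hold values that a signed 32-bit int can't. A quick check of the bounds:

```python
# int(10) and int(11) in MySQL are both 32-bit; the display width
# doesn't change storage. The signed/unsigned flag changes the range.
signed_max = 2**31 - 1    # 2147483647: fits a signed 32-bit int
unsigned_max = 2**32 - 1  # 4294967295: overflows a signed 32-bit int
print(signed_max, unsigned_max)
assert unsigned_max > signed_max  # so a wider (64-bit) type is needed
```

So the signed int(10) column fits Spark's 32-bit integer type, while the unsigned int(11) column does not, which may be why the two tables behave differently.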