
As the title states, I'm having trouble converting a column in a DynamicFrame from epoch to a timestamp.

I have tried converting it into a DataFrame and back to a DynamicFrame, but it is not working.

This is what I'm working with:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

from pyspark.sql import functions as f
from pyspark.sql import types as t
from pyspark.sql.functions import udf

from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3-sat-dth-prd", table_name = "s3_sat_dth_prd_vehicle", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("in", "int", "in", "int"), ("out", "int", "out", "int"), ("ts", "long", "ts", "long"), ("cam", "string", "cam", "string"), ("subclass", "string", "subclass", "string")], transformation_ctx = "applymapping1")   

selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["in", "out", "ts", "cam", "subclass"], transformation_ctx = "selectfields2")

resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "resolvechoice3")

datasink4 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice3, database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "datasink4")
job.commit()

What I've tried is creating a DataFrame with tsconvert = resolvechoice3.toDF() and turning it back into a DynamicFrame with resolvechoice4 = DynamicFrame.fromDF(tsconvert, GlueContext, resolvechoice4); I get a syntax error in that last snippet, right at the end, at resolvechoice4.

Could not find anything built into Glue to convert to a timestamp. Once I've made sure the data is correctly written to S3, Redshift will be my target.

Has anybody done anything like this before and could point me in the right direction?

Thanks in advance.


1 Answer


AWS Glue can use the Spark SQL functions (imported from the pyspark package), which let you transform epoch timestamps into a human-readable or otherwise desired date format.

Example:

from pyspark.sql.functions import from_unixtime, col

resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "resolvechoice3")

# Convert the DynamicFrame to a Spark DataFrame so the SQL functions can be used
tsconvert = resolvechoice3.toDF()
# withColumn takes the column name as a string, not a Column object; "ts" is the epoch column from your mapping
tsconverted = tsconvert.withColumn("ts", from_unixtime(col("ts")))
# fromDF takes the GlueContext instance (glueContext in your script) and a string name for the new frame
resolvechoice4 = DynamicFrame.fromDF(tsconverted, glueContext, "resolvechoice4")
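You can then pass resolvechoice4 to write_dynamic_frame.from_catalog in place of resolvechoice3, so the converted column is what gets written out.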

Depending on your needs, you can set the date format in a similar way using the date functions from pyspark.sql.functions.
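For example, from_unixtime accepts a format pattern, and you can cast the result if you need an actual timestamp type rather than a string (which is likely what you want with Redshift as the target). A minimal sketch, assuming the values in ts are epoch seconds (divide by 1000 first if they are milliseconds); df and ts_str are illustrative names:

from pyspark.sql.functions import from_unixtime, col

# Formatted string in a custom pattern, e.g. "2019-07-01 13:45:00"
df = tsconvert.withColumn("ts_str", from_unixtime(col("ts"), "yyyy-MM-dd HH:mm:ss"))

# Proper timestamp type instead of a string
df = df.withColumn("ts", from_unixtime(col("ts")).cast("timestamp"))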