In AWS Glue, I read data from the Data Catalog into a Glue DynamicFrame, then convert the DynamicFrame to a Spark DataFrame to apply schema transformations. To write the data back to S3, I have seen developers convert the DataFrame back to a DynamicFrame. Is there any advantage to writing a Glue DynamicFrame over writing a Spark DataFrame?
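For reference, the pattern in question looks roughly like this (a minimal sketch; the database, table, and column names are placeholders):

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Data Catalog into a Glue DynamicFrame
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table")

# Convert to a Spark DataFrame for schema transformations
df = dyf.toDF().withColumnRenamed("old_col", "new_col")

# Convert back to a DynamicFrame before writing to S3
dyf_out = DynamicFrame.fromDF(df, glue_context, "dyf_out")
```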
1 Answer
5 votes
You will find that there is functionality available only to the DynamicFrame writer class that cannot be accessed when using DataFrames:
- Writing to a catalog table based on an S3 source, as well as writing to JDBC sources through a Glue connection, i.e. using `from_jdbc_conf` (see the sketch after this list).
- Writing Parquet using `glueparquet` as the format (Glue's optimized Parquet writer).
- Tracking already-processed files with job bookmarks.
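For illustration, here is a minimal sketch of the first two items (it assumes a `GlueContext` named `glue_context` and a DynamicFrame named `dyf` as in the question; the bucket, connection, and table names are placeholders):

```python
# Write Parquet with the Glue-optimized "glueparquet" format.
# The transformation_ctx identifies this sink's state for job bookmarks.
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
    transformation_ctx="write_parquet_sink",
)

# Write to a JDBC target through a Glue catalog connection
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="my-jdbc-connection",
    connection_options={"dbtable": "my_table", "database": "my_database"},
)
```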
These are some of the use cases I can think of. If you have a use case that requires Spark save modes, for example `mode('overwrite')`, you can use DataFrames. A similar approach exists for DynamicFrames, though it is implemented slightly differently: you can use `purge_s3_path` and then write, as sketched below.
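A minimal sketch of that overwrite-like pattern (same assumed `glue_context` and `dyf` as above; the path is a placeholder, and `retentionPeriod: 0` deletes files regardless of age, since the default retention is measured in hours):

```python
# Purge the target path first, then write -- a rough DynamicFrame
# equivalent of the DataFrame writer's mode('overwrite')
glue_context.purge_s3_path(
    "s3://my-bucket/output/",
    options={"retentionPeriod": 0},
)
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
)
```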