In AWS Glue, I read data from the Data Catalog into a Glue DynamicFrame, then convert the DynamicFrame to a Spark DataFrame to apply schema transformations. To write the data back to S3, I have seen developers convert the DataFrame back to a DynamicFrame. Is there any advantage to writing a Glue DynamicFrame over writing a Spark DataFrame?
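For reference, the pattern in question looks roughly like this (a minimal sketch; the database, table, and column names are placeholders):

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Data Catalog into a Glue DynamicFrame
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table")

# Convert to a Spark DataFrame for schema transformations
df = dyf.toDF().withColumnRenamed("old_col", "new_col")

# Convert back to a DynamicFrame before writing to S3
dyf_out = DynamicFrame.fromDF(df, glue_context, "dyf_out")
```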
1 Answer
5 votes
You will find that there is functionality available only to the DynamicFrame writer class that cannot be accessed when using DataFrames:
- Writing to a catalog table based on an S3 source, as well as writing to JDBC sources through a Glue connection, i.e. using `from_jdbc_conf` (see the sketch after this list).
- Writing Parquet using `glueparquet` as the format (Glue's optimized Parquet writer).
- Tracking already-processed files with job bookmarks.
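For illustration, here is a minimal sketch of the first two items (it assumes a `GlueContext` named `glue_context` and a DynamicFrame named `dyf` as in the question; the bucket, connection, and table names are placeholders):

```python
# Write Parquet with the Glue-optimized "glueparquet" format.
# The transformation_ctx identifies this sink's state for job bookmarks.
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
    transformation_ctx="write_parquet_sink",
)

# Write to a JDBC target through a Glue catalog connection
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="my-jdbc-connection",
    connection_options={"dbtable": "my_table", "database": "my_database"},
)
```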
These are some of the use cases I can think of. If you have a use case that requires Spark save modes, for example `mode('overwrite')`, you can use DataFrames. A similar approach exists for DynamicFrames, though it is implemented slightly differently: you can use `purge_s3_path` and then write, as sketched below.
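A minimal sketch of that overwrite-like pattern (same assumed `glue_context` and `dyf` as above; the path is a placeholder, and `retentionPeriod: 0` deletes files regardless of age, since the default retention is measured in hours):

```python
# Purge the target path first, then write -- a rough DynamicFrame
# equivalent of the DataFrame writer's mode('overwrite')
glue_context.purge_s3_path(
    "s3://my-bucket/output/",
    options={"retentionPeriod": 0},
)
glue_context.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
)
```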