In AWS Glue, I read data from the Data Catalog into a Glue DynamicFrame, then convert the DynamicFrame to a Spark DataFrame to apply schema transformations. To write the data back to S3, I have seen developers convert the DataFrame back to a DynamicFrame. Is there any advantage to writing a Glue DynamicFrame over writing a Spark DataFrame?
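For reference, a minimal sketch of the pattern described above; the database, table, column, and bucket names are placeholders:

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read from the Data Catalog into a Glue DynamicFrame
# ("my_db" / "my_table" are placeholder names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table"
)

# Convert to a Spark DataFrame to apply schema transformations.
df = dyf.toDF().withColumnRenamed("old_col", "new_col")

# Convert back to a DynamicFrame before writing to S3.
dyf_out = DynamicFrame.fromDF(df, glue_context, "dyf_out")
glue_context.write_dynamic_frame.from_options(
    frame=dyf_out,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="parquet",
)
```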

1 Answer

You will find that there is functionality available only through the DynamicFrame writer classes that cannot be accessed when using DataFrames:

  1. Writing to a catalog table based on an S3 source, as well as writing through a catalog connection to JDBC sources, i.e. using from_jdbc_conf (see the first sketch after this list).
  2. Writing Parquet with the Glue-optimized glueparquet format.
  3. Tracking which files have already been processed using job bookmarks (see the second sketch after this list).
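
For item 1, a minimal sketch of writing through a catalog JDBC connection with from_jdbc_conf, assuming a DynamicFrame `dyf_out` as in the question; the connection, database, and table names are placeholders:

```python
# Write a DynamicFrame to a JDBC target via a Glue catalog connection.
# "my-jdbc-connection" and "my_schema.my_table" are placeholder names.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf_out,
    catalog_connection="my-jdbc-connection",
    connection_options={"dbtable": "my_schema.my_table", "database": "my_db"},
    transformation_ctx="write_jdbc",
)
```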
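For items 2 and 3, a sketch of an S3 write using the glueparquet format; the transformation_ctx argument is what lets job bookmarks track state for this node (the S3 path is a placeholder):

```python
# Write to S3 as Parquet using the Glue-optimized "glueparquet" writer.
# transformation_ctx enables job-bookmark state tracking for this node.
glue_context.write_dynamic_frame.from_options(
    frame=dyf_out,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
    transformation_ctx="write_s3_parquet",
)
```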

These are some of the use cases I can think of. If you have a use case that requires save modes, for example mode('overwrite'), you could use DataFrames. A similar approach exists for DynamicFrames but is implemented slightly differently: you can call purge_s3_path and then write.
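
A sketch of the two approaches side by side, reusing the placeholder names from above: the DataFrame save mode versus purging the target prefix and then writing the DynamicFrame:

```python
# Option A: Spark DataFrame with an explicit save mode.
df.write.mode("overwrite").parquet("s3://my-bucket/output/")

# Option B: DynamicFrame equivalent -- purge the target prefix first,
# then write. retentionPeriod=0 deletes all existing objects immediately.
glue_context.purge_s3_path(
    "s3://my-bucket/output/", options={"retentionPeriod": 0}
)
glue_context.write_dynamic_frame.from_options(
    frame=dyf_out,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="glueparquet",
)
```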