4
votes

The documentation for the toDF() method says that we can pass an options parameter to it, but it does not specify what those options can be (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html). Does anyone know if there is further documentation on this? I am specifically interested in passing in a schema when creating a DataFrame from a DynamicFrame.


1 Answer

1
votes

Unfortunately, there is not much documentation available, but some experimentation and a reading of the DynamicFrame source code suggest the following:

  • The options available in toDF have more to do with the ResolveOption class than with toDF itself, as the ResolveOption class gives meaning to the parameters (please read the code).
  • The ResolveOption class takes in ChoiceType as a parameter.
  • The option examples in the documentation are similar to the specs available in ResolveChoice, which also mention ChoiceType.
  • The options are then converted to a sequence and passed to the toDF function of _jdf.

My understanding, after looking at the specs, the toDF implementation of DynamicFrame, and toDF from Spark, is that we can't pass a schema when creating a DataFrame from a DynamicFrame; only minor column manipulations are possible.

That said, a possible approach is to obtain a DataFrame from the DynamicFrame and then manipulate it to change its schema.
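A minimal sketch of that approach, assuming the target schema can be expressed as (column name, Spark SQL type) pairs. The helper names cast_exprs and apply_schema, and the columns id and price, are made up for illustration and are not part of the Glue API. The idea is to call toDF() without options and then cast each column with the standard Spark DataFrame.selectExpr method:

```python
def cast_exprs(target_schema):
    """Build one Spark SQL CAST expression per (column, type) pair.

    target_schema is a list of (column_name, spark_sql_type) tuples,
    e.g. [("id", "bigint")]. This is a hypothetical helper, not Glue API.
    """
    return [
        f"CAST(`{name}` AS {dtype}) AS `{name}`"
        for name, dtype in target_schema
    ]


def apply_schema(df, target_schema):
    """Return a DataFrame holding only the listed columns, cast to the given types."""
    return df.selectExpr(*cast_exprs(target_schema))


# Usage inside a Glue job, where dyf is an existing DynamicFrame
# (the column names and types below are assumptions for illustration):
#
#   df = apply_schema(dyf.toDF(), [("id", "bigint"), ("price", "double")])
#   df.printSchema()
```

Note that columns not listed in the target schema are dropped by the select, and values that cannot be cast typically become null rather than raising an error, so it is worth checking the result with printSchema() and a few sample rows.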