
I am planning to use AWS Glue for my ETL process and have written custom Python code that runs as an AWS Glue job.

I found in the AWS Glue documentation that, by default, AWS Glue allocates 10 DPUs per job. Is there a maximum number of DPUs per job? (I do not see a max-DPUs-per-job entry in the Limits section.)

Or is there a recommended maximum data size in MB/GB to avoid out-of-memory errors? Please clarify.

Thanks.


2 Answers

2 votes

According to the Glue API docs, the maximum you can allocate per job run is 100 DPUs.

AllocatedCapacity – Number (integer). The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.
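
For reference, here is a minimal sketch of setting that allocation through boto3; the job name, IAM role, and script location are placeholders, not values from the question:

    import boto3

    glue = boto3.client("glue")

    # Define a Spark ETL job ("glueetl") with a custom DPU allocation.
    # Name, Role, and ScriptLocation below are placeholder values.
    glue.create_job(
        Name="my-etl-job",
        Role="MyGlueServiceRole",
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/etl_job.py",
        },
        AllocatedCapacity=100,  # maximum for a Spark job; the default is 10
    )

    # The allocation can also be overridden for a single run.
    glue.start_job_run(JobName="my-etl-job", AllocatedCapacity=20)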

1 vote

The limits aren't the same for Python shell Glue jobs (which is what the OP plans to implement), where you can allocate at most 1 DPU. Below is the official documentation (as of Aug 2019):

The maximum number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

You can set the value to 0.0625 or 1. The default is 0.0625.
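
Along the same lines, a minimal boto3 sketch for a Python shell job (the names and paths are again placeholders); note that the fractional capacity is passed as MaxCapacity:

    import boto3

    glue = boto3.client("glue")

    # Define a Python shell job; capacity may only be 0.0625 or 1.0 DPU.
    # Name, Role, and ScriptLocation below are placeholder values.
    glue.create_job(
        Name="my-python-shell-job",
        Role="MyGlueServiceRole",
        Command={
            "Name": "pythonshell",
            "ScriptLocation": "s3://my-bucket/scripts/shell_job.py",
            "PythonVersion": "3",
        },
        MaxCapacity=1.0,  # 0.0625 (the default) or 1.0 for Python shell jobs
    )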