8
votes

According to the docs, AWS allocates 10 DPUs per ETL job and 5 DPUs per development endpoint by default, even though both can be configured with a minimum of 2 DPUs.

The docs also mention that crawlers are billed in one-second increments with a 10-minute minimum per run, but nowhere do they specify how many DPUs are allocated. Jobs and development endpoints can be configured in the Glue console to consume fewer DPUs, but I haven't seen any such setting for crawlers.

Is there a fixed amount of DPUs per crawler? Can we control that amount?

2
I'm afraid it's impossible for now to control a crawler's DPUs. - Alexey Bakulin
Yes, Alexey is correct: as of now, it's not possible to modify the crawler DPUs or view the DPU details for crawlers. - Yuva
Any update on this? Very interested in how to track costs, especially since tagging is not supported with the Glue service. - openwonk
I am getting a minimum cost for a very quick crawler of £0.01782, which is about $0.023. Based on a 10-minute minimum run and an hourly cost of $0.44 per DPU, I worked out I am using 0.023*6/0.44 = 0.3 of a DPU during the run. This is approx 1.25 vCPU and 5 GB memory. - Burrito Dan
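
The estimate in the last comment above can be reproduced with a short Python sketch. It assumes the whole charge covers exactly the 10-minute minimum run at $0.44 per DPU-hour (both figures taken from the comment), and that one DPU corresponds to 4 vCPUs and 16 GB of memory, which is the standard DPU definition and not something the thread states. The function and constant names are hypothetical.

```python
# Figures quoted in the comment; adjust for your own region and bill.
DPU_HOURLY_RATE_USD = 0.44   # price per DPU-hour
MIN_RUN_MINUTES = 10         # minimum billed duration per crawler run

# Assumption: one DPU provides 4 vCPUs and 16 GB of memory.
VCPU_PER_DPU = 4
MEMORY_GB_PER_DPU = 16


def implied_dpus(charge_usd, run_minutes=MIN_RUN_MINUTES,
                 hourly_rate=DPU_HOURLY_RATE_USD):
    """Back out the DPU count from a single crawler-run charge."""
    billed_hours = run_minutes / 60
    return charge_usd / (hourly_rate * billed_hours)


dpus = implied_dpus(0.023)
print(f"Implied DPUs:   {dpus:.2f}")
print(f"Implied vCPUs:  {dpus * VCPU_PER_DPU:.2f}")
print(f"Implied memory: {dpus * MEMORY_GB_PER_DPU:.1f} GB")
```

This only gives a rough figure: if the crawler ran longer than the minimum, or the charge was rounded on the bill, the implied DPU count will be off.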

2 Answers

2
votes

I discussed this with the AWS support team as well, and currently it's not possible to modify or view the DPU configuration details for Glue crawlers. But do crawlers use DPUs at all?

2
votes

This is my conversation with AWS Support about this subject:

Hello, I'd like to know how many DPUs a crawler uses in order to calculate my costs with crawlers.

Their answer:

Dear AWS Customer,

Thank you for reaching out today. My name is Safari, I will assist with your case.

I understand that while compiling the cost of your Glue crawlers, you'd like to know the amount of DPUs a particular crawler uses.

Unfortunately, there is no direct way to find out the DPU consumption by a given crawler. I apologize for the inconvenience. However, you may see the total DPU consumption across all crawlers in your detailed bill under the section AWS Service Charges > Glue > {region} > AWS Glue CrawlerRun. Additionally, you can add tags to your crawlers and then enable "Cost Allocation Tags" from your AWS Billing and Cost Management console. This would allow AWS to generate a cost allocation report grouped by the predefined tags. For more on this, please see the documentation link below [1].

I hope this helps. Please let me know if I can provide you with any other assistance.

References [1]: https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html
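
As a rough illustration of the tagging suggestion in the support reply, the sketch below tags a crawler through the Glue `tag_resource` API so that it can be grouped in a cost allocation report. The helper names, region, account ID, crawler name, and tag values are all placeholders, and the ARN layout `arn:aws:glue:{region}:{account-id}:crawler/{name}` is assumed from the Glue documentation. The tag key still has to be activated under "Cost Allocation Tags" in the Billing console, as the reply describes.

```python
def crawler_arn(region, account_id, crawler_name):
    """Build the ARN that identifies a Glue crawler."""
    return f"arn:aws:glue:{region}:{account_id}:crawler/{crawler_name}"


def tag_crawler(glue_client, region, account_id, crawler_name, tags):
    """Attach tags (a dict of str -> str) to a crawler and return its ARN.

    glue_client is expected to be a boto3 Glue client, or any object
    exposing a compatible tag_resource method.
    """
    arn = crawler_arn(region, account_id, crawler_name)
    glue_client.tag_resource(ResourceArn=arn, TagsToAdd=tags)
    return arn


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#
#   import boto3
#   glue = boto3.client("glue", region_name="eu-west-1")
#   tag_crawler(glue, "eu-west-1", "123456789012", "my-crawler",
#               {"cost-center": "data-eng"})
```

Passing the client in as a parameter keeps the helper easy to exercise without touching a real AWS account.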