
I have created a data lake with AWS Lake Formation, plus an AWS Glue crawler to build a catalog from a DynamoDB table (size: 130 GB, ItemCount: 739,013,546). It has been 12 hours since I started the crawler run, but its status still shows Starting.

Is it normal for it to take this much time?

PS: The role assigned to the crawler has permission to scan the DynamoDB table in question.

EDIT:

The only log event in CloudWatch is:

{
    "events": [
        {
            "timestamp": 1582560218096,
            "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : Running Start Crawl for Crawler dynamodb-crawler",
            "ingestionTime": 1582560344705
        }
    ]
}
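The `timestamp` and `ingestionTime` values are milliseconds since the Unix epoch (UTC), so they show exactly when the crawl started. A minimal Python sketch that decodes them, using only the standard library and the event shown above (the helper names are illustrative, not part of any AWS API):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# The single event from the question, as returned by CloudWatch Logs.
events = [
    {
        "timestamp": 1582560218096,
        "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : "
        "Running Start Crawl for Crawler dynamodb-crawler",
        "ingestionTime": 1582560344705,
    }
]


def ms_to_utc(ms):
    """Convert CloudWatch epoch milliseconds to an exact, timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def ingestion_lag_seconds(event):
    """Seconds between when the event was logged and when CloudWatch ingested it."""
    return (event["ingestionTime"] - event["timestamp"]) / 1000


for event in events:
    print(
        ms_to_utc(event["timestamp"]).isoformat(),
        f"lag={ingestion_lag_seconds(event):.1f}s",
        event["message"],
    )
```

For this event it prints a start time of 2020-02-24T16:03:38.096000+00:00 and an ingestion lag of about 126.6 s.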

2 Answers

0
votes

It is strange that it is taking so long. Are the crawler logs in CloudWatch showing anything?
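One way to check is to pull the crawler's log stream programmatically. A hedged sketch, assuming the crawler writes to the default `/aws-glue/crawlers` log group in a stream named after the crawler (adjust both if your setup differs); it uses the CloudWatch Logs `filter_log_events` call and follows `nextToken` until every page is read:

```python
# Assumption: Glue crawlers log to this group by default, one stream per crawler name.
CRAWLER_LOG_GROUP = "/aws-glue/crawlers"


def crawler_log_events(logs_client, crawler_name, log_group=CRAWLER_LOG_GROUP):
    """Yield all events from the crawler's log stream, following nextToken pagination."""
    kwargs = {"logGroupName": log_group, "logStreamNames": [crawler_name]}
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            return  # no more pages
        kwargs["nextToken"] = token


# Usage (requires boto3 and credentials that can read CloudWatch Logs):
#   import boto3
#   for event in crawler_log_events(boto3.client("logs"), "dynamodb-crawler"):
#       print(event["timestamp"], event["message"])
```

The client is passed in rather than created inside the function, so the same code works with any region or profile you configure.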

0
votes

This might be a different issue, but the crawler may simply be taking a long time to scan a very large table.
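If a full scan is the bottleneck, the Glue `DynamoDBTarget` has a `scanAll` flag: when it is true (the default), the crawler reads every item; when false, it samples rows to infer the schema. A hedged boto3 sketch that switches a crawler to sampling. It assumes the crawler is stopped (updating a running crawler is rejected), that this table is its only target (as far as I know, `Targets` replaces the existing target list), and the names used in the usage comment are hypothetical:

```python
def sample_dynamodb_target(glue_client, crawler_name, table_name):
    """Point the crawler at table_name with scanAll=False (sample instead of full scan).

    Caution: Targets replaces the crawler's existing targets, so include any
    other targets you still need.
    """
    glue_client.update_crawler(
        Name=crawler_name,
        Targets={"DynamoDBTargets": [{"Path": table_name, "scanAll": False}]},
    )


# Usage (requires boto3 and credentials allowed to update the crawler):
#   import boto3
#   glue = boto3.client("glue")
#   sample_dynamodb_target(glue, "dynamodb-crawler", "my-table")  # hypothetical names
#   glue.start_crawler(Name="dynamodb-crawler")
```

Sampling trades completeness for speed: attributes that only appear in unsampled items may be missing from the inferred schema.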

I had the same problem trying to crawl an on-premises Oracle database. I stopped it after an hour with no logs other than the starting log:

BENCHMARK : Running Start Crawl for Crawler

Then all the logs came through, with timestamps ranging from when the crawl started to when I stopped it. I am not sure why they weren't showing up before, or why the crawler was still in the Starting status, but in my case it actually was running.
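To tell whether a crawler that looks stuck is actually working, you can ask Glue directly instead of waiting for logs. A hedged boto3 sketch that reads the crawler's `State` from `get_crawler` and the `StillEstimating`/`TimeLeftSeconds` fields from `get_crawler_metrics`; the console's Starting label may not map one-to-one onto the API's `State` values, and the crawler name comes from the question's log line:

```python
def crawler_progress(glue_client, crawler_name):
    """Return the crawler's API state plus Glue's own progress estimate, if any."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    metrics = glue_client.get_crawler_metrics(CrawlerNameList=[crawler_name])
    entries = metrics.get("CrawlerMetricsList", [])
    entry = entries[0] if entries else {}
    return {
        "state": crawler["State"],  # READY, RUNNING or STOPPING
        "still_estimating": entry.get("StillEstimating"),
        "time_left_seconds": entry.get("TimeLeftSeconds"),
    }


# Usage (requires boto3 and credentials allowed to read Glue crawlers):
#   import boto3
#   print(crawler_progress(boto3.client("glue"), "dynamodb-crawler"))
```

Polling this every minute or so shows whether the state is `RUNNING` even while the log stream stays quiet.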