What does an AWS Glue Crawler do?

Question

I've read the AWS Glue docs on crawlers here: https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html but I'm still unclear on what exactly the Glue crawler does. Does a crawler go through your S3 buckets and create pointers to those buckets?

When the docs say "The output of the crawler consists of one or more metadata tables that are defined in your Data Catalog" what is the purpose of these metadata tables?

Carlos Andres Zambrano Barrera · Accepted Answer · 2018-12-04T13:58:03

The crawler creates the metadata that lets Glue and services such as Athena treat the data in S3 as a database with tables. In other words, it is what populates the Glue Data Catalog.

This way, you can view the data stored in S3 as a database composed of several tables.
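To make "metadata tables" more concrete, here is a minimal sketch of what a catalog entry holds. It assumes the response shape returned by boto3's Glue `get_tables` call (`TableList`, `StorageDescriptor`, `Location`, `Columns`); the database, table, bucket, and column names in the sample are made up for illustration.

```python
def summarize_tables(get_tables_response):
    """Reduce a Glue get_tables-style response to name -> location and columns."""
    summary = {}
    for table in get_tables_response.get("TableList", []):
        descriptor = table.get("StorageDescriptor", {})
        summary[table["Name"]] = {
            # Where the underlying files live; the table itself stores no data.
            "location": descriptor.get("Location"),
            # The schema the crawler inferred from the files.
            "columns": [
                (col["Name"], col.get("Type"))
                for col in descriptor.get("Columns", [])
            ],
        }
    return summary


# Hypothetical sample, shaped like a get_tables response (values are invented).
sample_response = {
    "TableList": [
        {
            "Name": "orders",
            "DatabaseName": "example_db",
            "StorageDescriptor": {
                "Location": "s3://example-bucket/data/orders/",
                "Columns": [
                    {"Name": "order_id", "Type": "bigint"},
                    {"Name": "placed_at", "Type": "string"},
                ],
            },
        }
    ]
}

tables = summarize_tables(sample_response)
for name, info in tables.items():
    print(name, info["location"], info["columns"])
```

With AWS credentials configured, the same function could be fed a live response, for example `boto3.client("glue").get_tables(DatabaseName="example_db")`, where `example_db` stands in for your own database name. The point is that each table records an S3 location plus a schema, which is what lets Athena query the files in place.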

For example, if you want to create a crawler, you must specify the following fields:

Database --> Name of database
Service role --> service-role/AWSGlueServiceRole
Selected classifiers --> Specify Classifier
Include path --> S3 location
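The same fields can also be supplied programmatically. Below is a minimal sketch using boto3's Glue client (`create_crawler` and `start_crawler`). The crawler name, database name, and bucket path are hypothetical placeholders, and whether the role can be passed by name or needs a full ARN may depend on your account setup.

```python
def build_crawler_request(name, role, database, s3_path, classifiers=None):
    """Assemble keyword arguments for Glue's create_crawler call."""
    request = {
        "Name": name,
        "Role": role,                 # Service role
        "DatabaseName": database,     # Database
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # Include path
    }
    if classifiers:
        # Optional custom classifiers; built-in ones are used otherwise.
        request["Classifiers"] = list(classifiers)
    return request


def create_and_start_crawler(request):
    """Create the crawler and run it once (requires AWS credentials)."""
    import boto3  # imported here so building the request needs no AWS setup

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values for illustration only.
example_request = build_crawler_request(
    name="example-crawler",
    role="service-role/AWSGlueServiceRole",
    database="example_db",
    s3_path="s3://example-bucket/data/",
)
print(example_request)
```

Calling `create_and_start_crawler(example_request)` would then create the crawler and start a run; once it finishes, the tables it found should appear under the chosen database in the Data Catalog.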
