AWS Glue Catalog unable to detect parquet files, creates root path as a single table instead

Question

I have 500+ tables stored in AWS S3 in Parquet format, one folder per table. The structure is as follows:

aws-bucket/
└── parquet/
    ├── table1/t1.parquet
    ├── table2/t2.parquet
    ├── table3/t3.parquet
    ├── table4/t4.parquet
    ├── table5/t5.parquet
    ├── table6/t6.parquet
    ├── table7/t7.parquet
    └── table8/t8.parquet

When I run a Glue crawler on "s3://aws-bucket/parquet/" and try to create an Athena database, it creates only a single table called parquet instead of all 500+ tables. I have not customized any of the crawler parameters.

Please help.

Sandeep Fatangare · Accepted Answer · 2019-08-20T16:47:45

Check https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html?icmpid=docs_glue_console#crawler-grouping-policy

Grouping behavior for S3 data (optional)

Create a single schema for each S3 path

By default, when a crawler defines tables for data stored in S3, it considers both data compatibility and schema similarity. Select this check box to group compatible schemas into a single table definition across all S3 objects under the provided include path. Other criteria will still be considered to determine proper grouping.

Select this option in the Glue crawler console and re-run the crawler. It should then create the 500+ tables.
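If you manage the crawler from code rather than the console, this checkbox corresponds (as far as I can tell from the linked documentation) to the "Grouping" entry in the crawler's Configuration JSON. Below is a minimal sketch with boto3. It assumes the crawler already exists and that AWS credentials are configured; the crawler name "parquet-crawler" and both helper function names are hypothetical, made up for illustration.

```python
import json


def build_grouping_configuration() -> str:
    """Build the crawler Configuration JSON that sets the grouping policy.

    "CombineCompatibleSchemas" is the documented value behind the
    "Create a single schema for each S3 path" console checkbox.
    """
    config = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }
    return json.dumps(config)


def apply_to_crawler(crawler_name: str, configuration: str) -> None:
    """Update an existing crawler with the given Configuration JSON string."""
    # Imported here so the builder above can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    glue.update_crawler(Name=crawler_name, Configuration=configuration)


# Example usage (hypothetical crawler name, requires AWS credentials):
# apply_to_crawler("parquet-crawler", build_grouping_configuration())
```

After updating the configuration, re-run the crawler and check in the Glue Data Catalog whether one table per folder was created.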
