4
votes
bucket/
├── seoul/
│   ├── weather/
│   │   └── data.json
│   └── gdp/
│       └── data.json
├── tokyo/
│   ├── weather/
│   │   └── data.json
│   ├── gdp/
│   │   └── data.json
│   └── transit/
│       └── data.json
├── seattle/
│   ├── weather/
│   │   └── data.json
│   └── cost-of-living/
│       └── data.json
├ ....

I want to crawl all the weather data in my bucket. As described in the AWS documentation, I set my S3 target path to

s3://bucket/*/weather

However, the Glue crawler doesn't match any data and creates 0 tables. How should I set the Glue targets so that I can gather all the weather data?


2 Answers

1
votes

Glob patterns are supported in exclusion patterns, not in the include path. So for your case, try setting the target to s3://bucket/ and adding exclusions for */gdp/**, */transit/**, and */cost-of-living/**
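As a rough illustration of the suggestion above, here is a minimal Python sketch that builds one S3 target with exclusion patterns. The helper name `build_s3_target` is hypothetical, and the folder names are taken from the tree in the question; the dict uses the `Path` and `Exclusions` keys that boto3's Glue `create_crawler` call accepts for entries of `S3Targets`.

```python
def build_s3_target(bucket, excluded_folders):
    """Build one S3 target dict for a Glue crawler.

    Hypothetical helper: the include path is the bucket root, and every
    unwanted second-level folder gets a '*/<folder>/**' exclusion pattern.
    """
    return {
        "Path": f"s3://{bucket}/",
        "Exclusions": [f"*/{folder}/**" for folder in excluded_folders],
    }


# Folder names below come from the example tree in the question.
target = build_s3_target("bucket", ["gdp", "transit", "cost-of-living"])
print(target)
```

The resulting dict could then be passed as `Targets={"S3Targets": [target]}` to `boto3.client("glue").create_crawler(...)`, together with the crawler name, IAM role, and database name for your own setup.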

1
votes

If there aren't many folders to exclude, @Yuriy Bondaruk's answer works well. In my case, however, there are many folders to exclude, and there's no guarantee that the current file tree will stay fixed.

Therefore, I am going to build nested CloudFormation stacks.

  1. Base CloudFormation template: takes a city as input and runs the crawler.
  2. Very long CloudFormation template: passes each city name as a parameter and calls the base template.
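The same per-city idea can also be sketched outside CloudFormation. The Python below is not the nested-template setup described above but a swapped-in alternative under stated assumptions: it lists the top-level prefixes of the bucket and builds one explicit weather target per city, so no glob is needed in the include path. The function names `list_cities` and `weather_targets` are hypothetical; `list_cities` assumes boto3 is installed and credentials are configured, and it is only defined here, not called.

```python
def list_cities(bucket):
    """List top-level 'folders' (city names) in the bucket.

    Assumes boto3 and AWS credentials are available; imported lazily so the
    rest of this sketch runs without them.
    """
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    cities = []
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for prefix in page.get("CommonPrefixes", []):
            # Each entry looks like {"Prefix": "seoul/"}; drop the trailing slash.
            cities.append(prefix["Prefix"].rstrip("/"))
    return cities


def weather_targets(bucket, cities):
    """Build one explicit S3 target per city, pointing at its weather folder."""
    return [{"Path": f"s3://{bucket}/{city}/weather/"} for city in cities]


# City names below come from the example tree in the question.
targets = weather_targets("bucket", ["seoul", "tokyo", "seattle"])
for t in targets:
    print(t["Path"])
```

Because the target list is regenerated from whatever prefixes exist, it does not depend on the file tree staying fixed; the crawler's targets would need to be updated whenever a new city appears.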