I have a number of objects under an S3 path that I'm trying to crawl with AWS Glue (using a root path of s3://my-bucket/somedata/):
s3://my-bucket/somedata/20180101/data1/stuff.txt.gz
s3://my-bucket/somedata/20180101/data2/stuff.txt.gz
s3://my-bucket/somedata/20180101/data1.sql
s3://my-bucket/somedata/20180101/data2.sql
s3://my-bucket/somedata/20180102/data1/stuff.txt.gz
s3://my-bucket/somedata/20180102/data2/stuff.txt.gz
...
Sometimes the tables are named according to the date pattern (e.g. 20180101), sometimes according to the leaf-level 'folder' (e.g. data1), and sometimes according to the file (e.g. data1.sql). When there are conflicts, it seems that Glue just appends a unique identifier to the table name (e.g. data1_c17b2f988649f2171b24b1d35da7f2b4).
What is the logic here? Are these names deterministic? Are there patterns I should use for structuring my data so that the crawler will catalog things in some logical order?
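To make the conflict concrete, here is a rough sketch that enumerates the name candidates I'm seeing for the layout above and shows which ones map to more than one S3 location. This is only an illustration of my observations, not Glue's actual naming algorithm; `candidate_names`, `ROOT`, and `KEYS` are hypothetical names I made up for this example.

```python
from collections import defaultdict

ROOT = "s3://my-bucket/somedata/"
KEYS = [
    "s3://my-bucket/somedata/20180101/data1/stuff.txt.gz",
    "s3://my-bucket/somedata/20180101/data2/stuff.txt.gz",
    "s3://my-bucket/somedata/20180101/data1.sql",
    "s3://my-bucket/somedata/20180101/data2.sql",
    "s3://my-bucket/somedata/20180102/data1/stuff.txt.gz",
    "s3://my-bucket/somedata/20180102/data2/stuff.txt.gz",
]


def candidate_names(keys, root=ROOT):
    """Map each observed style of table name to the S3 locations that could yield it.

    Assumption: this mirrors the three naming styles seen in the catalog
    (date folder, leaf folder, file name). It is NOT the crawler's real logic.
    """
    candidates = defaultdict(set)
    for key in keys:
        parts = key[len(root):].split("/")
        # Style 1: the date-level folder, e.g. "20180101".
        candidates[parts[0]].add(root + parts[0] + "/")
        if len(parts) > 2:
            # Style 2: the leaf-level folder that holds the file, e.g. "data1".
            candidates[parts[-2]].add(root + "/".join(parts[:-1]) + "/")
        else:
            # Style 3: a file sitting directly under the date folder, e.g. "data1.sql".
            candidates[parts[-1]].add(key)
    return candidates


if __name__ == "__main__":
    for name, locations in sorted(candidate_names(KEYS).items()):
        marker = "CONFLICT" if len(locations) > 1 else "unique"
        print(f"{name}: {marker}")
        for location in sorted(locations):
            print(f"    {location}")
```

In this sketch the leaf-folder name data1 is shared by two different prefixes (one per date), which is presumably the kind of collision that leads to the suffixed table names.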