1
votes

When I run an AWS Glue crawler on a JSON input file of 1 MB or larger, it creates a table in Glue whose classification is "Unknown". When the file is smaller than 1 MB, the crawler correctly classifies it as JSON.

I cross-checked the file to make sure it is valid JSON.

Is this a limitation of the AWS Glue crawler?

If so, is there an alternative or workaround?

1 Answer

2
votes

Yes, that is by design. The crawler reads only the first 1 MB of a file that is larger than 1 MB, or the entire file if it is smaller than 1 MB. If the metadata that the crawler builds internally exceeds 1 MB, you get the behavior described above: when the metadata itself doesn't fit in 1 MB, the file ends up with the Unknown type.
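As for an alternative, here is a minimal sketch of one thing you could try. It is an assumption on my part, not something the crawler documentation or this answer confirms. If the input is a single large top-level JSON array, rewriting it as one JSON object per line keeps each record small, so the first 1 MB of the file holds complete records. The helper names `exceeds_crawler_sample` and `json_array_to_lines` are hypothetical, and treating 1 MB as 1024 * 1024 bytes is also an assumption.

```python
import json
from pathlib import Path

# Sample size mentioned in the answer; the exact byte count is an assumption.
ONE_MB = 1024 * 1024


def exceeds_crawler_sample(path, limit=ONE_MB):
    """Return True if the file is larger than the assumed 1 MB sample."""
    return Path(path).stat().st_size > limit


def json_array_to_lines(src, dst):
    """Rewrite a file holding one top-level JSON array as one object per line.

    Loads the whole array into memory, so it only suits files that fit in RAM.
    Returns the number of records written.
    """
    with open(src, "r", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst, "w", encoding="utf-8") as out:
        for record in records:
            # Compact separators keep each line as short as possible.
            out.write(json.dumps(record, separators=(",", ":")) + "\n")
    return len(records)
```

You would run `json_array_to_lines("input.json", "input.jsonl")` on any file for which `exceeds_crawler_sample` returns True, upload the result, and re-run the crawler. Whether the crawler then classifies the file as JSON is something to verify on your own data.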