I have set up a crawler to read and catalog Avro files that are in an S3 bucket. However, the crawler/classifier does not read the "doc" property of a field, so it creates a schema in the catalog with the field names and their data types but without the doc values. I am exploring the option of creating a custom classifier that would read and populate the doc property for a field along with its name and type. I went through the official AWS docs but did not find any information or examples on how to do this. Thanks.
1 Answer
Hi, you may want to check the AWS Glue documentation on adding classifiers: https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
You can provide a custom classifier to classify your data in AWS Glue. You can create a custom classifier using a grok pattern, an XML tag, JavaScript Object Notation (JSON), or comma-separated values (CSV). An AWS Glue crawler calls a custom classifier. If the classifier recognizes the data, it returns the classification and schema of the data to the crawler. You might need to define a custom classifier if your data doesn't match any built-in classifiers, or if you want to customize the tables that are created by the crawler.
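None of the four custom classifier types listed above (grok, XML, JSON, CSV) targets Avro directly, so a possible workaround is to let the crawler create the table and then copy the "doc" values into the column comments afterwards. The following is only a minimal sketch of that idea, not something taken from the linked page. The helper names `avro_field_docs` and `add_comments` and the sample schema are hypothetical, it only handles top-level fields of a record schema, and it assumes the catalog stores column names in lowercase.

```python
import json


def avro_field_docs(schema_json: str) -> dict:
    """Map lowercased top-level field names to their "doc" strings.

    Hypothetical helper: only looks at the top-level "fields" of an
    Avro record schema and ignores nested records.
    """
    schema = json.loads(schema_json)
    return {
        field["name"].lower(): field["doc"]
        for field in schema.get("fields", [])
        if "doc" in field
    }


def add_comments(columns: list, docs: dict) -> list:
    """Return copies of Glue-style column dicts with "Comment" set from docs.

    Columns without a matching doc entry are returned unchanged.
    """
    result = []
    for column in columns:
        updated = dict(column)  # copy so the input list is not mutated
        doc = docs.get(column["Name"].lower())
        if doc:
            updated["Comment"] = doc
        result.append(updated)
    return result


# Made-up sample data for illustration only.
SAMPLE_SCHEMA = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "long", "doc": "Unique id of the user"},
        {"name": "Email", "type": "string", "doc": "Contact address"},
        {"name": "age", "type": "int"},
    ],
})

SAMPLE_COLUMNS = [
    {"Name": "user_id", "Type": "bigint"},
    {"Name": "email", "Type": "string"},
    {"Name": "age", "Type": "int"},
]

if __name__ == "__main__":
    docs = avro_field_docs(SAMPLE_SCHEMA)
    for column in add_comments(SAMPLE_COLUMNS, docs):
        print(column)
```

To apply this to a real table, you would presumably fetch the table with the boto3 Glue client's `get_table`, pass `Table["StorageDescriptor"]["Columns"]` through `add_comments`, and write the result back with `update_table`. Note that `update_table` expects a `TableInput`, so the read-only keys returned by `get_table` would have to be dropped first, and a later crawler run may overwrite the comments depending on its schema change policy.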