1
votes

I can't get either the default crawler classifier or a custom classifier to work against many of my CSV files. The classification is listed as 'UNKNOWN'. I've tried re-running existing classifiers as well as creating new ones. Is anyone aware of a specific custom classifier configuration for CSV files that works for files of any size?

I'm also unable to find any errors specific to this issue in the logs.

Although I have seen references to issues with JSON files over 1 MB in size, I can't find anything describing the same issue for CSV files, nor a solution to the problem.

1
Can you provide details on how your custom classifier is configured, along with a sample of your data? How big are the files? Why do you need a classifier? - JD D

1 Answer

2
votes

Default CSV classifiers supported by Glue Crawler:

CSV - Checks for the following delimiters: comma (,), pipe (|), tab (\t), semicolon (;), and Ctrl-A (\u0001). Ctrl-A is the Unicode control character for Start Of Heading.

If your files use any other delimiter, the default CSV classifier will not work. In that case you will have to write a grok pattern.
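
Before writing a custom pattern, it can help to check locally whether a sample of the file is split consistently by one of the five delimiters listed above. The following is only a rough sketch: `guess_default_delimiter` is a hypothetical helper, it does a naive split that ignores quoting, and it is not how the Glue crawler itself classifies files.

```python
# Delimiters the answer above lists for the built-in CSV classifier.
DEFAULT_DELIMITERS = [",", "|", "\t", ";", "\u0001"]


def guess_default_delimiter(sample_lines, candidates=DEFAULT_DELIMITERS):
    """Return the first candidate delimiter that appears the same non-zero
    number of times on every non-empty line, or None if none does.

    Hypothetical helper for a local sanity check only; quoted fields that
    contain the delimiter are not handled.
    """
    lines = [line.rstrip("\r\n") for line in sample_lines if line.strip()]
    if len(lines) < 2:
        # A single line is not enough to judge consistency.
        return None
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return delim
    return None


if __name__ == "__main__":
    pipe_sample = ["id|name|city\n", "1|Ann|Oslo\n", "2|Bob|Rome\n"]
    tilde_sample = ["id~name~city\n", "1~Ann~Oslo\n"]
    print(repr(guess_default_delimiter(pipe_sample)))   # '|'
    print(repr(guess_default_delimiter(tilde_sample)))  # None
```

If this returns `None` for the first few lines of a file, the delimiter is probably not one the default classifier checks for, which would be consistent with an 'UNKNOWN' classification.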