2
votes

There's an excel file testFile.xlsx, it looks like as below:

ID  ENTITY  STATE
1   Montgomery County Muni Utility Dist No.39   TX
2   State of Washington WA
3   Waterloo CUSD 5 IL
4   Staunton CUSD 6 IL
5   Berea City SD   OH
6   City of Coshocton   OH

Now I want to import the data into the AWS GLUE database, a crawler in AWS GLUE has been created, there's nothing in the table in AWS GLUE database after running the crawler. I guess it should be the issue of classifier in AWS GLUE, but have no idea to create a proper classifier to successfully import data in the excel file to AWS GLUE database. Thanks for any answers or advice.

4

4 Answers

6
votes

I'm afraid Glue Crawlers have no classifier for MS Excel files (.xlsx or .xls). Here you can find list of supported formats and built-in classifiers. Probably, it would be better to convert files to CSV or some other supported format before exporting to AWS Glue Catalog.

2
votes

Glue crawlers doesn't support MS Excel files. If you want to create a table for the excel file you have to convert it first from excel to csv/json/parquet and then run crawler on the newly created file.

You can convert it easily using pandas. Create a normal python job and read the excel file.

import pandas as pd
df = pd.read_excel('yourFile.xlsx', 'SheetName', dtype=str, index_col=None)
df.to_csv('yourFile.csv', encoding='utf-8', index=False)

This will convert your file to csv then run crawler over this file and your table will be loaded.

Hope it helps.

0
votes

When you say that "there's nothing in the table in AWS Glue database after running the crawler" are you saying that in the Glue UI, you are clicking on Databases, then the database name, then on "Tables in xxx", and nothing is showing up?

The second part of your question seems to indicate that you are looking for Glue to import the actual data rows of your file into the Glue database. Is that correct? The Glue database does not store data rows, just the schema information about the files. You will need to use a Glue ETL job, or Athena, or hive to actually move the data from the data file into something like mySQL.

0
votes

You should write script (most likely python shell job in glue) to convert excel to csv and then run crawler over it.