
I have 12 Excel files, each with lots of data organized in two fields (columns): id and text.

Each Excel file uses a different language for the text field: Spanish, Italian, French, English, German, Arabic, Japanese, Russian, Korean, Chinese, Japanese and Portuguese.

The id field is a combination of letters and numbers.

I need to import each Excel file into a different MySQL table, one table per language.

I'm trying to do it the following way:

- Save the Excel file as a CSV file
- Import that CSV file in phpMyAdmin

The problem is that I'm running into all sorts of issues and can't get the files to import properly, probably because of encoding problems.

For example, with the Arabic file I set everything to UTF-8 (both the database table field and the CSV file), but when I do the import I get weird characters instead of the normal Arabic ones (if I copy them in manually, they display fine).
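Weird characters like these are usually mojibake: the bytes were written in one encoding and read back in another somewhere along the way. Below is a minimal Python sketch of the effect, assuming for illustration that the wrong decoder is Latin-1 (which encoding was actually misapplied in this setup is not known). The function names are hypothetical.

```python
def show_mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode as Latin-1 (one byte -> one char)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo that mistake: recover the original bytes and decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "مرحبا"  # "hello" in Arabic: 5 characters, 2 UTF-8 bytes each
garbled = show_mojibake(original)
print(len(original), len(garbled))  # 5 10
print(repair_mojibake(garbled) == original)  # True
```

Every Arabic letter turns into two unrelated Latin-1 characters. This is why the text looks fine when pasted in by hand but garbled after an import that assumes the wrong encoding.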

Another problem is that some texts contain commas, and since the CSV file also uses commas to separate fields, the imported texts are truncated wherever there's a comma.

Also, when saving as CSV the characters get messed up (the Chinese ones, for example), and I can't find an option to tell Excel which encoding to use for the CSV file.

Is there any "protocol" or "rule" I can follow to make sure I do it the right way, something that works for every language? I'm paying attention to the character encoding, but even so I still get weird results.

Maybe I should try a different method instead of CSV files?

Any advice would be much appreciated.


2 Answers


OK, how did I solve all my issues? FORGET ABOUT EXCEL!!!

I uploaded the Excel files to Google Docs spreadsheets, downloaded them as CSV, and all the characters were perfect.
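Before importing, it can help to confirm that a downloaded CSV really is UTF-8 and that every row has exactly the two expected fields (id and text). Here is a minimal Python sketch of such a check. The function name `check_utf8_csv` is hypothetical, and it assumes the file has already been read as bytes (for example with `open(path, "rb").read()`).

```python
import csv
import io


def check_utf8_csv(data: bytes, expected_fields: int = 2) -> list[list[str]]:
    """Decode bytes strictly as UTF-8 and parse them as CSV.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8, and
    ValueError if any row does not have exactly `expected_fields` fields.
    """
    # "utf-8-sig" also accepts (and strips) a leading byte-order mark.
    # Decoding is strict by default, so any invalid byte raises an error.
    text = data.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    for row_no, row in enumerate(rows, start=1):
        if len(row) != expected_fields:
            raise ValueError(
                f"row {row_no}: expected {expected_fields} fields, got {len(row)}"
            )
    return rows


sample = 'id,text\nA1,"Hola, mundo"\n'.encode("utf-8")
print(check_utf8_csv(sample))  # [['id', 'text'], ['A1', 'Hola, mundo']]
```

A quoted field containing a comma counts as one field, so this also catches the truncation problem before anything reaches MySQL.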

Then I just imported them into the corresponding fields of the tables, using the "utf8_general_ci" collation, and now everything is stored correctly in the database.
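If you'd rather script the import than click through phpMyAdmin for each language, you could generate one `CREATE TABLE` plus `LOAD DATA` pair per table. Below is a minimal Python sketch that only builds the SQL text. The table names, paths, `VARCHAR(64)` id length, header row, and `\n` line endings are all assumptions. It also uses `utf8mb4` rather than the `utf8` mentioned above, because MySQL's `utf8` stores at most 3 bytes per character. `LOCAL` requires `local_infile` to be enabled on both client and server.

```python
def import_statements(table: str, csv_path: str) -> str:
    """Return CREATE TABLE + LOAD DATA statements for one language table.

    Assumes a UTF-8 CSV with a header row and two columns (id, text).
    `table` and `csv_path` are interpolated without escaping, so they
    must come from a trusted source.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS `{table}` (\n"
        "  `id` VARCHAR(64) NOT NULL PRIMARY KEY,\n"
        "  `text` TEXT NOT NULL\n"
        ") CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;\n"
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE `{table}`\n"
        # Tell MySQL how the bytes in the file are encoded.
        "CHARACTER SET utf8mb4\n"
        # Quoted fields may contain commas.
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        # Use '\r\n' here instead if the file has Windows line endings.
        "LINES TERMINATED BY '\\n'\n"
        # Skip the header row.
        "IGNORE 1 LINES\n"
        "(`id`, `text`);"
    )


# Hypothetical table names and paths, one table per language.
for lang in ["es", "ar", "zh"]:
    print(import_statements(f"texts_{lang}", f"/path/to/{lang}.csv"))
```

The generated statements can be pasted into phpMyAdmin's SQL tab or run with the `mysql` client.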


One standard practice in CSV is to enclose fields that contain commas in double quotes. So

ABC,johnny can't come out, can he?,newfield

becomes

ABC,"johnny can't come out, can he?",newfield
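Python's standard `csv` module applies the same quoting rule, which makes it easy to see. It also doubles any double quote inside a quoted field. This illustrates the convention only and is not a claim about what Excel does internally.

```python
import csv
import io

rows = [
    ["ABC", "johnny can't come out, can he?", "newfield"],
    ["XYZ", 'she said "hi"', "x"],
]

buffer = io.StringIO(newline="")
# The default QUOTE_MINIMAL quotes only fields containing the delimiter,
# the quote character, or a line break; embedded quotes are doubled.
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
# ABC,"johnny can't come out, can he?",newfield
# XYZ,"she said ""hi""",x

# Reading it back keeps the comma inside the second field intact.
parsed = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
print(parsed == rows)  # True
```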

I believe Excel does this if you save as file type CSV. The problem you'll have is that Excel's CSV output is ANSI-only (it uses the system's legacy code page, not Unicode). I think you need to use the "Unicode Text" Save As option and either live with the tab delimiters or convert them to commas. The Unicode Text option also quotes comma-containing values (checked using Excel 2007).

EDIT: Added specific directions

In Excel 2007 (the specifics may differ in other versions of Excel):

1. Choose "Save As".
2. In the "Save as type:" field, select "Unicode Text".


You'll get a Unicode file. UCS-2 Little Endian, specifically.
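To turn that tab-delimited Unicode Text file into a comma-separated UTF-8 CSV, a short script is enough. Below is a minimal Python sketch. It assumes the file starts with a byte-order mark (the `utf-16` codec uses the BOM to detect byte order and strips it) and that quoted fields use double quotes. The function names and file paths are hypothetical.

```python
import csv
import io


def unicode_text_to_utf8_csv(data: bytes) -> bytes:
    """Convert Excel "Unicode Text" bytes (UTF-16, tab-delimited) to UTF-8 CSV."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order.
    text = data.decode("utf-16")
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    # Re-quote fields as needed for commas; use Unix line endings.
    csv.writer(out, lineterminator="\n").writerows(reader)
    return out.getvalue().encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read a Unicode Text export from `src` and write UTF-8 CSV to `dst`."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(unicode_text_to_utf8_csv(data))


# Example (hypothetical paths): convert_file("arabic.txt", "arabic.csv")
```

Going through `csv.reader` instead of a plain tab-to-comma replace keeps fields with commas quoted, so they are not split again on import.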