1
votes

I have a bunch of text files that were written to a Linux server and that I need to pull into a database. I'm using file_get_contents() to read them. The files contain a lot of special characters (things like àáâãäåæçèéêëìíîïòóôõöøùúûü), and they just aren't going into the database correctly (specifically, this is going into a WordPress site).

Things I've checked or tried:

  • the database that I'm putting the data into uses the utf8_general_ci collation
  • I've used mb_detect_encoding() to see what the text files are; it thinks they're ISO-8859-1
  • I used file -bi to check the charset in SSH; it thinks that they're plaintext/no-charset
  • I've tried utf8_encode()
  • I've tried mb_convert_encoding()
  • I've tried iconv()
  • I've tried htmlentities()
  • confirmed that the meta tags on the WP site are set to output utf-8
  • the server's character encoding (set in php.ini) is set to ISO-8859-1

Depending on what I try, I get either an A with a tilde over it (Ã) or, more commonly, a rectangle with what looks like the numbers 00 86 in it.

I'm stumped -- if anyone has any other suggestions, I'm all ears!

2 Answers

1
votes

Make sure (in order of importance):

  1. Your data is UTF-8 encoded (this includes your database, if applicable).
  2. Your server is sending a UTF-8 charset in its Content-Type header.
  3. Your HTML has utf-8 meta tags.

Numbers 1 and 2 are the most common problems. (Number 2 especially: if your server sends headers specifying a different encoding, the browser will use that encoding, even if the meta tag says "utf-8".)
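
For point 1, here is a minimal sketch of normalizing file contents to UTF-8 before they go anywhere near the database. The helper name to_utf8() and the fallback to ISO-8859-1 are assumptions for illustration, not something from the question. An "Ã" followed by a 0086 box is what a UTF-8 "Æ" (bytes C3 86) looks like after it has been converted from ISO-8859-1 a second time, so the files may already be UTF-8. That is why the sketch checks first and converts only when it has to.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8 text.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function to_utf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    // Already valid UTF-8 (this includes plain ASCII): converting it
    // again would double-encode every multi-byte character.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise treat the bytes as the assumed single-byte encoding.
    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}
```

You would call it as to_utf8(file_get_contents($path)), and for point 2 send header('Content-Type: text/html; charset=utf-8') before any output. The check is a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 too, so spot-check a few files by hand.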

0
votes

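Try mysql_set_charset('utf8') on every database connection (or mysqli_set_charset($link, 'utf8') if you use mysqli). Note that MySQL calls the character set 'utf8', not 'utf-8'.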
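
With the mysqli extension, setting the connection character set might look like the sketch below. The function name open_utf8_connection() and its parameters are placeholders, not part of any library.

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8.
// Host, user, password and database name are supplied by the caller.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }

    // MySQL's name for the character set is "utf8" (or "utf8mb4"), not "utf-8".
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }

    return $link;
}
```

If the inserts go through WordPress's own $wpdb object rather than a separate connection, the connection character set comes from DB_CHARSET in wp-config.php, so check that value instead.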