On my site I allow direct text file uploads. These files are then stored on the server and displayed on the website. I use UTF-8 on the site.
Now I run into trouble when people upload non-UTF-8 files which contain special characters, such as é.
I've been doing some testing. I made two text files, both containing the same word, fiancée: one encoded as UTF-8 and one encoded as ISO 8859-2.
The UTF-8 one uploads fine and shows the text correctly, but the ISO 8859-2 one shows as fianc�e.
Now I've tried to detect the encoding of the uploaded file content with mb_detect_encoding, but whatever file I throw at it, it always detects UTF-8.
I noticed that I can use utf8_encode to convert the ISO 8859-2 files to valid UTF-8, but this only works on non-UTF-8 files. And as I currently cannot detect non-UTF-8 files, I cannot use the utf8_encode function, because it messes up files that are already valid UTF-8.
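To make that concrete, here is a minimal sketch of the behaviour described above. The two byte strings stand in for the uploaded files, and the variable names are made up for illustration; the real code reads the content from the upload instead.

```php
<?php
// Stand-ins for the two test files (hypothetical variable names).
$utf8File  = "fianc\xC3\xA9e"; // "fiancée" as UTF-8: é is the two bytes C3 A9
$latinFile = "fianc\xE9e";     // "fiancée" as ISO 8859-2: é is the single byte E9

// What mb_detect_encoding reports for each file with its default settings.
var_dump(mb_detect_encoding($utf8File));
var_dump(mb_detect_encoding($latinFile));

// utf8_encode turns the single-byte file into the same bytes as the UTF-8 file.
// Note: utf8_encode is deprecated as of PHP 8.2 and emits a deprecation notice there.
$fixed = utf8_encode($latinFile);
var_dump($fixed === $utf8File);

// Applied to the file that was already UTF-8, it encodes the bytes a second time,
// so the result no longer matches the original text.
$broken = utf8_encode($utf8File);
var_dump($broken === $utf8File);
```

The last two comparisons are the core of the problem: the same call repairs one file and corrupts the other, so it needs a reliable check in front of it.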
Hope this makes sense :)
So my question is: how can I detect files that are definitely not UTF-8 encoded to start with, so that I can use the utf8_encode function on them?