
I'm using PHP to insert email content as a UTF-8 string into a SQL Server 2008 database table. It works fine except for one specific email.

The INSERT command fails with this error:

An error occurred translating the query string to UTF-16: No mapping for the Unicode character exists in the target multi-byte code page.

The text that causes it is the extension part of a phone number:


This "xF7", which is supposed to be part of +91-98XXXXXXX (I added the Xs), must have turned into UTF-16 or something?

Before inserting into the database, I check for UTF-8 using mb_detect_encoding:

$HTMLencode = mb_detect_encoding($HTMLString, mb_detect_order(), true);

$PLAINencode = mb_detect_encoding($PLAINString, mb_detect_order(), true);

As you can see, I even take multipart emails into account (an HTML part and a plain-text part). Both checks return UTF-8, which means the "xF7" fooled me :)

I also tried iconv() with UTF-8//IGNORE to drop invalid characters, but nothing helps. How do I solve this in PHP?

The code above works for 99% of emails; only this one raises the error.


1 Answer


0xF7 encodes ÷ in Windows-1252. Are you just passing the data directly to the database?
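If the bytes really are Windows-1252, converting them to UTF-8 before the INSERT turns the lone 0xF7 byte into the two-byte UTF-8 sequence for ÷. A minimal sketch, assuming the encoding is already known to be Windows-1252 (the sample string below is hypothetical):

```php
<?php
// Assumption: this sample part is known to be Windows-1252 encoded
// (in a real email, the part's charset header would tell you).
$raw  = "+91-98\xF7"; // hypothetical sample: \xF7 is one raw byte, which is ÷ in Windows-1252
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

// Prints 2b39312d3938c3b7: the ASCII digits are unchanged, ÷ is now c3 b7.
echo bin2hex($utf8), PHP_EOL;
```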

You should use an email library that reads the email headers correctly; they state the character encoding used in the email (for example, the charset parameter of each part's Content-Type header). The library would then ideally convert from that encoding to UTF-8 before handing the text to you.
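What such a library does per part can be sketched roughly as follows. This is not any particular library's API: `partToUtf8` is a hypothetical helper, it assumes PHP 8+ (where mb_convert_encoding throws ValueError for an unknown encoding), and it assumes the body has already been transfer-decoded (base64 / quoted-printable). The Windows-1252 fallback for unrecognized labels is a guess, not a rule.

```php
<?php
// Hypothetical helper: convert one already transfer-decoded MIME part body
// to UTF-8, using the charset parameter of that part's Content-Type header.
function partToUtf8(string $body, string $contentType): string
{
    // Extract charset=... (quoted or not), e.g. from 'text/plain; charset="windows-1252"'.
    $charset = 'US-ASCII'; // MIME default when no charset parameter is present
    if (preg_match('/charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        $charset = $m[1];
    }

    try {
        // On PHP 8+, an encoding name mbstring does not know throws ValueError.
        return mb_convert_encoding($body, 'UTF-8', $charset);
    } catch (ValueError $e) {
        // Unknown charset label: fall back to Windows-1252 (an assumption).
        return mb_convert_encoding($body, 'UTF-8', 'Windows-1252');
    }
}
```

Usage would be something like `partToUtf8($plainBody, $plainContentType)` for each part, with both values taken from whatever MIME parser you already use.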

mb_detect_encoding is virtually useless: it only has access to the bytes and doesn't apply any heuristics either. It is especially useless if it reports UTF-8 for a string containing 0xF7, a byte that can never appear in valid UTF-8.
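Rather than asking which encoding a string might be in, you can ask whether it is valid in the one encoding you need. mb_check_encoding does exactly that. A minimal sketch of a guard to run just before the INSERT (`requireUtf8` is a hypothetical name):

```php
<?php
// Hypothetical guard: reject text that is not valid UTF-8 before it reaches SQL Server.
// mb_check_encoding() validates the bytes against UTF-8 instead of guessing an encoding.
function requireUtf8(string $text): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new UnexpectedValueException('Text is not valid UTF-8; convert it from its declared charset first.');
    }
    return $text;
}

var_dump(mb_check_encoding("+91-98", 'UTF-8'));     // bool(true): plain ASCII is valid UTF-8
var_dump(mb_check_encoding("+91-98\xF7", 'UTF-8')); // bool(false): 0xF7 never occurs in UTF-8
```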