3
votes

I'm connecting to an external API using cfhttp, with the returned data in XML format. I have no control over the API or the format it's returned in.

When the data is returned, I loop through it and do cfquery inserts into my own MySQL database, which has a UTF8 charset.

However, some of the data appears to have unicode characters (it appears it should be the £ (pound) sign, but when I cfdump the XMLParsed data, it's showing as a diamond with a ? inside). I've attached a cropped screenshot showing part of the cfdump showing this;

enter image description here

The problem is the cfquery insert - when it gets to those characters, it's returning this error;

Error Executing Database Query.

Incorrect string value: '\xEF\xBF\xBD10 ...' for column 'voucherTitle' at row 1

I've tried setting the charset in the cfhttp call, but get the same result.

Is there any way I can either encode/decode these, or alternatively trim them out altogether (the data gets edited further down the line anyway, so manually adding the correct symbols isn't a huge issue).

1
Try including <cfprocessingdirective pageencoding = "utf-8"> on top of your ColdFusion page. - Pankaj
Same error unfortunately :-( - Lee
You could try replacing it with its character reference- &#163; - TRose
(Edit) @Lee - I recall something similar on another thread. Do your DSN settings use characterEncoding=UTF8 or is the INSERT using CF_SQL_NVARCHAR? Also, I know you said the db uses UTF-8, but be sure to verify the column charset in information_schema.columns. Side note, <cfprocessingdirective pageencoding> is only needed when the unicode characters are literally embedded in the cfml source. - Leigh
@Leigh - never in a million years would I have thought to check the column charset - I always thought it would default to whatever the DB was set to. However, that was indeed the problem - set it to UTF8 and all works a treat! EDIT - please do post that as an answer so I can accept as the answer - Lee

1 Answers

1
votes

UPDATE: As of MySQL 5.5.3, there is also UTF8mb4 which is often recommended over UTF8.


(From the comments)

I recall something similar on another thread. Double check the collation and character set for that column using the INFORMATION_SCHEMA.COLUMNS view:

 SELECT  *
 FROM    INFORMATION_SCHEMA.COLUMNS
 WHERE   TABLE_NAME = 'YourTableName'

If it is not UTF-8, you can change it using the ALTER TABLE command. Modify the column size M as needed.

 ALTER TABLE YourTableName 
    MODIFY YourColumnName VARCHAR(M) 
    CHARACTER SET utf8;

NB: If the data is important, always make a backup of the table before applying any modifications.

See also: 11.1.15 Character Sets and Collations Supported by MySQL