0
votes

In content retrieved with ColdFusion's cfhttp, some characters come back as question marks. Specifically, these are Roman numerals (like Ⅱ), which display without problems when I visit the same page in a browser.

The server I am requesting from does not seem to provide any charset information in the response headers (the value of Content-Type is just "text/html", and the charset property in the cfhttp result is blank), but the encoding is declared in the page's HTML as "charset=EUC-JP" (it is a page in Japanese). So I make the request with charset set to EUC-JP.

The content in Japanese (Japanese characters) is retrieved correctly, but the roman numerals are turned into question marks.

I tried requesting with charset set to UTF-8, but in that case everything gets scrambled. To me it seems that those Roman numerals are Unicode, so my understanding is that the server I am requesting from mixes encodings (but I may be wrong about this).

How do I get those special characters to display correctly in the fileContent of cfhttp?

Thanks!

1
Can you share the URL so we can test? - Sharondio
Thanks for the quick response. Unfortunately, I don't think I can do that because of some privacy-related considerations. Is there any information I can provide to make it clearer? I can add that I tried using those Roman numerals on my own site's test page (which is returned as UTF-8), and there were no problems displaying them. - Kirill G.
@Sharondio, I am still getting nowhere with this. Here is a link. On this page, there are characters such as Ⅰ or Ⅱ. Is there a way to read this page and have both the Japanese characters and these Roman numerals stored properly? - Kirill G.

1 Answer

0
votes

The only way I can think of is to make two requests with different encodings and then merge the data together. The first request would use a charset of EUC-JP and the second would use UTF-8. After the second request, look through the content from the first, and for every question mark, look up the matching index in the second response. For example, when you hit the 5th question mark in the first set of content, look for the 5th Roman numeral in the second set. It's not guaranteed to work, but it's all I can think of.
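To make the merge step concrete, here is a rough sketch of the positional lookup. It is written in Python rather than CFML purely to keep it self-contained, and the function name `merge_by_position` is made up for illustration. It assumes you already have both response bodies as strings, and that the second one really does contain recognizable Roman numerals in the Unicode range U+2160 to U+217F, which may not hold for this page. Note that any genuine question marks in the page would also be counted, so treat it as a starting point only.

```python
import re

# Unicode Roman numeral characters, U+2160 through U+217F.
# Assumption: the numerals on the page fall inside this range.
ROMAN_NUMERAL = re.compile("[\u2160-\u217f]")


def merge_by_position(primary: str, secondary: str, placeholder: str = "?") -> str:
    """Replace the n-th placeholder in `primary` with the n-th Roman numeral in `secondary`."""
    numerals = ROMAN_NUMERAL.findall(secondary)
    parts = primary.split(placeholder)
    merged = [parts[0]]
    for index, part in enumerate(parts[1:]):
        # Keep the placeholder when the second response has no numeral left to offer.
        merged.append(numerals[index] if index < len(numerals) else placeholder)
        merged.append(part)
    return "".join(merged)


# Hypothetical inputs standing in for the two cfhttp fileContent values.
print(merge_by_position("第?章", "Ⅱ"))  # 第Ⅱ章
```

If the second response runs out of numerals, the remaining question marks are left in place rather than guessed at.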