My Rails 3.2.2 / Ruby 1.9.3 application gets search requests such as:
http://booko.com.au/books/search?q=Fran%E7ois+Vergniolle+de+Chantal
Ruby / Rails takes this query and decodes it, but assumes the result is UTF-8. At some point I get:
invalid byte sequence in UTF-8
app/models/product.rb:694:in `upcase'
I think it's doing something like this:
q="Fran%E7ois+Vergniolle+de+Chantal"
=> "Fran%E7ois+Vergniolle+de+Chantal"
CGI.unescape( q )
=> "Fran\xE7ois Vergniolle de Chantal"
CGI.unescape( q ).encoding.name
=> "UTF-8"
CGI.unescape( q ).valid_encoding?
=> false
What is the correct way of dealing with this? I'd like to transcode it to the correct encoding, but how do I determine the encoding the client actually used? What I'm currently doing is just assuming it's LATIN1:
q.encode!("ISO-8859-1", "UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
Or doing something I found on a blog somewhere:
q = q.unpack('C*').pack('U*')
What's the right way of dealing with this?
Edit: The server is correctly sending the "Content-Type: text/html; charset=utf-8" header to the client. The page also contains the appropriate meta tag: 'meta http-equiv="content-type" content="text/html;charset=UTF-8"'
I'm not sure whether there's another way to tell the client which encoding to use.
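For reference, the "try UTF-8, otherwise assume Latin-1" idea described above can be sketched roughly as follows. This is only a sketch under assumptions: `decode_query` is a hypothetical helper name, `URI.decode_www_form_component` is swapped in for `CGI.unescape` so the raw bytes can be inspected before any encoding is attached, and ISO-8859-1 is a guess rather than something detected from the request.

```ruby
require 'uri'

# Hypothetical helper: percent-decode a query value to raw bytes, keep it
# as UTF-8 when the bytes are valid UTF-8, otherwise reinterpret the bytes
# in a fallback encoding (ISO-8859-1 is an assumption) and transcode.
def decode_query(raw, fallback = Encoding::ISO_8859_1)
  # Decode '+' and %XX escapes without claiming any character encoding yet.
  bytes = URI.decode_www_form_component(raw, Encoding::BINARY)

  # First guess: the client sent UTF-8, as the page's charset asks for.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Fallback guess: label the same bytes as the fallback encoding,
  # then convert them to UTF-8.
  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end

q = decode_query("Fran%E7ois+Vergniolle+de+Chantal")
puts q.valid_encoding?
```

Every byte sequence is valid ISO-8859-1, so the fallback never raises, but it silently mis-decodes input that was really sent in some other single-byte encoding. Input that is already valid UTF-8, such as `Fran%C3%A7ois`, passes through unchanged.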
Try adding # coding: UTF-8 at the top of app/models/product.rb. I think it should solve that error. Would you be satisfied with this solution? – ck3g
0xE7 could be (and indeed is) a valid character in encodings other than Latin1. – Mladen Jablanović