Things to know:

  • I'm using Solr 4.10.2 locally with Tomcat 8
  • My settings in the NetBeans IDE are: Encoding: windows-1252 | PHP: 5.3 (these must stay as they are)
  • I'm using XAMPP 1.7.7 on Windows 7 x64
  • My Tomcat server.xml file starts with <?xml version='1.0' encoding='utf-8'?>
  • My Tomcat server.xml file has URIEncoding="UTF-8" in the Connector tag
  • My SolrPhpClient PHP files are encoded in UTF-8 without BOM

Situation:

When I search for the word Diário in my web application, the Solr URL requested is:

h**p://localhost:8080/solr/select?sort=score+desc&fq=%28searchfield%3A%28di%E1rio%29+OR+isbn%3A%28di%E1rio%29%29&wt=json&json.nl=map&q=%28searchfield%3A%28di%E1rio%29+OR+isbn%3A%28di%E1rio%29+OR+titulo%3A%28di%E1rio%29+OR+autor%3A%28di%E1rio%29+OR+editoraid%3A1%5E0.00001+OR+editoraid%3A2%5E0.00001+OR+editoraid%3A133%5E0.00001+val%3A%22ord%28ano%29%22%29+AND+status%3A%28active%29&start=0&rows=10

If I run it through urldecode(), I get:

h**p://localhost:8080/solr/select?sort=score desc&fq=(searchfield:(diário) OR isbn:(diário))&wt=json&json.nl=map&q=(searchfield:(diário) OR isbn:(diário) OR titulo:(diário) OR autor:(diário) OR editoraid:1^0.00001 OR editoraid:2^0.00001 OR editoraid:133^0.00001 val:"ord(ano)") AND status:(active)&start=0&rows=10
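The %E1 in the encoded URL is the telling part: it is the single byte windows-1252 uses for á, while Tomcat (configured with URIEncoding="UTF-8") expects the two UTF-8 bytes %C3%A1 for that character, which fits the "while parsing as UTF-8" error below. A minimal PHP sketch showing both byte encodings of the word side by side (variable names are illustrative; only the built-in urlencode() is used):

```php
<?php
// "diário" as windows-1252 (cp1252) bytes: á is the single byte 0xE1.
$cp1252Term = "di\xE1rio";

// The same word as UTF-8 bytes: á is the two-byte sequence 0xC3 0xA1.
$utf8Term = "di\xC3\xA1rio";

// urlencode() percent-encodes raw bytes; it never changes the character encoding.
echo urlencode($cp1252Term), "\n"; // di%E1rio    (what appears in the failing URL)
echo urlencode($utf8Term), "\n";   // di%C3%A1rio (what a UTF-8 decoder can read)
```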

Problem:

The problem, of course, is the word Diário.

I have tried entering these two queries directly in my browser:

The first one gives me an error: HTTP Status 400 - {msg=URLDecoder: Invalid character encoding detected after position 18 of query string / form data (while parsing as UTF-8),code=400}

The second one works like a charm!

I have already checked with the mb_detect_encoding() function, and supposedly I'm sending everything as UTF-8.
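mb_detect_encoding() only picks the closest match from a list of candidate encodings; according to the PHP manual, when strict mode is off it returns that closest match even if the string is not valid in any of them, so it can report one of the listed encodings, such as UTF-8, for windows-1252 input. A stricter test is mb_check_encoding(), which validates the whole byte sequence. A small sketch, assuming the raw search term is available as a PHP string (the value below is hard-coded for illustration):

```php
<?php
// Hypothetical raw input: "diário" as submitted by a windows-1252 page.
$term = "di\xE1rio";

// 0xE1 starts a three-byte UTF-8 sequence, but the next byte is 'r',
// so the string is not valid UTF-8 and mb_check_encoding() returns false.
var_dump(mb_check_encoding($term, 'UTF-8'));           // bool(false)

// The properly encoded UTF-8 version passes.
var_dump(mb_check_encoding("di\xC3\xA1rio", 'UTF-8')); // bool(true)
```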

Why does SolrPhpClient encode the query with something like urlencode() that Solr then can't decode?

Can anyone help with this one?

Thank you in advance.

Best regards,

Marcelo

Did you try adding URIEncoding="UTF-8" to the Connector definition in Tomcat's server.xml? - edigu
Hi, yes, I have something like <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" /> Still the same error message. - lmarcelocc
It seems like a client-library-related problem. You may want to try Solarium: github.com/basdenooijer/solarium - edigu
Hi foozy, I have tried PECL solr too and the same thing happened. I have read about Solarium, but first I tried MatsLindh's solution and it works. Thanks for your time and help. - lmarcelocc

1 Answer

As you say, you're using windows-1252 as your encoding, so the submitted data is in windows-1252. You'll have to convert it to UTF-8 (with iconv, for example: iconv("cp1252", "utf-8", $text)) before using it in a Solr query or indexing it into Solr.
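A minimal sketch of that conversion, assuming the search term arrives as a windows-1252 string in a hypothetical $userInput variable and is converted before it is handed to the Solr client:

```php
<?php
// Hypothetical input: "diário" as submitted by a windows-1252 page.
$userInput = "di\xE1rio";

// Convert to UTF-8 before building the query (cp1252 "á" 0xE1 becomes 0xC3 0xA1).
$utf8Term = iconv("cp1252", "utf-8", $userInput);
if ($utf8Term === false) {
    // iconv() returns false if the input contains bytes it cannot convert.
    die("Could not convert the search term to UTF-8\n");
}

// Once percent-encoded, the term is what Tomcat's UTF-8 URLDecoder expects.
echo urlencode($utf8Term), "\n"; // di%C3%A1rio
```

The converted $utf8Term, not the raw input, is what would then go into the query passed to SolrPhpClient.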

The encoding of your source files doesn't affect the encoding of the data your application handles, and unless you work with UTF-8 when interfacing with Solr, you're going to run into issues all over the place. Convert to UTF-8 when submitting to and querying Solr, and convert back to cp1252 when results come back into your app, if needed.
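One way to keep both directions of that conversion in one place is a pair of helpers used at the boundary between the app and Solr. This is a sketch; toSolrEncoding() and fromSolrEncoding() are hypothetical names, not part of SolrPhpClient:

```php
<?php
// Hypothetical helper: app (cp1252) -> Solr (UTF-8), for queries and documents.
function toSolrEncoding($text)
{
    return iconv("cp1252", "utf-8", $text);
}

// Hypothetical helper: Solr (UTF-8) -> app (cp1252), for values in responses.
// iconv() returns false if a character has no cp1252 equivalent.
function fromSolrEncoding($text)
{
    return iconv("utf-8", "cp1252", $text);
}

$original = "di\xE1rio";               // "diário" in cp1252
$sent     = toSolrEncoding($original); // "di\xC3\xA1rio" in UTF-8
$received = fromSolrEncoding($sent);   // converted back to cp1252

var_dump($sent === "di\xC3\xA1rio");   // bool(true)
var_dump($received === $original);     // bool(true)
```

If Solr can return characters outside cp1252, fromSolrEncoding() will fail on them; appending //TRANSLIT or //IGNORE to the target charset is a common workaround, though how those suffixes behave depends on the iconv implementation PHP was built with.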