
Summary

I have a problem with String encoding on GAE that I cannot solve. Basically, I have a classic encoding problem: special characters like ñ, é and ü show up as ��� in both the datastore viewer and the client, but only in production mode.

I save data to the datastore through two mechanisms:
  • User input: this works perfectly in both dev mode and production.
  • 3rd party API: this one works in dev mode but not in production.

Following the data

  • When I request the data from the 3rd party API, the response header tells me it is UTF-8 encoded. If I inspect the response content, I can read the data perfectly well:

    Content-Type:text/html; charset=utf-8

  • The response is processed with Gson and converted into a Java class. As far as I can see, there is no way to specify a character encoding to Gson.
  • Then I store the data in the datastore without changing its encoding (at least not anywhere in my application code).
  • First sign of a problem: if I look in the production datastore, the encoding is already lost.
  • On the client (GWT), I receive the data, also encoded in UTF-8, but the Strings already contain � symbols.
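Gson works on characters (a Reader or a String), not on bytes, so the encoding is decided at the point where the response bytes are turned into characters, not inside Gson. The sketch below, assuming Java 7 or later, reads the charset from a Content-Type value like the one above and builds a Reader with it. The class and method names are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption of this sketch. The resulting Reader could then be handed to Gson's fromJson(Reader, Class).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=utf-8", or UTF-8 if none is declared (assumed fallback).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes, e.g. charset="ISO-8859-1".
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Wraps the raw response bytes in a Reader that decodes with the declared charset.
    static Reader readerFor(InputStream body, String contentType) {
        return new InputStreamReader(body, charsetOf(contentType));
    }

    public static void main(String[] args) throws IOException {
        String original = "{\"name\":\"Mu\u00F1oz\"}";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        try (Reader r = readerFor(new ByteArrayInputStream(body), "text/html; charset=utf-8")) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        System.out.println(sb.toString().equals(original)); // true
    }
}
```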
Dev mode vs. production

I have my Eclipse configured for UTF-8. I think that is the main reason why everything works very well in development mode.
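A quick way to see why the same bytes can look fine in one environment and broken in another: the sketch below (hypothetical class name) decodes the UTF-8 bytes of ñéü once as US-ASCII, which is reportedly the production default, and once as UTF-8. Every byte of those three characters is 0x80 or above, so the US-ASCII decoding replaces each one with U+FFFD, the � symbol seen in the datastore viewer and the client.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Decodes bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of n-tilde, e-acute, u-umlaut: C3 B1 C3 A9 C3 BC, all 0x80 or above.
        byte[] utf8 = "\u00F1\u00E9\u00FC".getBytes(StandardCharsets.UTF_8);

        // The charset used by code that does not name one (e.g. a no-charset InputStreamReader).
        System.out.println("default charset: " + Charset.defaultCharset());

        // US-ASCII treats every byte >= 0x80 as malformed and substitutes U+FFFD for each.
        String asAscii = decode(utf8, StandardCharsets.US_ASCII);
        System.out.println(asAscii.equals("\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD")); // true

        // Decoding with the charset the bytes were actually written in restores the text.
        String asUtf8 = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals("\u00F1\u00E9\u00FC")); // true
    }
}
```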

I have not yet been able to find out how to set the production JVM to UTF-8 (I read here that the default is US-ASCII and that it may not be possible to change it).
  • In dev mode, I have Eclipse configured to use UTF-8.
  • In production mode, I have followed the advice given by this guy, but it does not change the behavior:

Top-level appengine-web.xml:

<system-properties>
    <!-- Configure java.util.logging -->
    <property name="java.util.logging.config.file" value="WEB-INF/logging.properties" />
    <!-- UTF-8 Support -->
    <property name="file.encoding" value="UTF-8" />

</system-properties>

<!-- UTF-8 Support -->
<env-variables>
    <env-var name="DEFAULT_ENCODING" value="UTF-8" />
</env-variables>
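To check what the production JVM actually uses, something like the following could be logged once from a servlet (class and method names are hypothetical). In standard Java, Charset.defaultCharset() is determined when the virtual machine starts, so a file.encoding value applied afterwards may not affect it; whether App Engine applies the system-properties entries before or after startup is not confirmed here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class EncodingDiagnostics {

    // Summarizes which encoding implicit-charset APIs will use on this JVM.
    static String report() {
        // An InputStreamReader created without a charset picks the JVM default.
        // The in-memory stream holds no resources, so it is not closed here.
        InputStreamReader probe = new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", InputStreamReader default=" + probe.getEncoding();
    }

    public static void main(String[] args) {
        // In a servlet this line could go to java.util.logging instead of stdout.
        System.out.println(report());
    }
}
```

If defaultCharset still reports US-ASCII after the configuration change, the setting is not reaching the code paths that read the API response.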

I do not know what else to do to fix it. Does anyone have a workaround for this problem?

It is confusing that it works well in dev mode but not in production. The fact that the datastore viewer on App Engine shows the special characters as ???? rather than the correctly encoded characters indicates that the character encoding was lost somewhere along the way. If the web service (i.e. the external API you are retrieving the data from) has encoded it correctly and you simply store it in the datastore, things should have worked. – Romin
@Romin I think what happens is that my Eclipse is set to use UTF-8, which is why it works OK in dev mode. But the production Java VM on GAE must somehow be using a different encoding. I need to find out how to change that... – manubot

1 Answer


Well, unfortunately, based on the lack of answers here, I think there is no way to set up UTF-8 as the default encoding on GAE's production JVM.

In the case that was haunting me above, my problem was that I was reading the 3rd party API response using the default encoding, which in production GAE is US-ASCII:

// No charset given: the bytes are decoded with the JVM default (US-ASCII in production)
BufferedReader reader =
    new BufferedReader(new InputStreamReader(url.openStream()));

Changing the line above to

// Explicit charset: the bytes are decoded as UTF-8 regardless of the JVM default
BufferedReader reader =
    new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));

solves the issue.
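A self-contained version of the fix, assuming Java 7 or later for StandardCharsets and try-with-resources (the "UTF-8" string form above also works on older runtimes). The helper name is hypothetical, and an in-memory stream stands in for url.openStream() so the sketch runs offline. Passing the charset explicitly makes the result independent of whatever default the JVM happens to have.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8Fetch {

    // Reads the whole stream as UTF-8 text, independent of the JVM default charset.
    static String readAllUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): UTF-8 bytes of a small JSON document.
        String original = "{\"city\":\"M\u00E1laga\"}";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);
        String json = readAllUtf8(new ByteArrayInputStream(body));
        System.out.println(json.equals(original)); // true
        // json could now be passed to Gson, e.g. new Gson().fromJson(json, MyType.class),
        // where MyType is a hypothetical class for the API's payload.
    }
}
```

Reading into a char buffer rather than line by line keeps the original line terminators intact, which matters if the exact text is stored.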