2
votes

I am running a GWT application on Google App Engine which passes text input from the GUI via GWT-RPC/Servlet to an API. However, umlauts such as ä, ö, ü are misinterpreted by the API, which shows a ? in place of each umlaut.

I am pretty sure that the problem is the default character encoding on Google App Engine, which is US-ASCII: US-ASCII has no umlauts.
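
To illustrate the suspected mechanism, here is a minimal sketch of what happens when a string is turned into bytes with US-ASCII compared to UTF-8. The class name CharsetLossDemo and the method roundTrip are made up for this illustration, and it is only an assumption that the API converts text using the platform default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetLossDemo {

    // Encodes the text to bytes with the given charset and decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source independent of the compiler's encoding.
        String text = "test \u00e4\u00f6\u00fc";

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("US-ASCII round trip lossless: "
                + text.equals(roundTrip(text, StandardCharsets.US_ASCII)));
        System.out.println("UTF-8 round trip lossless: "
                + text.equals(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```

With US-ASCII each umlaut comes back as a question mark, which matches the "test ???" symptom; with UTF-8 the text survives unchanged.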

Using umlauts with the API from JUnit tests on my local machine works. The default character encoding there is UTF-8.

The problem does not come from GWT or from the encoding of any HTML file: I used a constant Java String containing some umlauts within the application and passed it to the API, and the problem still appears once the application is deployed on Google App Engine.

Is there any way to change the Character Encoding in the Google App Engine? Or does anyone know another solution to my problem?

Funnily enough, storing umlauts from the GUI in the GAE Datastore and bringing them back to the GUI works.

3
It's not clear what you mean by "API shows only a ?" - where? You should ideally trace the exact data at every stage to work out exactly where the problem is. See codeblog.jonskeet.uk/2014/01/20/… - Jon Skeet
Sounds like you need to control the character encoding of your output. See stackoverflow.com/questions/11907764/… for more details. - Daniel Tung
@DanielTung I tried to set both <system-properties><property name="file.encoding" value="UTF-8" /><property name="DEFAULT_ENCODING" value="UTF-8" /></system-properties> and <env-variables><env-var name="DEFAULT_ENCODING" value="UTF-8" /></env-variables> but it did not work. - André Janus
@JonSkeet In short: Running API.call("test äöü") will cause the API (it sends SMS) to send "test äöü" from my machine and "test ???" from GAE. - André Janus
@AndréJanus: That doesn't really help us work out where the problem is. What API is it? Do you have documentation? - Jon Skeet
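
The kind of tracing suggested in the comments could be sketched roughly like this: log the code points of the string at each stage (GUI input, servlet, just before the API call) and compare. The class CodePointDump and its dump method are hypothetical helpers, not part of GWT, App Engine, or the SMS API.

```java
final class CodePointDump {

    // Renders every code point of the text as U+XXXX, so the log output
    // does not depend on how the console handles non-ASCII characters.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // advance past surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00e4\u00f6\u00fc")); // U+00E4 U+00F6 U+00FC
        System.out.println(dump("???"));                // U+003F U+003F U+003F
    }
}
```

If a stage logs U+003F where U+00E4 was expected, the umlaut was already lost before that point.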

3 Answers

1
votes

I was having the same problem: the default charset of a web application deployed to Google App Engine was set to US-ASCII, but I needed it to be UTF-8.

After a bit of head scratching, I found that adding:

<system-properties>
    <property name="appengine.file.encoding" value="UTF-8" />
</system-properties>

to appengine-web.xml correctly sets the charset to UTF-8. More details can be found on Google Issue Tracker - Setting of default encoding.
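
To confirm that the setting took effect after deployment, a small check like the one below can be logged from a servlet or at startup. This is only a sketch: the class name EncodingCheck is made up, and where you call describe() from is up to you.

```java
import java.nio.charset.Charset;

final class EncodingCheck {

    // Summarises the charset settings the running JVM actually uses.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the deployed application still reports US-ASCIIhere, the property was not picked up.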

0
votes

Workaround (safe)

I wrote this class to encode Java Strings containing non-ASCII characters as pure ASCII Strings (replacing every char that is not in the ASCII table with its numeric char value, preceded and followed by a mark character), using AsciiEncoder.encode(yourUtfString)

The String can then be decoded back to UTF with AsciiEncoder.decode(yourAsciiEncodedUtfString) where UTF is supported.

package <your_package>;

import java.util.ArrayList;

/**
 * Created by Micha F. aka Peracutor.
 * 04.06.2017
 */

public class AsciiEncoder {

    public static final char MARK = '%'; //use whatever ASCII char you like (ideally one that rarely occurs in regular text)

    public static String encode(String s) {
        StringBuilder result = new StringBuilder(s.length() + 4 * 10); //buffer for 10 special characters (4 additional chars for every special char that gets replaced)
        for (char c : s.toCharArray()) {
            if ((int) c > 127 || c == MARK) {
                result.append(MARK).append((int) c).append(MARK);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static String decode(String s) {
        StringBuilder result = new StringBuilder(s.length());
        int pos = 0;
        while (pos < s.length()) {
            int open = s.indexOf(MARK, pos);
            int close = (open < 0) ? -1 : s.indexOf(MARK, open + 1);
            if (close < 0) { //no complete MARK<number>MARK sequence left: keep the rest as-is
                result.append(s, pos, s.length());
                break;
            }
            result.append(s, pos, open); //plain text before the sequence
            try {
                result.append((char) Integer.parseInt(s.substring(open + 1, close)));
                pos = close + 1;
            } catch (NumberFormatException e) { //not an encoded char: keep the text and retry from the closing mark
                result.append(s, open, close);
                pos = close;
            }
        }
        return result.toString();
    }
}

Hope this helps someone.
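
For comparison, a similar ASCII-only round trip can be sketched with the standard library's URLEncoder and URLDecoder instead of a hand-written class. This is a different technique from the AsciiEncoder above (percent-encoding of UTF-8 bytes, with spaces turned into +), and the wrapper class PercentCodec is a made-up name for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class PercentCodec {

    // Percent-encodes the text as UTF-8 bytes; the result contains only ASCII characters.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 support is mandatory on every JVM
        }
    }

    // Reverses encode(), restoring the original characters.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode("test \u00e4\u00f6\u00fc");
        System.out.println(encoded); // test+%C3%A4%C3%B6%C3%BC
        System.out.println(decode(encoded).equals("test \u00e4\u00f6\u00fc")); // true
    }
}
```

As with the class above, this only helps if the receiving side decodes the text again before using it.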

0
votes

If you (like myself) are using the Java flexible environment on Google AppEngine, the default encoding can "simply" be fixed by setting the file.encoding system property through your app.yaml (via an environment variable that is automatically picked up by the runtime) like this:

env_variables:
  JAVA_USER_OPTS: -Dfile.encoding=UTF-8