5
votes

I'm using the GAE datastore for a Java application, and storing some text that will be in numerous languages. In my servlet, I'm first checking to see if there's any data in the data store, and, if not, I'm creating some, similar to the following:

ArrayList<Lang> list = new ArrayList<Lang>();
list.add(new Lang("EN", "English", 1));
list.add(new Lang("ES", "Español", 0));
//more languages here...

PersistenceManager pm = PMF.get().getPersistenceManager();
for(Lang l : list) {
  pm.makePersistent(l);
}

Since this is using JDO, I guess I should include the relevant parts of the Lang class too:

@PersistenceCapable
public class Lang {
 @PrimaryKey
 private String code;
 @Persistent
 private String name;
 @Persistent
 private int popularity;
// getters & setters & constructors...
}

However, the non-ASCII characters are giving me grief. I've set my Eclipse project to use the UTF-8 encoding instead of the default Cp1252, so I think I'm okay from that perspective, but when I use the App Engine Data Viewer to look at my data, the ñ in that Español entry shows up garbled, and when I click on it to view it, I get a 500 Server Error. (There are some other entries with right-to-left text that don't even show up in the Data Viewer at all, but one problem at a time...)

Is there anything special I can do in my code to set the character encoding, or specify to GAE that the data I'm storing is UTF-8? Or is the problem on the Eclipse side, and is there something I should be doing with my Java code?

4
Not that it's a "solution," per se, but if I insert the data manually, using the Data Viewer, it gets inserted fine, and my servlet which pulls data from the datastore and returns it also returns the data fine. This may become my solution; I only need the initial load of data to be done, and after that it won't get updated, so doing it manually is an option. Although I'd prefer to know why the coding solution wasn't working... - sernaferna

4 Answers

1
votes

I fixed the same issue by setting both the request and the response encoding to UTF-8. Setting the request encoding results in a valid string being stored in the datastore; without it, values are stored as "????..."

Requests: if you use Apache HTTP Client, this is done in the following way:

Get request:

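NameValuePair[] params; // the query parameters, populated elsewhere
...
// Encode the query string as UTF-8 so non-ASCII values survive the request
String url = urlBase + URLEncodedUtils.format(Arrays.asList(params), "UTF-8");
HttpGet httpGet = new HttpGet(url);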

Post request:

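NameValuePair[] params; // the form parameters, populated elsewhere
...
HttpPost httpPost = new HttpPost(url);
// Encode the form body as UTF-8 rather than the default charset
httpPost.setEntity(new UrlEncodedFormEntity(Arrays.asList(params), "UTF-8"));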

Response: if you build your response in an HttpServlet, this is done in the following way:

HttpServletResponse resp;
...
resp.setContentType("text/html; charset=utf-8");
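
To see why the charset passed when encoding the request matters, here is a minimal sketch using only the JDK's java.net.URLEncoder (not the Apache classes above). The class and method names are made up for illustration. It percent-encodes the same value with two different charsets, which produces two different byte sequences on the wire:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: class and method names are hypothetical, not from the answer above.
class QueryEncodingSketch {

    // Percent-encodes a single query-parameter value using the named charset.
    static String encodeValue(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "Español", written with a Unicode escape so the source file's encoding doesn't matter
        String name = "Espa\u00f1ol";
        System.out.println(encodeValue(name, "UTF-8"));      // Espa%C3%B1ol
        System.out.println(encodeValue(name, "ISO-8859-1")); // Espa%F1ol
    }
}
```

If the client encodes with one charset and the server decodes with another, the ñ will not survive the round trip, so both sides need to agree on UTF-8.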

0
votes

Are you sure you have a problem with your data? I encountered similar issues before, but it turned out to be a problem in the Python version of the Data Viewer. I can retrieve my data fine in Java.
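
One way to check what is actually stored, independent of how any viewer renders it, is to read the value back in Java and log its code points. This is only a sketch: the helper below is hypothetical, and it assumes you pass it the string you read back from the datastore (for example the name field of a Lang object, via whatever getter you have).

```java
// Illustrative helper; the class and method names are hypothetical.
class CodePointDump {

    // Returns the string's code points as "U+XXXX" tokens separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly stored "Español" contains U+00F1 for the ñ
        System.out.println(dump("Espa\u00f1ol"));
        // prints: U+0045 U+0073 U+0070 U+0061 U+00F1 U+006F U+006C
    }
}
```

If the dump shows U+00F1, the stored data is fine and the problem is in the display. If it shows U+003F (a question mark) or the pair U+00C3 U+00B1 instead, the string was already damaged before it was persisted.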

0
votes

I think I had the same problem with encoding several months ago. You can take a look at my sources; maybe they'll help: 1) http://code.google.com/p/vocrecaptor/source/browse/trunk/vocrecaptorweb/src/com/vocrecaptor/web/server/DictionaryServiceImpl.java

2) And the class /com/vocrecaptor/web/server/servlet/AbstractServiceServlet.java

0
votes

I notice that you already set your Eclipse project to use UTF-8 text encoding. Did you double-check the text encoding of the Java file containing a string like "Español"?
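
To illustrate what a mismatch between the file's real encoding and the encoding the compiler assumes would do, here is a minimal sketch (class and method names are made up). It takes the UTF-8 bytes of the string and decodes them as windows-1252, which is the Cp1252 default mentioned in the question:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: simulates a UTF-8 source file being read with the wrong charset.
class SourceEncodingSketch {

    // Encodes the text as UTF-8, then decodes those bytes with a different charset.
    static String misread(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // The two UTF-8 bytes of ñ (0xC3 0xB1) are decoded as two separate characters
        System.out.println(misread("Espa\u00f1ol", cp1252)); // EspaÃ±ol
    }
}
```

Writing the literal with a Unicode escape, as in "Espa\u00f1ol", sidesteps the question entirely, because the source then contains only ASCII characters.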