I'm building a GWT app that stores the text of random webpages in a datastore text field. The text is often UTF-8 encoded. All of my app's files are stored as UTF-8, and when I run the application on my local machine the entire process works fine: UTF-8 text is stored as such and can be retrieved from the local version of App Engine as UTF-8. However, when I deploy the app to Google App Engine, somewhere between storing the text and retrieving it, the text stops being UTF-8, which causes non-ASCII characters to be displayed as ?.
When I view the datastore in the App Engine control panel, all the special characters appear as ?, which leads me to believe that the problem occurs when writing to the datastore.
Does anyone know how to fix this?
The app itself is a little big. Here's some pseudocode:
import com.google.appengine.api.datastore.Text;

Text webPageText = new Text(<STRING THAT CONTAINS UNICODE CHARACTERS>);
/* Some code to store the Text object in the datastore.
   Specifically, I'm using javax.jdo.PersistenceManager to do this.
   Some code to retrieve the Text object from the datastore. */
String retrievedText = webPageText.getValue();
The problem is that retrievedText comes back with ? in place of the non-ASCII characters.
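One thing worth ruling out is the step that turns the fetched page bytes into a String. The sketch below is not taken from the app: `PageReader` and `readUtf8` are hypothetical names, and it assumes the page really is UTF-8 and that the deployed JVM's default charset might differ from the local one (an assumption, not something confirmed here). It decodes with an explicit charset so the result does not depend on that default. `StandardCharsets` needs Java 7 or later; on older runtimes the charset name "UTF-8" can be passed to `InputStreamReader` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Hypothetical helper: decodes the stream as UTF-8 regardless of
    // the JVM's default charset.
    public static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a fetched page; escapes keep the source file ASCII-only.
        byte[] page = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(page));
        System.out.println(text.equals("na\u00efve caf\u00e9")); // prints true
    }
}
```

If the real retrieval code uses `new InputStreamReader(in)` or `new String(bytes)` without a charset argument, the result depends on the platform default, which would be one way for identical code to behave differently locally and when deployed.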
Here's a similar problem in Python that I found: "Trying to store Utf-8 data in datastore getting UnicodeEncodeError". My app is not getting any errors, though.
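Unfortunately, as far as I can tell, Java strings are Unicode internally (UTF-16) rather than UTF-8, so an encoding only applies where bytes are converted to or from a String, and I can't find any code that will let me declare the strings themselves explicitly as UTF-8.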
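To illustrate where an encoding actually applies in Java, here is a minimal standalone sketch (the class and method names are made up for illustration and are not from the app). A charset is named only when a String is converted to or from bytes. Encoding with a charset that cannot represent a character silently substitutes ?, which matches the symptom described, although this does not show that it is the cause here.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode to UTF-8 bytes and decode with the same charset: lossless.
    public static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode with a charset that cannot represent non-ASCII characters:
    // each unmappable character is replaced by '?'.
    public static String roundTripAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, written as an escape so the
        // source file's own encoding does not matter.
        String original = "caf\u00e9";
        System.out.println(original.equals(roundTripUtf8(original))); // prints true
        System.out.println(roundTripAscii(original));                 // prints caf?
    }
}
```

Any conversion in the pipeline that uses a narrower charset than the data needs, explicitly or through the platform default, would produce the same ? substitution without raising an error.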
Edit: I've now built a small webapp that takes in Unicode text, stores it in the datastore, and then retrieves it with no problems. I still have no idea where the problem is in my original source code, but I'm going to change the way my code handles webpage retrieval to match the smaller app that I just built. Thank you everyone for your help.