3
votes

I'm reading the messages from an email account using JavaMail 1.4.1 (I also tried upgrading to 1.4.5, with the same result), but I'm having issues with the encoding of the content:

POP3Message pop3message;
... 
Object contentObject = pop3message.getContent();
...   
String contentType = pop3message.getContentType();
String content = contentObject.toString();

Some messages are read properly, but others contain strange characters, apparently because the wrong encoding is being used. I have noticed that it only fails for one specific content type.

It works well if the contentType is any of these:

  • text/plain; charset=ISO-8859-1

  • text/plain;
    charset="iso-8859-1"

  • text/plain;
    charset="ISO-8859-1";
    format="flowed"

  • text/plain; charset=windows-1252

but it doesn't if it is:

  • text/plain;
    charset="utf-8"

For the UTF-8 content type, pop3message.getEncoding() returns:

quoted-printable

For such messages, the String value shown in the debugger (and the value I see in the database after persisting the object) looks like this, for example:

UbicaciÃ³n (instead of Ubicación)

However, if I open the email in a webmail client in the browser, it displays without any problem. It is an ordinary message (no attachments, just text), so the message itself seems to be OK.

Any idea about how to solve this issue?

Thanks.


UPDATE: This is the code I added to try the getUTF8Content() function suggested by jlordo:

POP3Message pop3message = (POP3Message) message;
String uid = pop3folder.getUID(message);

//START JUST FOR TESTING PURPOSES
if(uid.trim().equals("1401")){
    Object utfContent = pop3message.getContent();
    System.out.println(utfContent.getClass().getName()); // it is of type String
    //System.out.println(utfContent); // if uncommented, it prints the content of one of the emails I'm having problems with
    System.out.println(pop3message.getEncoding()); //prints: quoted-printable
    System.out.println(pop3message.getContentType()); //prints: text/plain; charset="utf-8"
    String utfContentString = getUTF8Content(utfContent); // throws java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.util.SharedByteArrayInputStream
    System.out.println(utfContentString);
}

//END TEST CODE
4
Where exactly do you see UbicaciÃ³n (instead of Ubicación)? Console? Variable inspector? I suspect everything is fine, but the debugger can't display UTF-8 characters. - jlordo
@jlordo I see it in the Eclipse debugger when I inspect the content variable. I also get that result in the database (PostgreSQL) when I run a select. - Javi
Do you read it from the db, or write it to the db and then read it out again? Is the db set up correctly? - jlordo
@jlordo How can it be a database problem if I detect it even before the data is persisted? - Javi
@jlordo Before persisting the data I inspect it in the debugger, write it to a log, and even print it to the console, and it looks the same everywhere (while ISO-8859-1 and windows-1252 messages are shown correctly). After persisting it, I see exactly the same thing in the PostgreSQL admin tool. Do you really think Eclipse, the console, the logs and the PostgreSQL admin tool are all unable to print it correctly? I think it must be a problem with JavaMail. - Javi

4 Answers

1
votes

How are you detecting that these messages have "strange characters"? Are you displaying the data somewhere? It's possible that whatever method you're using to display the data isn't handling Unicode characters properly.

The first step is to determine whether the problem is that you're getting the wrong characters, or that the correct characters are being displayed incorrectly. You can examine the Unicode values of each character in the data (e.g., in the String returned from the getContent method) to make sure each character has the correct Unicode value. If it does, the problem is with the method you're using to display the characters.
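To make that check concrete, here is a minimal sketch that prints the Unicode code point of every character in a String. The class and method names (CodePointDump, dump) are made up for illustration, and the "misdecoded" value only simulates one common failure (UTF-8 bytes decoded as ISO-8859-1); it is an assumption about what might be happening, not a diagnosis.

```java
import java.nio.charset.StandardCharsets;

class CodePointDump {
    // Formats each code point of the input as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String correct = "Ubicaci\u00f3n";
        // Assumption for illustration: UTF-8 bytes read back as ISO-8859-1.
        String misdecoded = new String(
                correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(dump(correct));
        System.out.println(dump(misdecoded));
    }
}
```

If the String from getContent() contains a single U+00F3 where the ó should be, the data is fine and the display is at fault. If it contains U+00C3 followed by U+00B3, the bytes were most likely decoded with the wrong charset somewhere before they reached your variable.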

0
votes

Try this and let me know if it works:

if (contentType.toLowerCase().contains("utf-8")) {
    content = getUTF8Content(contentObject);
}

// Requires: java.io.IOException, java.io.InputStreamReader,
// java.nio.charset.Charset, javax.mail.util.SharedByteArrayInputStream
// TODO take care of IOException and ClassCastException
public static String getUTF8Content(Object contentObject) throws IOException {
    // possible ClassCastException if getContent() did not return this stream type
    SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) contentObject;
    // decode the raw bytes explicitly as UTF-8
    InputStreamReader isr = new InputStreamReader(sbais, Charset.forName("UTF-8"));
    int charsRead = 0;
    StringBuilder content = new StringBuilder();
    int bufferSize = 1024;
    char[] buffer = new char[bufferSize];
    // possible IOException
    while ((charsRead = isr.read(buffer)) != -1) {
        // append only the characters actually read
        content.append(buffer, 0, charsRead);
    }
    return content.toString();
}

BTW, is JavaMail 1.4.1 a requirement? Up to date version is 1.4.5.
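The update in the question shows getContent() returning a String for the problem message, which is why the cast above fails. A hedged sketch of a variant that accepts either a String or any InputStream follows; ContentText and asText are hypothetical names, and it uses only the JDK. Note that when the content is already a String, the decoding has already happened, so this helper cannot repair it. It only avoids the ClassCastException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ContentText {
    // Returns the content as text: Strings pass through unchanged,
    // streams are decoded with the given charset.
    static String asText(Object content, Charset charset) {
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {
            StringBuilder sb = new StringBuilder();
            try (Reader reader = new InputStreamReader((InputStream) content, charset)) {
                char[] buffer = new char[1024];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    sb.append(buffer, 0, n);
                }
            } catch (IOException e) {
                // Rethrow unchecked so callers are not forced to declare it.
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }
        throw new IllegalArgumentException("Unsupported content: "
                + (content == null ? "null" : content.getClass().getName()));
    }
}
```

Usage would look like `String content = ContentText.asText(pop3message.getContent(), StandardCharsets.UTF_8);`, assuming the charset has already been determined from the Content-Type header.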

0
votes

What worked for me was calling getContentType() and checking whether the returned String contains "utf" (meaning the declared charset is one of the UTF encodings).

If it does, I treat the content differently:

private String encodeCorrectly(InputStream is) {
    java.util.Scanner s = new java.util.Scanner(is, StandardCharsets.UTF_8.name()).useDelimiter("\\A");
    return s.hasNext() ? s.next() : "";
}

(a modification of an InputStream-to-String converter from this answer on SO)

The important part here is using the correct Charset. This solved the issue for me.
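Rather than testing for the substring "utf", the charset parameter can be pulled out of the Content-Type value and passed on directly. This is only a sketch with JDK classes and a simple regular expression; ContentTypeCharset and charsetOf are hypothetical names, and the pattern is an assumption that covers the header shapes listed in the question, not a full MIME parser.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {
    // Matches charset=value or charset="value", ignoring case and folding whitespace.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the declared charset, or the fallback if none is declared
    // or the declared name is not supported by this JVM.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }
}
```

The result can then be handed to whatever reads the stream, for example `new java.util.Scanner(is, ContentTypeCharset.charsetOf(contentType, StandardCharsets.US_ASCII).name())`.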

0
votes

First of all, when sending the message you must set the headers to declare UTF-8 encoding, like this:

...
MimeMessage msg = new MimeMessage(session);
msg.setHeader("Content-Type", "text/html; charset=UTF-8");
msg.setHeader("Content-Transfer-Encoding", "8bit");

msg.setFrom(new InternetAddress(doConversion(from)));
msg.setRecipients(javax.mail.Message.RecipientType.TO, address);
msg.setSubject(asunto, "UTF-8");

MimeBodyPart mbp1 = new MimeBodyPart();
mbp1.setContent(text, "text/html; charset=UTF-8");
Multipart mp = new MimeMultipart();
mp.addBodyPart(mbp1);
...

For the 'from' header, however, I use the following method to convert the characters:

public String doConversion(String original) {
    if(original == null) return null;
    String converted = original.replaceAll("á", "\u00c3\u00a1");
    converted = converted.replaceAll("Á", "\u00c3\u0081");
    converted = converted.replaceAll("é", "\u00c3\u00a9");
    converted = converted.replaceAll("É", "\u00c3\u0089");
    converted = converted.replaceAll("í", "\u00c3\u00ad");
    converted = converted.replaceAll("Í", "\u00c3\u008d");
    converted = converted.replaceAll("ó", "\u00c3\u00b3");
    converted = converted.replaceAll("Ó", "\u00c3\u0093");
    converted = converted.replaceAll("ú", "\u00c3\u00ba");
    converted = converted.replaceAll("Ú", "\u00c3\u009a");
    converted = converted.replaceAll("ñ", "\u00c3\u00b1");
    converted = converted.replaceAll("Ñ", "\u00c3\u0091");
    converted = converted.replaceAll("€", "\u00c2\u0080");
    converted = converted.replaceAll("¿", "\u00c2\u00bf");
    converted = converted.replaceAll("ª", "\u00c2\u00aa");
    converted = converted.replaceAll("º", "\u00c2\u00ba");
    return converted;
}

If you need to include other characters, you can look up their UTF-8 hex encoding at http://www.fileformat.info/info/charset/UTF-8/list.htm.
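For the accented vowels and ñ, the replacement values in doConversion are what you get by encoding the character as UTF-8 and reading those bytes back as ISO-8859-1. The sketch below shows that transformation and its inverse using only the JDK. Latin1Reinterpret and its method names are hypothetical, and not every entry in the table follows this pattern (the € entry, for example, does not), so treat it as an illustration of what the table does rather than a drop-in replacement.

```java
import java.nio.charset.StandardCharsets;

class Latin1Reinterpret {
    // Encodes as UTF-8, then reads the bytes back as ISO-8859-1
    // (turns "\u00e1" into "\u00c3\u00a1", like the table above).
    static String utf8BytesAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The inverse: recovers the original text from a String that was
    // produced by decoding UTF-8 bytes as ISO-8859-1.
    static String latin1CharsAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = utf8BytesAsLatin1("Ubicaci\u00f3n");
        System.out.println(garbled);
        System.out.println(latin1CharsAsUtf8(garbled));
    }
}
```

The inverse direction is the one relevant to the question: if a received String contains Ã followed by another character where an accented letter should be, it was probably decoded with the wrong charset, and fixing the charset used for decoding is preferable to patching the text afterwards.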