
I am having serious problems decoding the message body of the emails I get using the Gmail API. I want to grab the message content and put it in a div. I am using a Base64 decoder, which I know won't decode emails encoded differently, but I am not sure how to check an email to decide which decoder to use: emails that say they are UTF-8 encoded are successfully decoded by the Base64 decoder, but not by a UTF-8 decoder.

I've been researching email decoding for several days now, and I've learned that I am a little out of my league here. I haven't done much work with coding around email before. Here is the code I am using to get the emails:

gapi.client.load('gmail', 'v1', function() {
  var request = gapi.client.gmail.users.messages.list({
    labelIds: ['INBOX']
  });
  request.execute(function(resp) {
    document.getElementById('email-announcement').innerHTML = '<i>Hello! I am reading your <b>inbox</b> emails.</i><br><br>------<br>';
    var content = document.getElementById("message-list");
    if (resp.messages == null) {
      content.innerHTML = "<b>Your inbox is empty.</b>";
    } else {
      var encodings = 0;
      content.innerHTML = "";
      angular.forEach(resp.messages, function(message) {
        var email = gapi.client.gmail.users.messages.get({
          'id': message.id
        });
        email.execute(function(stuff) {
          if (stuff.payload == null) {
            console.log("Payload null: " + message.id);
          }
          var header = "";
          var sender = "";
          angular.forEach(stuff.payload.headers, function(item) {
            if (item.name == "Subject") {
              header = item.value;
            }
            if (item.name == "From") {
              sender = item.value;
            }
          });
          try {
            var contents = "";
            if (stuff.payload.parts == null) {
              contents = base64.decode(stuff.payload.body.data);
            } else {
              contents = base64.decode(stuff.payload.parts[0].body.data);
            }
            content.innerHTML += '<b>Subject: ' + header + '</b><br>';
            content.innerHTML += '<b>From: ' + sender + '</b><br>';
            content.innerHTML += contents + "<br><br>";
          } catch (err) {
            console.log("Encoding error: " + encodings++);
          }
        });
      });
    }
  });
});

I was performing some checks and debugging, so there are leftover console.log calls and a few other things that are only there for testing. Still, you can see what I am trying to do.

What is the best way to decode the emails I pull from the Gmail API? Should I try to put the emails into <script> elements with charset and type attributes matching the encoding of the email content? I believe charset only works with a src attribute, which I wouldn't have here. Any suggestions?


7 Answers

22
votes

For a prototype app I'm writing, the following code is working for me:

var base64 = require('js-base64').Base64;
// js-base64 is working fine for me.

var bodyData = message.payload.body.data;
// Simplified code: you'd need to check for multipart.

var decoded = base64.decode(bodyData.replace(/-/g, '+').replace(/_/g, '/'));
// If you're going to use a different library other than js-base64,
// you may need to replace some characters before passing it to the decoder.
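The "you'd need to check for multipart" comment matters because, for a multipart message, payload.body typically carries no data and the text lives in payload.parts, sometimes nested (for example multipart/alternative inside multipart/mixed). A minimal sketch of that check, assuming the Gmail MessagePart shape (mimeType, body.data, parts); findBodyData is a hypothetical helper name, not part of js-base64 or the Gmail client.

```javascript
// Hypothetical helper (not part of js-base64 or the Gmail client): walk a
// Gmail MessagePart tree depth-first and return the first inline body data
// whose part has the requested mimeType, or null if there is none.
function findBodyData(part, mimeType) {
  if (!part) {
    return null;
  }
  // Single-part messages (or a matching leaf part) carry the text in body.data.
  if (part.mimeType === mimeType && part.body && part.body.data) {
    return part.body.data;
  }
  // Multipart containers keep their children in `parts`, possibly nested.
  var children = part.parts || [];
  for (var i = 0; i < children.length; i++) {
    var found = findBodyData(children[i], mimeType);
    if (found) {
      return found;
    }
  }
  return null;
}

// Usage: prefer HTML, fall back to plain text, then decode as above.
// var bodyData = findBodyData(message.payload, 'text/html') ||
//                findBodyData(message.payload, 'text/plain');
```

Parts that only have an attachmentId (attachments) are skipped, since they carry no inline data.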

Caution: these points are not explicitly documented and could be wrong:

  1. The users.messages: get API returns "parsed body content" by default. This data always seems to be encoded in UTF-8 and Base64, regardless of the Content-Type and Content-Transfer-Encoding headers.

    For example, my code had no problem parsing an email with these headers: Content-Type: text/plain; charset=ISO-2022-JP, Content-Transfer-Encoding: 7bit.

  2. Different Base64 implementations use different mapping tables. The Gmail API uses - and _ as the last two characters of the table, as defined by RFC 4648's "URL and Filename safe Alphabet" [1].

    Check if your Base64 library is using a different mapping table. If so, replace those characters with the ones your library accepts before passing the body to the decoder.


1 There is one supportive line in the documentation: the "raw" format returns "body content as a base64url encoded string". (Thanks Eric!)
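The replacement in point 2 can be wrapped in a small helper for decoders that only accept the standard alphabet, such as the browser's atob. A minimal sketch, assuming the input is base64url text from the API; toStandardBase64 is a hypothetical name, and the padding loop is a precaution for strict decoders rather than something the Gmail documentation is known to require.

```javascript
// Sketch: convert base64url (RFC 4648 section 5) to standard Base64
// (section 4) so decoders that only know '+' and '/' accept it.
// toStandardBase64 is a hypothetical helper name.
function toStandardBase64(data) {
  // Map characters 62 and 63 of the URL-safe table back to the standard ones.
  var b64 = data.replace(/-/g, '+').replace(/_/g, '/');
  // Add '=' padding if it is missing; some decoders insist on a length
  // that is a multiple of 4. Already-padded input is left unchanged.
  while (b64.length % 4 !== 0) {
    b64 += '=';
  }
  return b64;
}

// Usage (atob is a global in browsers and in Node.js 16+):
console.log(toStandardBase64('Pz8_'));        // "Pz8/"
console.log(atob(toStandardBase64('Pz8_')));  // "???"
console.log(toStandardBase64('aGVsbG8'));     // "aGVsbG8="
```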

3
votes

Use atob to decode the messages in JavaScript (see ref). For accessing your message payload, you can write a function:

var extractField = function(json, fieldName) {
  return json.payload.headers.filter(function(header) {
    return header.name === fieldName;
  })[0].value;
};
var date = extractField(response, "Date");
var subject = extractField(response, "Subject");

(referenced from my previous SO question), and to get the HTML part:

var part = message.payload.parts.filter(function(part) {
  return part.mimeType == 'text/html';
})[0];
var html = atob(part.body.data);

If the above does not decode correctly, the comments by @cgenco below this answer may apply to you. In that case, use:

var html = atob(part.body.data.replace(/-/g, '+').replace(/_/g, '/'));
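Note that atob returns a "binary string" with one character per byte, so a UTF-8 body containing non-ASCII characters comes out garbled, for example "Ã¼" instead of "ü". A minimal sketch that decodes those bytes as UTF-8 with TextDecoder, assuming (as another answer here observes) that the body bytes are UTF-8; utf8FromBase64 is a hypothetical helper name.

```javascript
// Sketch: decode standard Base64 to text, assuming the decoded bytes are UTF-8.
// utf8FromBase64 is a hypothetical helper name; atob and TextDecoder are
// globals in browsers and in Node.js 16+.
function utf8FromBase64(b64) {
  // atob yields a "binary string": one character (code 0-255) per byte.
  var binary = atob(b64);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // Reassemble multi-byte UTF-8 sequences into proper characters.
  return new TextDecoder('utf-8').decode(bytes);
}

// Usage with the answer's variables:
// var html = utf8FromBase64(part.body.data.replace(/-/g, '+').replace(/_/g, '/'));
```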
2
votes

Here is the solution: in the response of the Gmail API "Users.messages: get" method, message.payload.body.data is Base64 data split into parts by the "-" symbol. It is not one complete Base64-encoded text but several pieces of Base64 text. You have to either decode every single part separately or join them into one string by replacing the "-" symbol. After that, you can easily decode it to human-readable text. You can check each part manually at https://www.base64decode.org

2
votes

This annoyed me too. I found a solution by looking at an extension for VSCode. The solution is really simple:

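const body = response.data.payload.body; // the base64url-encoded body of a message
// Buffer.from decodes the string; Node's "base64" decoding also accepts
// the URL-safe '-' and '_' characters.
const decoded = Buffer.from(body.data, "base64").toString("utf8"); // the decoded message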

It worked for me with the gmail.users.messages.get() call of the Gmail API.

1
votes

Use a web-safe (URL-safe) Base64 decoder for Gmail emails and attachments. I got blank pages when I used a plain Base64 decoder and had to use this instead: https://www.npmjs.com/package/urlsafe-base64

0
votes

I can decode it easily with another tool: https://simplycalc.com/base64-decode.php

In JS: https://www.npmjs.com/package/base64url

In Python 3:

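import base64

coded_string = "SGVsbG8sIEdtYWlsIQ=="  # sample; use the body data from the API
decoded = base64.urlsafe_b64decode(coded_string).decode("utf-8")  # bytes -> str
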
0
votes

Thanks to @ento's answer. Here is more detail on why you need to replace the '-' and '_' characters with '+' and '/' before decoding.

The Wikipedia Base64 "Variants summary table" shows:

  • RFC 4648 section 4: base64 (standard): use '+' and '/'
  • RFC 4648 section 5: base64url (URL-safe and filename-safe standard): use '-' and '_'

In short, the Gmail API uses the base64url (URL-safe) format ('-' and '_'), but the JavaScript atob function and many other JavaScript libraries use the standard base64 format ('+' and '/').
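The difference shows up directly with three bytes whose encoding hits only the two table positions where the alphabets differ. A small sketch using btoa/atob (browser globals, also available in Node.js 16+); the byte values are chosen purely for illustration.

```javascript
// Three bytes whose Base64 encoding uses only table entries 62 and 63,
// the two positions where the standard and URL-safe alphabets differ.
var raw = String.fromCharCode(0xfb, 0xff, 0xfe);
var standard = btoa(raw);  // standard alphabet (RFC 4648 section 4)
var urlSafe = standard.replace(/\+/g, '-').replace(/\//g, '_');  // base64url (section 5)

console.log(standard);  // "+//+"
console.log(urlSafe);   // "-__-"

// atob implements the standard alphabet only, so it rejects base64url text.
var atobRejectsUrlSafe = false;
try {
  atob(urlSafe);
} catch (e) {
  atobRejectsUrlSafe = true;
}
console.log(atobRejectsUrlSafe);  // true

// Mapping '-' and '_' back to '+' and '/' recovers the original bytes.
console.log(atob(urlSafe.replace(/-/g, '+').replace(/_/g, '/')) === raw);  // true
```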

For the Gmail API, the documentation says the body uses base64url format; see the links below:

For the web atob/btoa standards, see the links below: