
I'm fairly new to Perl, so maybe I did something stupid. I'm using a script to fetch all the messages in a mailbox over IMAP with Net::IMAP::Simple, but it does not give me the entire body of the messages. My entire code looks like this:

    use strict;
    use Net::IMAP::Simple;

    my $imap = Net::IMAP::Simple->new('xxxxxxxxxxxxxx') or die 'Impossible to connect '.$!;
    $imap->login('xxxxxxxx', 'xxxxxxxxx') or die 'Login Error!';
    my $nbmsg = $imap->select('INBOX') or die 'Impossible to reach this folder !';
    print 'You have '.$nbmsg." messages in this folder !\n\n";
    my $index = $imap->list() or die 'Impossible to list these messages !';

    foreach my $msgnum (keys %$index) {
        #if(!$imap->seen($msgnum)) {
        my $msg = $imap->get($msgnum) or die 'Impossible to retrieve this message'.$msgnum.' !';
        print $msg."\n\n";
        #}
    }
    $imap->quit() or die 'quitting issue !';

Every time it retrieves an email, it gives me the first characters (which in my case are cryptic, useless metadata generated by the bot that sends the messages), but not the entire body.

EDIT: Here is the body part displayed in the output:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: BASE64

Q2UgbWVzc2FnZSBhIMOpdMOpIGfDqW7DqXLDqSBhdXRvbWF0aXF1ZW1lbnQgcGFyIGwnaW1wcmlt YW50ZSBtdWx0aWZvbmN0aW9ucyBYZXJveCBYRVJPWF83ODMwLgogICBFbXBsYWNlbWVudCBkdSBz eXN0w6htZSA6IAogICBBZHJlc3NlIElQIHN5c3TDqG1lIDogMTkyLjE2OC4xLjIwMAogICBBZHJl c3NlIE1BQyBzeXN0w6htZSA6IDlDOjkzOjRFOjMzOjM1OkJECiAgIE1vZMOobGUgc3lzdMOobWUg OiBYZXJveCBXb3JrQ2VudHJlIDc4MzAKICAgTnVtw6lybyBkZSBzw6lyaWUgc3lzdMOobWUgOiAz OTEyNzk4ODk0CgpMJ0Fzc2lzdGFudCBkZSBjb21wdGV1ciBhIGVudm95w6kgbGUgcmVsZXbDqSBz dWl2YW50IGF1IHNlcnZldXIgZGUgY29tbXVuaWNhdGlvbiBYZXJveCBTTWFydCBlU29sdXRpb25z IGxlICAxNC8xMS8xNiAgIDA5OjI0OiAKICBUaXJhZ2VzIGVuIG5vaXIgOiAxMzIwNwogIFRpcmFn ZXMgZW4gY291bGV1ciA6IDkyNjg3CiAgVG91cyBsZXMgdGlyYWdlcyA6IDEwNTg5NA==

It always ends with this "==", by the way, which makes me think that the module is truncating the output.

I looked for details about this in the CPAN documentation but sadly didn't find anything.

Please edit and include that output. I believe it's not actually part of the message, but the ref of the message object returned by $imap->get. Those objects do have a string overload to convert them into text, but you're not using it. You'd have to do print "$msg" to force it into a string. - simbabque
I did it already. But it didn't change anything. I'm editing to add the output. - Amram Elbaz
Looking at the doc, I think you don't need to call list() at all; you can just iterate from 1 to $nbmsg in a loop. The number is not an ID or anything, it's just an index. So foreach my $msgnum ( 1 .. $nbmsg ) { ... } would work. That would also give you the messages in order. What you have now returns the keys of %$index in random order, because hashes in Perl are not sorted and the implementation details make the order random. Don't rely on the order of what comes out of there; it would be different on a different machine. - simbabque
edited, thanks for your help. :) - Amram Elbaz
FWIW, a valid base64 chunk is 4x characters long excluding whitespace and a few other characters, and it's padded with = characters to make that true. In this case the original content is 3a+1 bytes, which encodes to 4b+2, so there has to be two padding bytes at the end. - arnt

1 Answer


Your messages are encoded as Base64. It's perfectly normal for emails to use that Content-Transfer-Encoding, though it's not required. You need to decode them. A good way to do that is to use MIME::Base64. Note that the == is part of the Base64 string. It's padding that brings the string up to the right length.

use strict;
use warnings;
use MIME::Base64 'decode_base64';

my $decoded_msg = decode_base64($msg_body);
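To see the padding in action, here is a small self-contained sketch. The sample strings are made up for illustration; only `encode_base64` and `decode_base64` from MIME::Base64 are used.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Encode inputs of 1, 2 and 3 bytes. The second argument ('') suppresses
# the newline that encode_base64 would otherwise append.
for my $text ('a', 'ab', 'abc') {
    my $encoded = encode_base64($text, '');
    printf "%-3s -> %s\n", $text, $encoded;
}

# decode_base64 skips characters outside the Base64 alphabet, so line
# breaks inside the encoded text do no harm.
my $decoded = decode_base64("YWJj\nZGU=");
print "$decoded\n";
```

A one-byte input ends in `==`, a two-byte input in `=`, and a three-byte input needs no padding, so a trailing `==` says nothing about truncation.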

However, you first need to get the body out of those message objects. The documentation is vague about that: it doesn't say what those objects are, and get only returns the raw message.

I suggest you install Data::Printer and use it to dump one of your $msg objects. That dump will include the internals of the object (which is likely a hash reference), and all the methods the object has. It's possible this object includes an accessor to get the already decoded content. If that's the case, you don't need to decode yourself. If not, grab the body out and decode it with decode_base64.
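If you'd rather not install anything, a rough equivalent with core modules might look like the sketch below. The object here is a stand-in I made up (`My::FakeMessage` and its sample lines are not from Net::IMAP::Simple); with the real thing you would pass whatever `$imap->get($msgnum)` returned.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use Data::Dumper;

# Stand-in for the object returned by $imap->get($msgnum).
# The class name and contents are hypothetical.
my $msg = bless [ "Subject: test\r\n", "\r\n", "aGVsbG8=\r\n" ], 'My::FakeMessage';

print "class:   ", blessed($msg), "\n";   # package the object is blessed into
print "reftype: ", reftype($msg), "\n";   # underlying type: ARRAY, HASH, ...

# Show \r and \n as escapes so line endings are visible in the dump.
local $Data::Dumper::Useqq = 1;
print Dumper($msg);
```

Unlike Data::Printer, this won't list the object's methods, but it does tell you which class to look up and what the internal structure is.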


Update: I read the code, and the get method creates Net::IMAP::Simple::_message objects. There is a package definition at the top of the code. It's a bit dense, but what it does is clear. The data structure behind the object is an arrayref of lines, not a hash reference, so I was wrong above.

q( package Net::IMAP::Simple::_message;
   use overload fallback=>1, '""' => sub { local $"=""; "@{$_[0]}" };
   sub new { bless $_[1] })

And further down:

return  wantarray ? @lines : Net::IMAP::Simple::_message->new(\@lines)

So to get the body, you need to get rid of the header string. Once you've dumped out the object, you should see how many elements at the beginning of the array are the header and the empty line. I assume index 0 is the header line, and index 1 is the empty line. If you don't care about those, you can just throw them away.

Note that shift modifies the object in place.

shift @$msg; # get rid of header
shift @$msg; # get rid of empty line

my $decoded_msg = decode_base64("$msg");
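If the header turns out to span more than one line, a variation is to look for the first empty line instead of shifting a fixed number of elements. This is only a sketch: `decoded_body` is a helper name I made up, the sample lines are invented, and it assumes a single-part message whose body is entirely Base64.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: takes an array reference of message lines and
# returns the decoded body without modifying the array.
sub decoded_body {
    my ($lines) = @_;

    # The first line that is empty apart from CR/LF ends the header block.
    my $blank;
    for my $i (0 .. $#$lines) {
        if ($lines->[$i] =~ /^\r?\n?\z/) { $blank = $i; last; }
    }
    return unless defined $blank;   # no header/body separator found

    # Everything after the separator is the (still encoded) body.
    my $body = join '', @{$lines}[ $blank + 1 .. $#$lines ];
    return decode_base64($body);
}

# Made-up sample data standing in for a fetched message.
my @lines = (
    "Content-Type: text/plain; charset=\"utf-8\"\r\n",
    "Content-Transfer-Encoding: BASE64\r\n",
    "\r\n",
    "SGVsbG8s\r\n",
    "IHdvcmxkIQ==\r\n",
);
print decoded_body(\@lines), "\n";
```

Since the message object is a blessed arrayref, you should be able to pass `$msg` itself to the helper. For multipart messages this won't be enough, and a proper MIME parser would be the safer choice.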