
I want to get just the text body of an email from Gmail (or IMAP servers in general) without having to download the entire message.

If I fetch RFC822, I get everything just fine:

mail_box.fetch(message_ids, '(RFC822)')

The problem is that when there are many messages, especially ones with attachments, this takes a long time.

I can get just the headers and the text body I need with:

mail_box.fetch(message_ids, '(RFC822.HEADER BODY.PEEK[1])')

But then I can't parse the text body; it comes back in a weird format:

'\r\n------=_NextPart_001_0011_01CB63DF.D39BA1C0\r\nContent-Type: text/plain;\r\n\tcharset="iso-8859-1"\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nRafael, ...other content like html tags and css...------=_NextPart_001_0011_01CB63DF.D39BA1C0--\r\n'

I tried to parse it with email.message_from_string and the quopri module, but no luck so far.

Is it possible to get messages formatted like RFC822, but without downloading the attachments?

------=_NextPart_001_0011_01CB63DF.D39BA1C0 hints that the message is a multipart message with a header field Content-Type: multipart/....; boundary=----=_NextPart.... Maybe you have to pass that Content-Type header to the function that decodes the body, so it knows what to do with the NextPart boundary lines. Some additional info: en.wikipedia.org/wiki/MIME#Multipart_messages - User
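The commenter's idea can be sketched with Python 3's standard email package: recover the boundary from the first delimiter line of the BODY[1] data, prepend a synthetic Content-Type header, and let the parser split the parts. This is a workaround under stated assumptions: the data starts with a boundary line, and the multipart subtype (alternative here) is a guess, since the parser only needs the boundary. The sample bytes reuse the headers and boundary from the question, but the elided "...other content..." is replaced by made-up text and HTML parts; the function names are hypothetical.

```python
import email
import email.policy
from email.message import EmailMessage

# Made-up stand-in for the question's BODY[1] data (imaplib returns bytes in Python 3).
raw_part = (
    b'\r\n------=_NextPart_001_0011_01CB63DF.D39BA1C0\r\n'
    b'Content-Type: text/plain;\r\n\tcharset="iso-8859-1"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n\r\n'
    b'Rafael, ol=E1!\r\n'
    b'------=_NextPart_001_0011_01CB63DF.D39BA1C0\r\n'
    b'Content-Type: text/html;\r\n\tcharset="iso-8859-1"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n\r\n'
    b'<p>Rafael, ol=E1!</p>\r\n'
    b'------=_NextPart_001_0011_01CB63DF.D39BA1C0--\r\n'
)


def parse_bare_multipart(body: bytes, subtype: str = "alternative") -> EmailMessage:
    """Parse a multipart body that arrived without its own Content-Type header.

    The boundary is taken from the first delimiter line ("--" + boundary), and
    a synthetic header (subtype is an assumption) is prepended before parsing.
    """
    lines = body.lstrip(b"\r\n").splitlines()
    if not lines or not lines[0].startswith(b"--"):
        raise ValueError("body does not start with a MIME boundary line")
    boundary = lines[0][2:].strip().decode("ascii")
    header = f'Content-Type: multipart/{subtype}; boundary="{boundary}"\r\n\r\n'
    return email.message_from_bytes(header.encode("ascii") + body, policy=email.policy.default)


def first_plain_text(msg: EmailMessage):
    """Return the decoded text of the first text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            return part.get_content()  # undoes quoted-printable and the charset
    return None


print(first_plain_text(parse_bare_multipart(raw_part)))
```

Applied to real BODY[1] bytes, this replaces the manual slicing, and get_content() takes care of the quoted-printable encoding and the iso-8859-1 charset, so no separate quopri call is needed.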
Either fetch and parse the BODYSTRUCTURE so you can get just the part you need, or fetch the MIME part headers to go with part 1 (BODY[1.MIME]). MIME messages are quite complex, and there's no way in IMAP to say "just give me the body" (which one? HTML? Text? RTF? PDF?), so you can either guess, or download the BODYSTRUCTURE, identify which part you actually want, and fetch it, e.g. 1.1 or 1.1.1. - Max
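Max's second option, fetching the part's own MIME headers together with its body, might look roughly like this in Python 3. It assumes the server accepts BODY.PEEK[1.MIME] (a section specifier defined in IMAP4rev1) and that the requested part is a sub-part of a multipart message; the helper names are hypothetical. Servers echo the items as BODY[...] without PEEK and may return them in either order, so the sketch matches each literal by its prefix rather than by position.

```python
import email
import email.policy
import imaplib
from email.message import EmailMessage


def fetch_part(conn: imaplib.IMAP4, msg_id: str, part: str) -> EmailMessage:
    """Fetch one MIME part plus its own headers and parse them as one message."""
    typ, data = conn.fetch(msg_id, f"(BODY.PEEK[{part}.MIME] BODY.PEEK[{part}])")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"FETCH failed: {data!r}")
    return join_part_response(data, part)


def join_part_response(data, part: str) -> EmailMessage:
    """Glue the [part.MIME] literal (headers) in front of the [part] literal (body).

    imaplib returns each literal as a (prefix, bytes) tuple; plain bytes items
    such as the closing b')' are skipped.
    """
    headers = body = None
    for item in data:
        if not isinstance(item, tuple):
            continue
        prefix, literal = item
        if f"BODY[{part}.MIME]".encode() in prefix:
            headers = literal
        elif f"BODY[{part}]".encode() in prefix:
            body = literal
    if headers is None or body is None:
        raise ValueError("response did not contain both the MIME headers and the body")
    # Make sure exactly one blank line separates the headers from the body.
    if not headers.endswith(b"\r\n\r\n"):
        headers = headers.rstrip(b"\r\n") + b"\r\n\r\n"
    return email.message_from_bytes(headers + body, policy=email.policy.default)
```

For example, fetch_part(mail_box, message_id, "1") would return a message object whose text/plain sub-part can then be read with get_content(), without fetching any attachment data.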
I did that: I first fetched the BODYSTRUCTURE and found the text/plain message at 1 (that's why I am fetching BODY.PEEK[1]). But when I fetch it, the text string comes with this weird header. I am parsing it manually with slices and substrings; I'm just wondering if there is a better way. - rafanunes

1 Answer


The correct way is to request the BODYSTRUCTURE of the message and only fetch the relevant part.
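To make "request the BODYSTRUCTURE and fetch only the relevant part" concrete, here is a minimal, non-authoritative Python 3 sketch that turns a BODYSTRUCTURE response line into nested lists and lists each leaf part's section number. Assumptions: the response contains no IMAP literals ({n} strings, which imaplib would split into tuples), message/rfc822 parts are treated as leaves, and the sample response is hypothetical, shaped the way the boundary in the question's output suggests (part 1 itself being multipart/alternative). The helper names are made up for illustration.

```python
import re

# One token: "(", ")", a quoted string (with \" and \\ escapes), or a bare atom
# such as NIL, a number, or BODYSTRUCTURE. IMAP literals ({n}) are NOT handled.
_TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_sexp(text: str) -> list:
    """Turn an IMAP parenthesized list into nested Python lists (NIL -> None)."""
    stack = [[]]
    for tok in _TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            finished = stack.pop()
            stack[-1].append(finished)
        elif tok.startswith('"'):
            stack[-1].append(re.sub(r"\\(.)", r"\1", tok[1:-1]))
        elif tok.upper() == "NIL":
            stack[-1].append(None)
        else:
            stack[-1].append(tok)
    return stack[0]


def bodystructure_from_fetch(line: str) -> list:
    """Pick the BODYSTRUCTURE value out of one untagged FETCH response line."""
    items = parse_sexp(line)[1]  # skip the message sequence number
    for key, value in zip(items[::2], items[1::2]):
        if isinstance(key, str) and key.upper() == "BODYSTRUCTURE":
            return value
    raise ValueError("no BODYSTRUCTURE in FETCH response")


def find_parts(bs: list, prefix: str = ""):
    """Yield (section_number, "type/subtype") for every leaf part."""
    if isinstance(bs[0], list):  # multipart: child parts come first, then the subtype
        n = 0
        for child in bs:
            if not isinstance(child, list):
                break
            n += 1
            yield from find_parts(child, f"{prefix}.{n}" if prefix else str(n))
    else:  # single part: ("type" "subtype" params id desc encoding size ...)
        yield (prefix or "1", f"{bs[0]}/{bs[1]}".lower())


# Hypothetical response: part 1 is multipart/alternative, part 2 an attachment.
PLAIN = '("text" "plain" ("charset" "iso-8859-1") NIL NIL "quoted-printable" 120 4 NIL NIL NIL)'
HTML = '("text" "html" ("charset" "iso-8859-1") NIL NIL "quoted-printable" 900 20 NIL NIL NIL)'
ALT = f'({PLAIN}{HTML} "alternative" ("boundary" "----=_NextPart_001_0011_01CB63DF.D39BA1C0") NIL NIL)'
PDF = '("application" "pdf" ("name" "report.pdf") NIL NIL "base64" 50000 NIL ("attachment" ("filename" "report.pdf")) NIL)'
SAMPLE = f'1 (BODYSTRUCTURE ({ALT}{PDF} "mixed" ("boundary" "outer") NIL NIL))'

print(list(find_parts(bodystructure_from_fetch(SAMPLE))))
# [('1.1', 'text/plain'), ('1.2', 'text/html'), ('2', 'application/pdf')]
```

With imaplib, the input line would come from something like mail_box.fetch(message_ids, '(BODYSTRUCTURE)'), whose data items are bytes and need decoding first. In a layout like this hypothetical one, the text/plain part is 1.1 rather than 1, so fetching BODY.PEEK[1.1] would return only the plain text, still in the transfer encoding listed in its BODYSTRUCTURE entry (quoted-printable here).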

In the comments you say that you have already fetched the BODYSTRUCTURE and that part 1 corresponds to a text/plain MIME part. Please show us the whole, unprocessed BODYSTRUCTURE; without it, nobody can tell whether the IMAP server you are using is buggy or whether your understanding of the BODYSTRUCTURE is wrong.