
I have an object created by parsing an email file:

    import email.Parser  # Python 2 / Jython module name

    fp = open(self.currentEmailPath, "rb")
    p = email.Parser.Parser()
    self._currentEmailParsedInstance = p.parse(fp)
    fp.close()

From this object, self._currentEmailParsedInstance, I want to get the body of the email as text only, with no HTML.

How do I do it?


Something like this?

        newmsg=self._currentEmailParsedInstance.get_payload()
        body=newmsg[0].get_content....?

Then I would strip the HTML from body. What is the method (the .... above) that returns the actual text? Maybe I misunderstand you.

        msg=self._currentEmailParsedInstance.get_payload()
        print type(msg)

Output: <type 'list'>


The email:

Return-Path:
Received: from xx.xx.net (example) by mxx3.xx.net (xxx)
id 485EF65F08EDX5E12 for [email protected]; Thu, 23 Oct 2008 06:07:51 +0200
Received: from xxxxx2 (ccc) by example.net (ccc) (authenticated as [email protected]) id 48798D4001146189 for [email protected]; Thu, 23 Oct 2008 06:07:51 +0200
From: "example"
To:
Subject: FW: example Date: Thu, 23 Oct 2008 12:07:45 +0800
Organization: example Message-ID: <001601c934c4$xxxx30$a9ff460a@xxx>
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0017_01C93507.F6F64E30"
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Thread-Index: Ack0wLaumqgZo1oXSBuIpUCEg/wfOAABAFEA

This is a multi-part message in MIME format.

------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: multipart/alternative;
boundary="----=_NextPart_001_0018_01C93507.F6F64E30"

------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit

From: example.example[mailto:[email protected]]
Sent: Thursday, October 23, 2008 11:37 AM
To: [email protected]
Subject: S/I for example(B/L
No.:4357-0120-810.044)

Please find attached the example.doc),

Thanks.

B.rgds,

xxx xxx

------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:st1=3D"urn:schemas-microsoft-com:office:smarttags" =
xmlns=3D"http://www.w3.org/TR/REC-html40">

HTML STUFF till

------=_NextPart_001_0018_01C93507.F6F64E30--

------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: application/msword;
name="xxxx.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="xxxx.doc"

0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAYAAAAAAAAAAA EAAAYgAAAAEAAAD+////AAAAAF8AAAD///////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////////////////s pcEAI2AJBAAA+FK/AAAAAAAAEAAAAAAABgAAnEIAAA4AYmpiaqEVoRUAAAAAAAAAAAAAAAAAAAAA AAAECBYAMlAAAMN/AADDfwAAQQ4AAAAAAAAPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//w8AAAAA AAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAAAAAAAAAAAAKQAAAAAAEYEAAAAAAAARgQAAEYE AAAAAAAARgQAAAAAAABGBAAAAAAAAEYEAAAAAAAARgQAABQAAAAAAAAAAAAAAFoEAAAAAAAA4hsA AAAAAADiGwAAAAAAAOIbAAA4AAAAGhwAAHwAAACWHAAARAAAAFoEAAAAAAAABzcAAEgBAADmHAAA FgAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAA AAAAMjYAAAIAAAA0NgAAAAAAADQ2AAAAAAAANDYAAAAAAAA0NgAAAAAAADQ2AAAAAAAANDYAACQA AABPOAAAaAIAALc6AACOAAAAWDYAAGkAAAAAAAAAAAAAAAAAAAAAAAAARgQAAAAAAABHLAAAAAAA AAAAAAAAAAAAAAAAAAAAAAD8HAAAAAAAAPwcAAAAAAAARywAAAAAAABHLAAAAAAAAFg2AAAAAAAA

------=_NextPart_000_0017_01C93507.F6F64E30--


I just want to get:

From: xxxx.xxxx [mailto:[email protected]]
Sent: Thursday, October 23, 2008 11:37 AM
To: [email protected]
Subject: S/I for xxxxx (B/L
No.:4357-0120-810.044)

Pls find attached the xxxx.doc),

Thanks.

B.rgds,

xxx xxx


I'm not sure whether the mail is malformed. It seems that when the message also carries an HTML version, you have to do this:

        parts = self._currentEmailParsedInstance.get_payload()
        print parts[0].get_content_type()       # multipart/alternative
        textParts = parts[0].get_payload()
        print textParts[0].get_content_type()   # text/plain
        body = textParts[0].get_payload()
        print body                              # the text, without a problem

Thank you so much, Vinko.

So it's kind of like dealing with XML: recursive in nature.
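To make that recursive structure visible, here is a minimal sketch that lists the content type of every part, indented by nesting depth. The helper name describe_structure and the cut-down SAMPLE message are hypothetical, made up for illustration; the sketch assumes the standard email package and was written with Python 3 in mind, although the calls used also exist in Python 2.

```python
import email


def describe_structure(msg, depth=0):
    """Return one indented line per MIME part, descending recursively."""
    lines = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        # For a multipart message, get_payload() returns a list of sub-messages.
        for part in msg.get_payload():
            lines.extend(describe_structure(part, depth + 1))
    return lines


# Hypothetical, cut-down message with the same nesting as the email above.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--inner\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--inner--\n"
    "\n"
    "--outer\n"
    "Content-Type: application/msword\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAAA\n"
    "--outer--\n"
)

structure = describe_structure(email.message_from_string(SAMPLE))
```

Joining structure with newlines and printing it gives a quick picture of where the text/plain part sits before you hard-code any indexes.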

get_payload() will return the body of the message. If your messages are multipart, you have to provide an index, as I said in the answer. Please be more explicit. - Vinko Vrsalovic♦
So this is the output for msg=self._currentEmailParsedInstance.get_payload(0); print msg: "ImportError: no module named locale" - Setori
That is what I understand when you say to give it an index. I am sure I misunderstand you. Or do you mean it like this: newmsg=self._currentEmailParsedInstance.get_payload(); body=newmsg[0].get_content....? I'm trying to be as explicit as I can. - Setori
That ImportError is the key. Jython is not exactly the same as Python, sadly. Try to import java.util.Locale. - Vinko Vrsalovic♦

2 Answers

4
votes

This will get you the contents of the message:

    self.currentEmailParsedInstance.get_payload()

As for the text-only part, you will have to strip the HTML on your own, for example using BeautifulSoup.
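If installing BeautifulSoup is not an option, a rough alternative is the standard library's HTML parser. The sketch below is not the BeautifulSoup approach mentioned above but a swapped-in stdlib technique; the names TextExtractor and html_to_text are hypothetical. It is deliberately crude: it joins text nodes with spaces and ignores layout, so treat it as a starting point only.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self._skip = 0      # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse every run of whitespace to one space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    return extractor.text()


plain = html_to_text(
    "<html><body><p>Please find <b>attached</b> the file.</p></body></html>"
)
```

This only matters for messages that have no text/plain part at all; when a plain-text alternative exists, reading that part directly is simpler and more reliable.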

Check the documentation of the email package for more information about the Message class that the Parser returns. If you mean getting the text part of messages that contain both an HTML and a plain-text version of themselves, you can pass an index to get_payload() to get the part you want.

I tried with a different MIME email because what you pasted seems malformed. Hopefully it only got malformed when you edited it.

    >>> parser = email.parser.Parser()
    >>> message = parser.parse(open('/home/vinko/jlm.txt','r'))
    >>> message.is_multipart()
    True
    >>> parts = message.get_payload()
    >>> len(parts)
    2
    >>> parts[0].get_content_type()
    'text/plain'
    >>> parts[1].get_content_type()
    'message/rfc822'
    >>> parts[0].get_payload()
    'Message Text'

parts will contain all the parts of the multipart message. You can check their content types as shown and keep only the text/plain ones, for instance.
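The same filtering can be done without hard-coded indexes by iterating over Message.walk(), which visits every part at any nesting depth. This is a minimal sketch, assuming the standard email package (written with Python 3 in mind; the calls also exist in Python 2). The function name get_plain_text_body and the SAMPLE message are hypothetical.

```python
import email


def get_plain_text_body(msg):
    """Return the first text/plain part that is not an attachment, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename():
            # A filename suggests a .txt attachment rather than the body.
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        # Assumption: fall back to us-ascii when no charset is declared.
        # An unknown charset name would still raise LookupError here.
        charset = part.get_content_charset() or "us-ascii"
        return payload.decode(charset, "replace")
    return None


# Hypothetical message with a plain-text and an HTML alternative.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Pls find attached =3D the doc\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Pls find attached</p>\n"
    "--b--\n"
)

body = get_plain_text_body(email.message_from_string(SAMPLE))
```

Returning None when nothing matches lets the caller decide what to do next, such as falling back to the text/html part.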

Good luck.

0
votes

I ended up with this:

        parser = email.parser.Parser()
        self._email = parser.parse(open('/home/vinko/jlm.txt','r'))
        parts=self._email.get_payload()
        check=parts[0].get_content_type()
        if check == "text/plain":
            return parts[0].get_payload()
        elif check == "multipart/alternative":
            part=parts[0].get_payload()
            if part[0].get_content_type() == "text/plain":
                return part[0].get_payload()
            else:
                return "cannot obtain the body of the email"
        else:
            return "cannot obtain the body of the email"