I am trying to access a link in emails that I receive in my Gmail account from a specific email address. So far, using the Gmail API, I am able to get the email ID using the Python function ListMessagesMatchingQuery from the documentation: https://developers.google.com/gmail/api/v1/reference/users/messages/list

From there, I am able to retrieve the contents of the email using the Python function GetMessage from the documentation: https://developers.google.com/gmail/api/v1/reference/users/messages/get.

The contents alone, however, are not enough. What I want is the link inside the email contents, so that I can access its HTML page and then scrape it.

Thank you

Have you tried getting that specific string using Users.messages:get and then converting that string to URL format so that you can access the link? - jess

1 Answer

After fetching the email contents from Gmail, you can use Python's email parser library to parse out the MIME section that's HTML. See: https://docs.python.org/3.7/library/email.parser.html
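As a rough sketch of that step, assuming the message was fetched with format='raw' (in which case the Gmail API returns the whole MIME message as a base64url string in the "raw" field), something like the following could pull out the HTML part. The function name html_from_raw_message is made up for this example.

```python
import base64
from email import message_from_bytes, policy


def html_from_raw_message(raw):
    """Return the HTML body of a base64url-encoded MIME message, or None."""
    # Restore any stripped padding so the decoder does not reject the input.
    padded = raw + "=" * (-len(raw) % 4)
    mime_bytes = base64.urlsafe_b64decode(padded)
    # policy.default yields an EmailMessage, which provides get_body().
    msg = message_from_bytes(mime_bytes, policy=policy.default)
    # get_body() searches multipart containers for a part of the given type.
    html_part = msg.get_body(preferencelist=("html",))
    return html_part.get_content() if html_part is not None else None
```

With an authorized client (here assumed to be called service), the input would come from something like service.users().messages().get(userId='me', id=msg_id, format='raw').execute()['raw']. A return value of None means the email has no HTML part.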

With the HTML in hand, you can then use Beautiful Soup to parse out anything you want. See: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

From the documentation page above, "One common task is extracting all the URLs found within a page’s tags", and here's the code fragment:

for link in soup.find_all('a'):
    print(link.get('href'))  

If the email parser library tells you the email has no HTML component, you'll have to search the plain text for links (e.g. look for "http://" or "https://").
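For that plain-text fallback, a regular expression is one way to do the search. This is only a sketch: the pattern is deliberately simple and will not cover every valid URL, and the names URL_PATTERN and find_links are invented for the example.

```python
import re

# Simple pattern: http:// or https:// followed by characters that are not
# whitespace, angle brackets, or quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def find_links(text):
    """Return URLs found in plain text, in order of appearance."""
    # Strip punctuation that commonly trails a URL at the end of a sentence.
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]
```

Calling find_links on the text body of the email returns a list of candidate URLs, which you can then filter down to the one you want to scrape.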