BeautifulSoup get only the "general" text in a td tag, and nothing in nested tags

Question

Say that my HTML looks like this:

<td>Potato1 <span somestuff...>Potato2</span></td>
...
<td>Potato9 <span somestuff...>Potato10</span></td>

I have BeautifulSoup doing this:

for tag in soup.find_all("td"):
    print(tag.text)

And I get

Potato1 Potato2
....
Potato9 Potato10

Is it possible to get only the text that sits directly inside the td tag, without any of the text nested inside the span tag?

nu11p01n73R · Accepted Answer · 2015-07-07T17:16:08

You can use .contents like this:

>>> for tag in soup.find_all("td"):
...     print(tag.contents[0])
...
Potato1
Potato9
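For reference, here is a self-contained version of the session above. It is a sketch that assumes Python 3, the beautifulsoup4 package, and the built-in "html.parser" backend. The question elides the real <span> attributes, so class="x" below is a hypothetical stand-in.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; class="x" is a hypothetical placeholder
# for the elided <span> attributes.
html = """
<table><tr>
<td>Potato1 <span class="x">Potato2</span></td>
<td>Potato9 <span class="x">Potato10</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# contents[0] is the first child node of each <td>: here, the text before the <span>.
first_texts = [td.contents[0] for td in soup.find_all("td")]

for text in first_texts:
    # strip() removes the trailing space that precedes the <span>.
    print(text.strip())
```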

What does it do?

A tag's direct children are available as a list through its .contents attribute:

>>> for tag in soup.find_all("td"):
...     print(tag.contents)
...
['Potato1 ', <span somestuff...="">Potato2</span>]
['Potato9 ', <span somestuff...="">Potato10</span>]
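To see what each entry in that list is, the sketch below (assuming Python 3 and beautifulsoup4, with a simplified <span> because the question elides its attributes) prints the type of every child. Text nodes are NavigableString objects and nested elements are Tag objects, which is why the plain text shows up as its own list entry, separate from the <span>.

```python
from bs4 import BeautifulSoup

# Simplified cell modeled on the question; the real <span> attributes are elided there.
td = BeautifulSoup("<td>Potato1 <span>Potato2</span></td>", "html.parser").td

# .contents holds only the direct children of the <td>, in document order.
for index, child in enumerate(td.contents):
    # Text nodes are NavigableString objects; nested elements are Tag objects.
    print(index, type(child).__name__, repr(child))
```

Because the text before the <span> is child 0 here, contents[0] returns it. If a cell started with a tag instead, contents[0] would be that Tag.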

Since we are only interested in the first element, we use:

print(tag.contents[0])
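contents[0] relies on the text being the first child of each <td>. A sketch that does not depend on position collects every text node that is a direct child, using find_all(string=True, recursive=False). It assumes Python 3 and beautifulsoup4 4.4.0 or later (older releases name the string argument text). The helper own_text and the third cell in the sample markup are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question, plus a hypothetical third cell whose
# direct text comes after the <span>; class="x" stands in for elided attributes.
html = """
<table><tr>
<td>Potato1 <span class="x">Potato2</span></td>
<td>Potato9 <span class="x">Potato10</span></td>
<td><span class="x">Potato6</span> Potato5</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")


def own_text(tag):
    """Hypothetical helper: text directly inside tag, ignoring text in nested tags."""
    # recursive=False limits the search to direct children;
    # string=True matches text nodes rather than elements.
    parts = tag.find_all(string=True, recursive=False)
    # Drop surrounding whitespace and skip whitespace-only nodes.
    return " ".join(part.strip() for part in parts if part.strip())


cells = [own_text(td) for td in soup.find_all("td")]
print(cells)
```

For the third cell, contents[0] would return the <span> Tag, while own_text still returns only "Potato5".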
