1
votes

I'm using feedparser to parse RSS feeds such as https://www.relay.fm/analogue/feed and can't work out how to explicitly identify the itunes:category values.

Looking at the feedparser itunes tests, it appears that both the itunes:keywords and itunes:category values are put into the feed['tags'] list.

From the tests for category:

<!--
Description: iTunes channel category
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology'
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
    <channel>
        <itunes:category text="Technology"></itunes:category>
    </channel>
</rss>

and then keywords:

<!--
Description: iTunes channel keywords
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology' and 'itunes_keywords' not in feed
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
    <channel>
        <itunes:keywords>Technology</itunes:keywords>
    </channel>
</rss>

For the example feed above the entries are:

<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>

and

<itunes:category text="Society &amp; Culture"/>
<itunes:category text="Technology"/>

resulting in feed['tags'] being populated like so:

[{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]

Is there any way to uniquely identify the values that came from the itunes:category tag?

2 Answers

1
votes

I couldn't find a way to do this with feedparser alone, so I used BeautifulSoup as well:

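import bs4

# raw_data holds the raw feed XML, fetched separately
soup = bs4.BeautifulSoup(raw_data, "lxml")

def is_itunes_category(tag):
    # with the "lxml" parser the namespace prefix stays part of the tag name
    return tag.name == 'itunes:category'

categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]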
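
If adding a BeautifulSoup dependency is not an option, the same idea can be sketched with the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not what the answer above uses. The sample feed and the function name itunes_categories are assumptions made for illustration, and the namespace check is deliberately loose because feeds spell the iTunes namespace URI with different casing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, modelled on the snippets in the question.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
    <itunes:category text="Society &amp; Culture"/>
    <itunes:category text="Technology"/>
  </channel>
</rss>"""


def itunes_categories(raw_xml):
    """Return the text attribute of every itunes:category element."""
    root = ET.fromstring(raw_xml)
    found = []
    for element in root.iter():
        # ElementTree tag names look like "{namespace-uri}localname";
        # elements without a namespace yield an empty namespace string.
        namespace, _, local_name = element.tag.rpartition("}")
        # Loose, case-insensitive match on the namespace URI (an assumption).
        if local_name == "category" and "itunes.com" in namespace.lower():
            found.append(element.get("text"))
    return found


print(itunes_categories(SAMPLE_FEED))
```

Because this reads the raw XML, itunes:keywords never gets mixed in, so the result can be compared against feed['tags'] to tell the two apart.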
0
votes

feedparser v6.0.2 exposes several itunes:* elements as specific properties:

  • itunes:category is available as category in feedparser (category is shorthand for the term of the first entry in tags)
import feedparser
feedp = feedparser.parse(url)
category = feedp.feed.category 
  • itunes:keywords are indeed exposed as tags in feedparser, with each keyword stored in term

However, channel keywords are mixed with item keywords. To identify the item keywords individually, use scheme as a filter:

import feedparser
feedp = feedparser.parse(url)
# get all the keywords, both item and channel
keywords = [k["term"] for k in feedp["feed"]["tags"]]
# keep only the keywords tagged with the iTunes scheme
keyword = [t["term"] for t in feedp["feed"]["tags"] if t["scheme"] == 'http://www.itunes.com/']

This may drop other tags if any are present, but when itunes:keywords and tags coexist they are duplicates.
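
To see what the scheme filter actually keeps, here is a minimal sketch that runs it over the tag list shown in the question, so it needs no network access or feedparser install. The helper name terms_with_scheme and the extra 'podcast' tag are hypothetical, added only to show something being filtered out.

```python
ITUNES_SCHEME = "http://www.itunes.com/"

# Tag list copied from the question, plus one hypothetical non-iTunes tag.
tags = [
    {"label": None, "scheme": ITUNES_SCHEME, "term": "Hurley"},
    {"label": None, "scheme": ITUNES_SCHEME, "term": "Liss"},
    {"label": None, "scheme": ITUNES_SCHEME, "term": "feelings"},
    {"label": None, "scheme": ITUNES_SCHEME, "term": "Society & Culture"},
    {"label": None, "scheme": ITUNES_SCHEME, "term": "Technology"},
    {"label": None, "scheme": None, "term": "podcast"},  # hypothetical
]


def terms_with_scheme(tag_list, scheme):
    """Return the term of every tag whose scheme matches exactly."""
    return [tag["term"] for tag in tag_list if tag.get("scheme") == scheme]


itunes_terms = terms_with_scheme(tags, ITUNES_SCHEME)
print(itunes_terms)
```

Note that in the question's data the keywords and the categories carry the same scheme, so this filter separates iTunes tags from other tags but does not by itself separate keywords from categories.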

  • itunes:duration is available as itunes_duration
import feedparser
feedp = feedparser.parse(url)
# itunes:duration is an item-level element, so read it from an entry
duration = feedp.entries[0]["itunes_duration"]

A bit off topic, but to complete the answer:

If multiple categories are available, they are exposed in categories as tuples, as mentioned in the documentation:

>>> import feedparser
>>> feedp = feedparser.parse(url)
>>> categories = feedp.feed.categories
>>> print(categories)
[(u'Syndic8', u'1024'),
 (u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]

But iTunes has no multiple categories...

There is no longer any need to parse the feed again with BeautifulSoup4.