I'm writing an RSS reader in Python as a learning exercise, and I would really like to be able to tag individual entries with keywords for searching. Unfortunately, most real-world feeds don't include keyword metadata. I currently have about 60,000 entries in my test database from about 600 feeds, so tagging them manually isn't practical. So far I have only been able to find two solutions:
1: Use the Natural Language Toolkit (NLTK) to extract keywords:
- Pros: flexible; no dependencies on external services
- Cons: can only index the article summary, not the full article; non-trivial, since writing a high-quality keyword extraction tool is a project in itself
2: Use the Google AdWords API to fetch keyword suggestions from the article URL:
- Pros: very high-quality keywords; based on the entire article text; easy to use
- Cons: not free(?); query rate limits unknown; I'm terrified of getting my account banned and not being able to run AdWords campaigns for my commercial sites
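
To make option 1 more concrete, here is a minimal sketch of the kind of keyword extraction it implies. It is only an illustration under several assumptions: it uses plain TF-IDF scoring over the entry summaries, it relies on the standard library instead of NLTK so that it runs as-is, and the names `tokenize`, `extract_keywords`, and the tiny `STOPWORDS` list are hypothetical, not part of any library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be much larger).
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to with".split()
)


def tokenize(text):
    """Lowercase the text and return alphabetic tokens longer than two
    characters that are not stopwords."""
    return [
        word
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS and len(word) > 2
    ]


def extract_keywords(summaries, top_n=3):
    """Return, for each summary, its top_n terms ranked by TF-IDF."""
    docs = [Counter(tokenize(summary)) for summary in summaries]

    # Document frequency: in how many summaries each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())

    n_docs = len(docs)
    results = []
    for doc in docs:
        total = sum(doc.values())
        # Term frequency times inverse document frequency.
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_n])
    return results


summaries = [
    "Python release adds faster dictionary lookups in Python",
    "The football team wins the cup final",
    "New Python football game released",
]
keywords = extract_keywords(summaries)
print(keywords[1])
```

Terms that appear in many summaries are scored down, which is why "football" does not make the top three for the second summary here. With NLTK installed, `nltk.word_tokenize` and the `nltk.corpus.stopwords` corpus could plausibly replace the regex tokenizer and the hand-written stopword list, though a sketch like this is still a long way from a high-quality extractor.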
Can anyone offer any suggestions? Are my fears about getting my AdWords account banned unfounded?