The problem is that the package tm.plugin.webmining is out of date. Only YahooFinanceSource and YahooNewsSource are alive at the time of this reply.
Here is a quick reference and test.
According to the vignette written by the package author, there should be 8 possible sources:
- GoogleBlogSearchSource
- GoogleFinanceSource
- GoogleNewsSource
- NYTimesSource
- ReutersNewsSource
- YahooFinanceSource
- YahooInplaySource
- YahooNewsSource
However, according to the GitHub page, the first one, "GoogleBlogSearchSource", has already been discontinued. For the remaining 7 sources, I ran a simple test to see whether they still work:
library(tm)
library(tm.plugin.webmining)
googlefinance <- WebCorpus(GoogleFinanceSource("A"))
googlenews <- WebCorpus(GoogleNewsSource("A"))
nytimes <- WebCorpus(NYTimesSource("A", appid = nytimes_appid))
reutersnews <- WebCorpus(ReutersNewsSource("A"))
yahoofinance <- WebCorpus(YahooFinanceSource("A"))
yahooinplay <- WebCorpus(YahooInplaySource())
yahoonews <- WebCorpus(YahooNewsSource("M"))
The result shows that all the Yahoo sources are technically still running, but YahooInplaySource returns 0 documents no matter what parameters I chose.
> googlefinance <- WebCorpus(GoogleFinanceSource("NASDAQ:MSFT"))
StartTag: invalid element name
Extra content at the end of the document
Error in inherits(x, "WebSource") : 1: StartTag: invalid element name
2: Extra content at the end of the document
> googlefinance <- WebCorpus(GoogleFinanceSource("A"))
StartTag: invalid element name
Extra content at the end of the document
Error in inherits(x, "WebSource") : 1: StartTag: invalid element name
2: Extra content at the end of the document
> googlenews <- WebCorpus(GoogleNewsSource("A"))
Unknown IO errorfailed to load external entity "http://news.google.com/news?hl=en&q=A&ie=utf-8&num=100&output=rss"
Error in inherits(x, "WebSource") :
1: Unknown IO error2: failed to load external entity "http://news.google.com/news?hl=en&q=A&ie=utf-8&num=100&output=rss"
> nytimes <- WebCorpus(NYTimesSource("A", appid = nytimes_appid))
Error in inherits(x, "WebSource") : object 'nytimes_appid' not found
> reutersnews <- WebCorpus(ReutersNewsSource("A"))
Entity 'ldquo' not defined
Entity 'rdquo' not defined
Opening and ending tag mismatch: div line 60 and body
Opening and ending tag mismatch: body line 59 and html
Premature end of data in tag html line 1
Error in inherits(x, "WebSource") : 1: Entity 'ldquo' not defined
2: Entity 'rdquo' not defined
3: Opening and ending tag mismatch: div line 60 and body
4: Opening and ending tag mismatch: body line 59 and html
5: Premature end of data in tag html line 1
> yahoofinance <- WebCorpus(YahooFinanceSource("A"))
> yahoofinance
<<WebCorpus>>
Metadata: corpus specific: 3, document level (indexed): 0
Content: documents: 16
> yahooinplay <- WebCorpus(YahooInplaySource())
> yahooinplay
<<WebCorpus>>
Metadata: corpus specific: 3, document level (indexed): 0
Content: documents: 0
> yahoonews <- WebCorpus(YahooNewsSource("A"))
> yahoonews
<<WebCorpus>>
Metadata: corpus specific: 3, document level (indexed): 0
Content: documents: 0
> yahoonews <- WebCorpus(YahooNewsSource("M"))
> yahoonews
<<WebCorpus>>
Metadata: corpus specific: 3, document level (indexed): 0
Content: documents: 10
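Since most of the calls above abort with an error, it may be convenient to wrap each one so that a single dead source does not stop the rest of the test. The following is only a rough sketch: the helper names count_docs and run_checks are my own (not part of tm.plugin.webmining), and it assumes that length() on a WebCorpus gives the number of documents.

```r
# Hypothetical helper: run one check and return the number of documents,
# or NA if building the corpus raised an error.
count_docs <- function(build) {
  tryCatch(length(build()), error = function(e) NA_integer_)
}

# Hypothetical helper: apply count_docs to a named list of zero-argument
# functions and return a named integer vector.
run_checks <- function(checks) {
  vapply(checks, count_docs, integer(1))
}

# Each entry is a function, so nothing is downloaded until run_checks() is called.
checks <- list(
  googlefinance = function() tm.plugin.webmining::WebCorpus(
    tm.plugin.webmining::GoogleFinanceSource("A")),
  googlenews    = function() tm.plugin.webmining::WebCorpus(
    tm.plugin.webmining::GoogleNewsSource("A")),
  reutersnews   = function() tm.plugin.webmining::WebCorpus(
    tm.plugin.webmining::ReutersNewsSource("A")),
  yahoofinance  = function() tm.plugin.webmining::WebCorpus(
    tm.plugin.webmining::YahooFinanceSource("A")),
  yahoonews     = function() tm.plugin.webmining::WebCorpus(
    tm.plugin.webmining::YahooNewsSource("M"))
)
```

Calling run_checks(checks) should then give one number per source, with NA for the sources that fail. The parser messages will probably still be printed to the console.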
It is also worth mentioning that even though YahooFinanceSource is working, it won't return the same kind of content that GoogleFinanceSource was supposed to. If you want to play with the examples in , I think you may use YahooNewsSource with a customized list of queries.
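A minimal sketch of the "customized list of queries" idea could look like this. The function name fetch_by_query and the example query terms are made up for illustration, and the fetch argument is only there so the function can be tried with something other than a live Yahoo request.

```r
# Hypothetical helper: build one corpus per query term and keep only the
# queries that did not raise an error.
fetch_by_query <- function(queries,
                           fetch = function(q) {
                             tm.plugin.webmining::WebCorpus(
                               tm.plugin.webmining::YahooNewsSource(q))
                           }) {
  corpora <- lapply(queries, function(q) {
    # A failed request becomes NULL instead of stopping the loop.
    tryCatch(fetch(q), error = function(e) NULL)
  })
  names(corpora) <- queries
  # Drop the queries whose request failed.
  Filter(Negate(is.null), corpora)
}

# Example query terms (assumptions, replace with your own).
my_queries <- c("Microsoft", "Apple", "Amazon")
```

Then fetch_by_query(my_queries) should return a named list with one WebCorpus per query that worked. Note that a query can succeed and still return 0 documents, as YahooNewsSource("A") did above.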