I'm pretty much brand new to web scraping with rvest, and really new to most everything except Qlik coding.
I am attempting to scrape data from BoardGameGeek (see the link below). Using the browser's inspector, it certainly seems possible, yet rvest is not finding the tags. I first thought I had to go through the whole JavaScript process using V8 (JavaScript is loaded at the top of the HTML), but when I just use html_text on the whole document, all the information I need is in there.
*UPDATE: It appears to be in JSON. I used a combination of Notepad++ and a web tool to clean it and load it into R. Any recommendations on tutorials/demos for how to do this systematically? I have all the links I need to loop through, but I'm not sure how to go from the html_text output to clean JSON input via code.*
I've provided examples below, but I need to scrape the majority of the data elements available, so I'm not looking for code to copy and paste but rather the best method to pursue.
Link: https://boardgamegeek.com/boardgame/63888/innovation
Examples of the HTML I am trying to pull from. html_nodes returns nothing for span, so I couldn't even start there.
<span ng-if="min > 0" class="ng-binding ng-scope">45</span>
OR
<a title="Civilization" ng-href="/boardgamecategory/1015/civilization" class="ng-binding" href="/boardgamecategory/1015/civilization">Civilization</a>
There are about 8 JavaScript sections like this at the top of the page:
<script type="text/javascript" src="https://cf.geekdo-static.com/static/geekcollection_master2_5e84926ab7e90.js"></script>
When I just use html_text on the whole object, I can see all the elements I am looking for, e.g.:
\"minplaytime\":\"30\" OR {\"name\":\"Deck, Bag, and Pool Building\"
I'm assuming this is JSON? Is there a way to parse the html_text output, or is there another method? Is it easier just to run the JavaScript at the top of the page using V8? Is there an easy guide for this?
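To make the question concrete, here is a rough sketch of the kind of approach I have in mind: pull the text of the script blocks with rvest, cut out the object literal, and hand it to jsonlite. The page below is a made-up stand-in, and the variable name `preload`, the JSON layout, and the helper `extract_embedded_json` are all assumptions for illustration; the real page source would need to be checked for the actual variable name and structure.

```r
library(rvest)     # read_html(), html_nodes(), html_text()
library(jsonlite)  # fromJSON()

# Made-up stand-in for one downloaded page. The variable name `preload` and
# the JSON layout are assumptions for illustration, not the real page.
page_source <- paste0(
  '<html><head><script type="text/javascript">',
  'var preload = {"item":{"minplaytime":"30","links":{"mechanic":',
  '[{"name":"Deck, Bag, and Pool Building"}]}}};',
  '</script></head><body></body></html>'
)

# Hypothetical helper: find the script that mentions `var_name`, cut out the
# text between the first "{" after it and the last "}", and parse it as JSON.
extract_embedded_json <- function(html, var_name) {
  scripts <- html_text(html_nodes(read_html(html), "script"))
  hits <- scripts[grepl(var_name, scripts, fixed = TRUE)]
  if (length(hits) == 0) stop("no <script> block mentions ", var_name)

  txt <- hits[1]
  name_pos <- regexpr(var_name, txt, fixed = TRUE)
  rest <- substring(txt, name_pos + nchar(var_name))
  first_open <- regexpr("{", rest, fixed = TRUE)
  last_close <- max(gregexpr("}", rest, fixed = TRUE)[[1]])
  # Naive assumption: nothing after the object in this script contains "}".
  fromJSON(substring(rest, first_open, last_close))
}

game <- extract_embedded_json(page_source, "preload")
game$item$minplaytime           # the minimum play time, still a string
game$item$links$mechanic$name   # one entry per mechanic
```

Since read_html also accepts a URL, I imagine the same helper could be mapped over my list of links with lapply, but I don't know whether slicing the script text like this is the recommended way or whether there is something more robust.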