I am currently writing a proof-of-concept script for a news-site web scraper. I am new to scraping, but I have basic familiarity with CSS selectors and XPaths after completing an API-usage course on DataCamp. As a test, I went to the Bloomberg Europe homepage (I know Bloomberg has an API; I just wanted a large news site to try the code on), armed with SelectorGadget and Chrome's "inspect element" tool, and copied what I thought were the relevant CSS selectors and/or XPaths. Every one of them returned an empty list when I fed it to rvest::html_nodes().
The code I was using is here:
library(rvest)

url <- "https://www.bloomberg.com/europe"
webpage <- read_html(url)

# XPath copied from Chrome's inspector
xpath_id <- '//*[contains(concat(" ", @class, " "), concat(" ", "story-package-module__story__headline-link", " "))]'
titles_html_xpath <- html_nodes(webpage, xpath = xpath_id)
# XPath returns an empty list; try the CSS selector from SelectorGadget
titles_html_selectorgadget <- html_nodes(webpage, css = ".story-package-module__story__headline")
# also empty; try the alternative class (note: both classes on the same
# element, so they must be chained with no space between them)
titles_html_selectorgadget2 <- html_nodes(webpage, css = ".story-package-module__story.mod-story")
# still empty!
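As a sanity check (a sketch of what I tried while debugging; the class name is the one copied from the inspector), I also searched the raw HTML that rvest actually downloaded for the class string, to see whether the headlines are in the server response at all or only in the browser-rendered DOM:

```r
library(rvest)

url <- "https://www.bloomberg.com/europe"
webpage <- read_html(url)

# Search the HTML rvest received for the class name copied from the
# browser's inspector. If it is absent, the content is injected by
# JavaScript (or the request was served a bot-detection page), so no
# CSS selector or XPath can match it in this document.
raw_html <- as.character(webpage)
grepl("story-package-module__story__headline", raw_html, fixed = TRUE)

# The <title> of the fetched page can also reveal whether we got the
# real homepage or an interstitial/captcha page.
html_text(html_node(webpage, "title"))
```
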
Any advice on what the correct selector is (to get article titles in this case)? More importantly, how should I go about working out which CSS selector I need in future cases, especially when there are so many CSS classes layered on top of one another and the selector recommended by SelectorGadget is wrong?