Having issues scraping Twitter pages using importXML in Google Sheets.
My IMPORTXML formula was working fine last week, but now it returns the error "Imported XML content cannot be parsed."
The XPath is "//span[@class='username js-action-profile-name']".
The message is correct: the data at that URL is not valid XML. For instance, the line:
<noscript><meta http-equiv="refresh" content="0; URL=https://mobile.twitter.com/i/nojs_router?path=%2Fsearch&q=anyone%20recommend%20restaurant%20london%20since%3A2015-03-16%20until%3A2015-09-16"></noscript>
is not valid XML because the meta element is never closed. Likewise, the script element contains many reserved characters (such as < and &) that are not escaped.
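To see the difference concretely, here is a minimal Python sketch (standard library only; the choice of Python and of these two parsers is ours, not what Google Sheets uses internally) that feeds the `<noscript>` line above to a strict XML parser and to a lenient HTML parser. The strict parser rejects it, presumably the same kind of failure IMPORTXML runs into, while the HTML parser reads both tags without complaint. `parses_as_xml` and `TagCollector` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <noscript> line quoted above. <meta> is a void element in HTML, so it is
# never closed, and the bare "&" in the URL is not escaped as "&amp;".
SNIPPET = (
    '<noscript><meta http-equiv="refresh" content="0; URL=https://mobile.twitter.com/'
    'i/nojs_router?path=%2Fsearch&q=anyone%20recommend%20restaurant%20london'
    '%20since%3A2015-03-16%20until%3A2015-09-16"></noscript>'
)


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


if __name__ == "__main__":
    print(parses_as_xml(SNIPPET))  # False

    collector = TagCollector()
    collector.feed(SNIPPET)
    collector.close()
    print([tag for tag, _ in collector.tags])  # ['noscript', 'meta']
```

The HTML parser knows that `meta` needs no closing tag and tolerates the unescaped `&`, which is exactly the leniency a strict XML parser is not allowed to have.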
Unless you use a tool that parses HTML (rather than XML) into a DOM tree, there is not much you can do with that document, except perhaps using a tool like Selenium, which can retrieve the DOM tree a browser generates.
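As a rough illustration of the "turn HTML into a DOM tree" route, the sketch below uses Python's lenient `html.parser` to approximate the XPath from the question, `//span[@class='username js-action-profile-name']`, by collecting the text of `span` elements whose `class` attribute matches that value exactly. The `SpanTextExtractor` class and the `SAMPLE` markup are hypothetical (this is not Twitter's real page, and nothing is fetched); in practice a full HTML-to-DOM library would be more robust than this event-based approximation.

```python
from html.parser import HTMLParser

# Class value taken from the XPath in the question.
TARGET_CLASS = "username js-action-profile-name"


class SpanTextExtractor(HTMLParser):
    """Collect the text of <span> elements whose class attribute equals
    wanted_class exactly, roughly like //span[@class='...'] on lenient HTML."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while inside a matching span (tracks nested spans)
        self.current = []   # text fragments of the span being collected
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a span nested inside a matching span
        elif dict(attrs).get("class") == self.wanted_class:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Hypothetical fragment: the unclosed <meta> and <br> make it invalid XML.
SAMPLE = (
    '<meta charset="utf-8"><div>'
    '<span class="username js-action-profile-name">@<b>first_user</b></span><br>'
    '<span class="username">@ignored</span>'
    '<span class="username js-action-profile-name">@second_user</span>'
    '</div>'
)

if __name__ == "__main__":
    parser = SpanTextExtractor(TARGET_CLASS)
    parser.feed(SAMPLE)
    parser.close()
    print(parser.results)  # ['@first_user', '@second_user']
```

Like XPath's `@class='...'`, the comparison is on the whole attribute string, so a span whose class is only `username` is skipped.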
Since you are scraping Twitter, you would probably be better off using the Twitter REST API, which is much easier and more robust.