2
votes

I want to scrape chat messages from a YouTube live chat. At first, I just followed the approach shown in "https://www.youtube.com/watch?v=W2DS6wT6_48".

But the code does not work.

The error message is

all_comments = driver.find_element_by_id("all-comments")

...

selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with id 'all-comments'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"93","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:12695", "User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"id\", \"sessionId\": \"e4b63b00-fe9c-11e6-a630-0fa086b5cd8d\", \"value\": \"all-comments\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"", "host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/e4b63b00-fe9c-11e6-a630-0fa086b5cd8d/element"}}

As I understand it, there is no element with the id "all-comments", so find_element_by_id failed.

Then I tried several other ids and XPath expressions to locate the chat messages.


But none of them returned the chat messages.

Am I doing something wrong?

How can I scrape the chat messages?

2
I guess the chat part is loaded via JavaScript. If you download the HTML of that page without processing it in a browser, do these ids show up? Or just disable JavaScript in the browser and have a look. - Martin Krung
In the YouTube video, the Python code does nothing with JavaScript, but it works, at least in the video. I think the structure of the YouTube page may have changed, so I tried several ids, XPath expressions, and class names. - Py K
So far, what I have found is that the chat messages live in <iframe id='live-chat-iframe' ...> ... <yt-live-chat-app> ... <div id='contents' ... > ... <span id='message' ... > chat message </span>. I can find live-chat-iframe with find_element_by_id('live-chat-iframe'), but find_element_by_id fails for the ids 'contents' and 'message'. - Py K
@FabianThommen I use the PhantomJS browser to get the page. I followed [stackoverflow.com/questions/32115673/… to disable JavaScript. After that, webdriver cannot find any element with id 'all-comment', 'comments', 'message', or 'live-chat-iframe'. - Py K
Without disabling JavaScript, I can get the element with id 'live-chat-iframe', but I cannot find any element with id 'comments' or 'message'. Is there any way to list all elements under the live-chat-iframe element without knowing an id or class? - Py K

2 Answers

1
votes

You cannot access the content of the iframe from the parent page. This is by design: an iframe is like a browser within the browser.

See https://en.wikipedia.org/wiki/Same-origin_policy

You have to read the src attribute from the iframe, load that page, and then filter it. This will work.
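
The first step, reading the iframe's src, could look roughly like the sketch below. It uses only the Python standard library and a made-up page snippet; the src value "/live_chat?v=VIDEO_ID" and the helper names IframeSrcFinder and find_iframe_url are assumptions for illustration, not what YouTube actually serves. With Selenium, the HTML would presumably come from driver.page_source instead of the sample string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcFinder(HTMLParser):
    """Remember the src attribute of the first <iframe> with a given id."""

    def __init__(self, iframe_id):
        super().__init__()
        self.iframe_id = iframe_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching iframe.
        if tag != "iframe" or self.src is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == self.iframe_id:
            self.src = attr_map.get("src")


def find_iframe_url(page_html, page_url, iframe_id="live-chat-iframe"):
    """Return the absolute URL of the iframe, or None if it is not found."""
    finder = IframeSrcFinder(iframe_id)
    finder.feed(page_html)
    if finder.src is None:
        return None
    # The src may be relative, so resolve it against the page URL.
    return urljoin(page_url, finder.src)


# Hypothetical page source (assumption); the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <iframe id="live-chat-iframe" src="/live_chat?v=VIDEO_ID"></iframe>
</body></html>
"""

chat_url = find_iframe_url(SAMPLE_HTML, "https://www.youtube.com/watch?v=VIDEO_ID")
print(chat_url)
```

The resulting URL is the page to load next. Since the chat is probably rendered by JavaScript, loading it likely still needs a real browser driver rather than a plain HTTP download.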

-2
votes

I know this thread is a bit old, but I am able to scrape YouTube live chat feeds using CasperJS. It's a work in progress, but you can get the gist here:

https://github.com/archae0pteryx/yt-live-chat-scraper