0
votes

I'm parsing some online-shop pages that contain a list of <div class="classname" ...> tags, 24 per page, for example. Some of the elements load in time and some don't. WebDriver (Chrome) finds only the 4-6 elements that are fully loaded, like this:

<div class="classname">
   <div class="abcd">...</div>
</div>

and 18-20 that look like <div class="classname" ...><!-- --></div> - not loaded yet.

So when I use driver.find_elements_by_class_name("abcd"), it returns only 4-6 elements.

How can I wait until the whole list is loaded, using WebDriverWait.until or implicitly_wait? (There are no other elements I could wait on; all other parts of the page load fully and correctly.) Or how can I simply delay for a few seconds without any condition and get the finished version of the page in the WebDriver object? (driver.implicitly_wait(10) does delay, as far as I can see, but the WebDriver object still doesn't contain the full data.)
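
For example, this is roughly what I tried (a simplified sketch; "abcd" is just the placeholder class from above):

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # delays, but page_source is still incomplete
    driver.get(url_)

    # the explicit wait returns as soon as at least one "abcd" div is present
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "abcd"))
    )
    print(len(driver.find_elements_by_class_name("abcd")))  # still only 4-6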

Update: Strangely for me, using WebDriverWait, time.sleep() and driver.refresh() does not update driver.page_source. It still stays in the not-fully-loaded state... Code:

    self.driver.get(url_)
    time.sleep(15)
    number_of_elements = len(self.driver.find_elements_by_class_name("product-cards-layout__item"))  # len == 24

    while True:
        xpath = "//div[@class=\"product-card--mobile\"]"
        condition = EC.presence_of_all_elements_located((By.XPATH, xpath))

        try:
            wait = WebDriverWait(self.driver, 10).until(condition)  # len == 6
        except Exception:
            pass

        print(len(wait))  # 6
        if len(wait) == number_of_elements:
            break
        else:
            self.driver.refresh()

    exit_ = self.driver.page_source

So driver.page_source contains the HTML below:

<div class="product-cards-layout__item"><div class="product-card--mobile__info"</div></div>
<div class="product-cards-layout__item"><div class="product-card--mobile__info"</div></div>
<div class="product-cards-layout__item"><div class="product-card--mobile__info"</div></div>
... (6 times)

<div class="product-cards-layout__item"><!-- --></div>
<div class="product-cards-layout__item"><!-- --></div>
<div class="product-cards-layout__item"><!-- --></div>
... (20 times)
24 tags in total.

But in the opened Chrome window I see all the information I need (24 full tags, each containing the <div class="product-card--mobile__info"> construct). Both when running the script and in the debugger I see only the static content of .page_source... And if I don't use .refresh() it's all the same - it stays static and doesn't correspond to the data in the browser, even after hundreds of loop iterations )))

3
A website with such behavior would help. A helper might be present on the page, such as a loading spinner that you could wait on until it is no longer present, for example. - Nic Laforge
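
In code, that idea would look roughly like this (a sketch only; "loading-spinner" is a hypothetical class name, since the real page isn't shown):

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait until the (hypothetical) loading spinner disappears before reading the cards
    WebDriverWait(driver, 30).until(
        EC.invisibility_of_element_located((By.CLASS_NAME, "loading-spinner"))
    )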

3 Answers

1
votes

If you know the number of elements per page, you can use this function to wait until all the expected elements are present:

import time

from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

def wait_until_all_expected_elements(func, number_of_elements, timeout=30):
    endtime = time.time() + timeout
    while True:
        try:
            if time.time() > endtime:
                raise TimeoutException("The function doesn't return a sufficient number of elements")
            elements = func()
            if len(elements) == number_of_elements:
                return elements
        except StaleElementReferenceException:
            # the DOM is being re-rendered; just retry
            pass

where number_of_elements stands for the number of elements the page should contain. Then get the elements with WebDriverWait.until:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

def get_elements(driver):
    wait = WebDriverWait(driver, 10)
    return wait.until(ec.presence_of_all_elements_located((By.XPATH, path_to_element)))

and pass the function to wait_until_all_expected_elements as follows:

elements = wait_until_all_expected_elements(lambda: get_elements(driver), number_of_elements)
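
For the page in the question, the complete call could then look roughly like this (a sketch; the XPath, the container class and url_ are taken from the question's code, and the timeout is arbitrary):

path_to_element = "//div[@class='product-card--mobile']"

driver.get(url_)
number_of_elements = len(driver.find_elements_by_class_name("product-cards-layout__item"))  # e.g. 24

elements = wait_until_all_expected_elements(
    lambda: get_elements(driver),
    number_of_elements,
    timeout=60,
)
print(len(elements))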
0
votes

I've had the same issue with web scraping. Try using Python's built-in time module. You just call time.sleep(number_of_seconds), the site gets time to load, and then you can look for what you need.

import time

driver.get(your_website_here)

time.sleep(5)  # Wait 5 seconds for page to fully load

driver.find_elements_by_class_name("abcd")
0
votes

My problem was that the site doesn't load a block while it is not scrolled into view in the browser. The solution is to scroll to every div you need:

    self.driver.get(url_)
    product_elements = self.driver.find_elements_by_class_name("product-cards")

    for elm in product_elements:
        # accessing this property scrolls the element into view, which triggers the lazy loading
        elm.location_once_scrolled_into_view
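
After scrolling, the lazily rendered inner divs appear, so you can wait for the full count before reading page_source. A rough sketch (the class name and the count of 24 are taken from the question; the timeout is arbitrary):

    from selenium.webdriver.support.ui import WebDriverWait

    # wait until all 24 inner info divs have been rendered, then grab the final HTML
    WebDriverWait(self.driver, 30).until(
        lambda d: len(d.find_elements_by_class_name("product-card--mobile__info")) >= 24
    )
    html = self.driver.page_source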