
I am making a POST request to a website which contains the store name, street and city for a number of stores, each in their own card on the webpage. I'm trying to record that data from the site using xpath. There are a small number of items (e.g. a store name) that cannot be read, generating an IndexError, and I am trying to handle those errors with a try-except.

In the code below, only one title fails to read. My code catches the exception, but for some reason the 'X_NAME_ERROR_X' placeholder is appended to the end of the names list - e.g. ['place1', 'place 2', 'place 4', 'X_NAME_ERROR_X'], when I know that the exception is occurring in what would be 'place3'.

Is there a reason why Python would append the placeholder from the except block at the end of the list, even though the exception should have been raised before the end of the for loop?

rest_count = len(response.html.xpath('//*[@id="search-results-container"]//div[@class="mb-32 width--full"]'))

names = []
street_address = []
city_address = []
for item in range(rest_count):
    try:
        title = response.html.xpath('//*[@id="search-results-container"]//div[@class="mb-32 width--full"]/h4/text()')[item]
        names.append(title)
    except IndexError:
        title = 'X_NAME_ERROR_X'
        names.append(title)
    try:
        street = response.html.xpath('//*[@id="search-results-container"]//div[@class="mb-32 width--full"]/p[1]/text()')[item]
        street_address.append(street)
    except IndexError:
        street = 'X_STREET_ERROR_X'
        street_address.append(street)
    try:
        city = response.html.xpath('//*[@id="search-results-container"]//div[@class="mb-32 width--full"]/p[2]/text()')[item]
        city_address.append(city)
    except IndexError:
        city = 'X_CITY_ERROR_X'
        city_address.append(city)
"when I know that the exception is occurring in what would be 'place3'" - sounds like you're wrong about that. - user2357112 supports Monica
Actually, there's no way you could get an IndexError between two valid indices. - user2357112 supports Monica
@user2357112supportsMonica So it's always just going to give that error at the end because the index is non-existent? Even though the length of the rest_count object is the full number of stores that show any data on the webpage? - F McA
The xpath selector you used to compute rest_count selects more things than the one you used for title. - user2357112 supports Monica
@user2357112supportsMonica I see. Because of the additional "/h4"? - F McA
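To see the selector mismatch described in the comments without hitting the live site, here is a small reproduction that uses Python's standard-library `xml.etree.ElementTree` as a stand-in for requests-html. The markup, store names and addresses are invented for illustration, and it assumes the unreadable title is an empty `<h4>`. ElementTree's limited XPath dialect has no `text()`, so filtering out empty `.text` values is used as a rough equivalent of `.../h4/text()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page: four store cards,
# where the third card's <h4> is empty (so an XPath h4/text() finds nothing there).
HTML = """
<div id="search-results-container">
  <div class="mb-32 width--full"><h4>place1</h4><p>1 First St</p><p>Town A</p></div>
  <div class="mb-32 width--full"><h4>place2</h4><p>2 Second St</p><p>Town B</p></div>
  <div class="mb-32 width--full"><h4></h4><p>3 Third St</p><p>Town C</p></div>
  <div class="mb-32 width--full"><h4>place4</h4><p>4 Fourth St</p><p>Town D</p></div>
</div>
"""

root = ET.fromstring(HTML.strip())
CARD = ".//div[@class='mb-32 width--full']"

# Like the rest_count selector: one match per card.
rest_count = len(root.findall(CARD))

# Rough equivalent of .../h4/text(): the empty third title produces no result,
# so 'place4' moves up into the slot the third card "should" have had.
titles = [h4.text for h4 in root.findall(CARD + "/h4") if h4.text]

names = []
for item in range(rest_count):
    try:
        names.append(titles[item])
    except IndexError:
        names.append("X_NAME_ERROR_X")

print(rest_count, len(titles))  # 4 3
print(names)  # ['place1', 'place2', 'place4', 'X_NAME_ERROR_X']
```

The card selector finds four matches while the title selector yields only three strings, so index 3 is the only one that fails, which reproduces the list from the question.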

1 Answer


The data structure you're trying to index goes [thing1, thing2, thing4], not [thing1, thing2, some_magic_thing_that_raises_an_IndexError, thing4]: the title selector simply returns nothing for the card whose title can't be read, so every later title shifts up one position. Indices 0, 1, and 2 are valid, but index 3 is out of range, so you get an IndexError at the end. You may have expected an extra thing between things 2 and 4, but that's not where the IndexError happens.
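A common way to keep the name, street and city lists aligned is to select each card once and look up its fields relative to that card, falling back to a placeholder when a field is empty or missing. This is a minimal sketch using the same invented markup and the standard-library ElementTree stand-in; with lxml, card elements also support relative `.xpath()` calls for the same pattern, but adapting it to requests-html is left as an assumption to verify.

```python
import xml.etree.ElementTree as ET

# Same hypothetical markup as before: the third card has an empty <h4>.
HTML = """
<div id="search-results-container">
  <div class="mb-32 width--full"><h4>place1</h4><p>1 First St</p><p>Town A</p></div>
  <div class="mb-32 width--full"><h4>place2</h4><p>2 Second St</p><p>Town B</p></div>
  <div class="mb-32 width--full"><h4></h4><p>3 Third St</p><p>Town C</p></div>
  <div class="mb-32 width--full"><h4>place4</h4><p>4 Fourth St</p><p>Town D</p></div>
</div>
"""

root = ET.fromstring(HTML.strip())

names, street_address, city_address = [], [], []
for card in root.findall(".//div[@class='mb-32 width--full']"):
    # findtext() returns "" for an empty element and None for a missing one;
    # `or` turns either case into a placeholder at this card's own position.
    names.append(card.findtext("h4") or "X_NAME_ERROR_X")
    street_address.append(card.findtext("p[1]") or "X_STREET_ERROR_X")
    city_address.append(card.findtext("p[2]") or "X_CITY_ERROR_X")

print(names)  # ['place1', 'place2', 'X_NAME_ERROR_X', 'place4']
```

Because every lookup is scoped to one card, no try/except around list indexing is needed, and the placeholder lands where the unreadable title actually is.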