How to extract data from all urls, not just the first

Question

This script generates a CSV containing the data from only one of the URLs fed into it. There should be 98 sets of results, but the for loop isn't getting past the first URL.

I've been working on this for 12+ hours today. What am I missing in order to get the correct results?

import requests
import re
from bs4 import BeautifulSoup
import csv

#Read csv
csvfile = open("gyms4.csv")
csvfilelist = csvfile.read()

def get_page_data(urls):
    for url in urls:
        r = requests.get(url.strip())
        soup = BeautifulSoup(r.text, 'html.parser')
        yield soup    # N.B. use yield instead of return

print r.text

with open("gyms4.csv") as url_file:
    for page in get_page_data(url_file):
        name = page.find("span",{"class":"wlt_shortcode_TITLE"}).text
        address = page.find("span",{"class":"wlt_shortcode_map_location"}).text
        phoneNum = page.find("span",{"class":"wlt_shortcode_phoneNum"}).text
        email = page.find("span",{"class":"wlt_shortcode_EMAIL"}).text

        th = pages.find('b',text="Category")
        td = th.findNext()
        for link in td.findAll('a',href=True):
            match = re.search(r'http://(\w+).(\w+).(\w+)', link.text)
            if match:
                web_address = link.text

gyms = [name,address,phoneNum,email,web_address]
gyms.append(gyms)

#Saving specific listing data to csv
with open ("xgyms.csv", "w") as file:
    writer = csv.writer(file)
    for row in gyms:
        writer.writerow([row])

Psytho · Accepted Answer · 2015-09-29T07:43:58

You have three for loops in your code and do not specify which one causes the problem. I assume it is the one in the get_page_data() function.

You leave that loop during its first iteration because of the return statement. That is why you never get to the second URL.
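
As a rough illustration of the difference, here is a minimal sketch that runs offline. The fetch() helper and the example URLs are hypothetical stand-ins for requests.get(url).text and the contents of gyms4.csv; they are not part of the original script.

```python
def fetch(url):
    # Hypothetical stand-in for requests.get(url).text, so this runs offline.
    return "page for " + url


def get_first(urls):
    for url in urls:
        # return ends the whole function during the first iteration
        return fetch(url.strip())


def get_each(urls):
    for url in urls:
        # yield hands back one page, then the loop resumes with the next URL
        yield fetch(url.strip())


# Assumed sample input: lines as they would come from iterating a file.
urls = ["http://a.example\n", "http://b.example\n"]

first = get_first(urls)        # only the first page
every = list(get_each(urls))   # one page per URL
```

With return the caller only ever sees one page; with yield the caller's own for loop drives the generator through every URL.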

There are at least two possible solutions:

Append the parsed result for each URL to a list and return that list once the loop has finished.
Move your processing code into the loop and append the parsed data to gyms inside the loop.
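
Both options could look roughly like the sketch below. It is not the asker's exact script: fake_fetch() is a hypothetical replacement for the requests/BeautifulSoup step so the example runs without network access, and the two-column row layout is an assumption made for illustration.

```python
import csv
import io


def fake_fetch(url):
    # Hypothetical stand-in for downloading and parsing a page.
    return "page for " + url


def all_pages(urls, fetch):
    """Option 1: collect every result and return the list after the loop."""
    pages = []
    for url in urls:
        pages.append(fetch(url.strip()))
    return pages  # return sits outside the loop


def write_rows(urls, fetch, out_file):
    """Option 2: process each URL inside the loop, one row per URL."""
    gyms = []
    for url in urls:
        clean_url = url.strip()
        page = fetch(clean_url)
        # Build the row and append it while still inside the loop.
        gyms.append([clean_url, page])
    csv.writer(out_file).writerows(gyms)
    return gyms


# Assumed sample input: lines as they would come from iterating a file.
urls = ["http://a.example\n", "http://b.example\n"]

pages = all_pages(urls, fake_fetch)
buffer = io.StringIO()  # in-memory file, used here instead of xgyms.csv
rows = write_rows(urls, fake_fetch, buffer)
```

In the real script the row would hold name, address, phoneNum, email and web_address, and out_file would be the opened xgyms.csv. The key point in both options is that the list is filled inside the loop and only used after the loop ends.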
