Hello, I'm trying to create a Python application that downloads files from the internet, but at the moment it only downloads one file whose name I already know. Is there any way to get a list of the files in an online directory and download them all? Here is my code for downloading one file at a time, so you can see what I want to do.

import urllib2

url = "http://cdn.primarygames.com/taxi.swf"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

This downloads taxi.swf from the website, but what I want is to download every .swf file in that directory ("/") to my computer.

Is this possible? Thank you so much in advance. -Terrii-

Does the CDN provide a listing? If not, your best bet is to crawl the site's web pages, extract the links, and then download each game from the CDN. - nhahtdh
If the server is not yours, then it's up to the server whether or not to provide indexing. If it doesn't, then @nhahtdh is right; that's about all you can do. - John
OK, thanks. How do I crawl the website? I have read up on it a bit but couldn't work it out. - Terrii

1 Answer

Since you're trying to download a bunch of things at once, start by looking for a site index or a webpage that neatly lists everything you want to download. The mobile version of a website is usually lighter than the desktop version and easier to scrape.

This website has exactly what you're looking for: All Games (http://www.primarygames.com/mobile/category/all/).

Now, it's quite simple to do: just extract all of the game page links. I use BeautifulSoup and requests to do this:

import requests
from bs4 import BeautifulSoup

games_url = 'http://www.primarygames.com/mobile/category/all/'

def get_all_games():
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(requests.get(games_url).text, 'html.parser')

    for a in soup.find('div', {'class': 'catlist'}).find_all('a'):
        yield 'http://www.primarygames.com' + a['href']

def download_game(url):
    # You have to do this stuff. I'm lazy and won't do it.
    raise NotImplementedError('download_game() is left to the reader')

if __name__ == '__main__':
    for game in get_all_games():
        download_game(game)

The rest is up to you. download_game() should download a game given the URL of its game page, so you have to figure out where the <object> tag sits in that page's DOM and read the file's location from it.
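
As a starting point for that last step, here is a minimal sketch (Python 3, standard library only) of pulling .swf locations out of a game page's HTML. It is not taken from the site's actual markup: the helper names SwfLinkParser and find_swf_urls are hypothetical, the sample HTML and example.com URLs are made up, and it assumes the movie URL appears in a data, src, or value attribute of an <object>, <embed>, or <param> tag. Inspect a real game page first and adjust accordingly.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SwfLinkParser(HTMLParser):
    """Collects .swf URLs found on <object>, <embed>, and <param> tags."""

    # Attributes assumed to carry the movie URL; the real page may differ.
    CANDIDATE_ATTRS = ('data', 'src', 'value')

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.swf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ('object', 'embed', 'param'):
            return
        for name, value in attrs:
            if name not in self.CANDIDATE_ATTRS or not value:
                continue
            # Ignore any query string when checking the file extension.
            if value.lower().split('?')[0].endswith('.swf'):
                absolute = urljoin(self.base_url, value)
                if absolute not in self.swf_urls:
                    self.swf_urls.append(absolute)


def find_swf_urls(html, base_url):
    """Return absolute .swf URLs referenced in html, in document order."""
    parser = SwfLinkParser(base_url)
    parser.feed(html)
    return parser.swf_urls


# Made-up markup standing in for a game page.
sample_html = '''
<div>
<object data="/games/taxi.swf" type="application/x-shockwave-flash">
  <param name="movie" value="/games/taxi.swf">
  <embed src="http://cdn.example.com/other.swf">
</object>
</div>
'''
print(find_swf_urls(sample_html, 'http://www.example.com/play/taxi/'))
```

Inside download_game(), you could fetch the page with requests.get(url).text, pass the result and url to find_swf_urls(), and then save each returned URL with a download loop like the one in the question.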