
I am using a function (movies_from_url) to read the top 256 movies from a webpage. Each page contains 50 movies, so I have to read the first 6 pages (5 pages for 250 movies, plus the 6th page for the remaining 6 movies).

First URL:

http://www.imdb.com/search/title?at=0&sort=user_rating&start=1&title_type=feature&year=2005,2014

Here is my vague idea:

def read_m_by_rating(first_year=2005, last_year=2015, top_number=256):
    current_index = 1   # start number of the current page (1, 51, 101, ...)
    final_list = []
    for _ in xrange(6):
        # build the URL as a string, inserting the current start number
        url = ("http://www.imdb.com/search/title?at=0&sort=user_rating&start="
               + str(current_index) + "&title_type=feature&year=2005,2014")
        remaining = top_number - current_index + 1   # movies still needed
        if remaining < 50:
            # last page: read only the movies that are still needed
            lis = movies_from_url(url, remaining)
        else:
            lis = movies_from_url(url, 50)

        final_list.append(lis)
        current_index += 50   # was "=+50", which just assigns 50
    return final_list
Which difficulty are you having? Strange code, btw. Try it yourself and then ask. We're not here to write full programs for you. - ForceBru
@ForceBru, creating each URL. - Alph
Are you asking about the for loop used to create the URL? - ForceBru
I think it's a good question. He did provide pseudocode that proves he did some thinking. My suggestion to you is to try and break this into challenges, one by one. For now, just try to master for loops. You may want to google "list comprehension". (Leave aside the specifics of dynamic-content crawling for now.) - Reut Sharabani
Just loop over start like this: for o in xrange(20): a_url = "http://url.com/?bla=23&start=" + str(o) + "&blabla=32" and then use a_url. - ForceBru

1 Answer


Just using a simple loop over current_index should work.

while current_index < 256:
    # start takes the values 1, 51, 101, 151, 201, 251 (six pages)
    url = ("http://www.imdb.com/search/title?at=0&sort=user_rating&start="
           + str(current_index) + "&title_type=feature&year=2005,2014")
    ...
    ...
    current_index += 50
return final_list
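
For completeness, here is a minimal sketch of how the whole thing could fit together. It is only one possible arrangement, not a definitive implementation. Some assumptions: page_requests, BASE_URL and PAGE_SIZE are names I made up for illustration, movies_from_url is passed in as an argument so the snippet runs on its own, and I assume it returns a list of movies (so extend is used to get one flat list). It is written with range so it also runs on Python 3.

```python
# Hypothetical names for illustration: BASE_URL, PAGE_SIZE, page_requests.
BASE_URL = ("http://www.imdb.com/search/title?at=0&sort=user_rating"
            "&start={start}&title_type=feature&year={first},{last}")
PAGE_SIZE = 50  # movies per result page, as stated in the question


def page_requests(first_year=2005, last_year=2014, top_number=256):
    """Return one (url, count) pair per page; count is how many movies to read."""
    requests = []
    # start runs over 1, 51, 101, ... up to the last page that is needed
    for start in range(1, top_number + 1, PAGE_SIZE):
        # a full page, except on the last page where fewer movies may remain
        count = min(PAGE_SIZE, top_number - start + 1)
        url = BASE_URL.format(start=start, first=first_year, last=last_year)
        requests.append((url, count))
    return requests


def read_m_by_rating(movies_from_url, first_year=2005, last_year=2014,
                     top_number=256):
    """Collect movies page by page (assumes movies_from_url returns a list)."""
    final_list = []
    for url, count in page_requests(first_year, last_year, top_number):
        final_list.extend(movies_from_url(url, count))
    return final_list
```

You can check the URLs and counts without touching the network by calling page_requests() and printing the pairs, or by passing a stand-in such as lambda url, n: [url] * n to read_m_by_rating in place of the real movies_from_url.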