There's a game called Guild Wars 2, and it provides APIs to query almost everything in the game database. My aim is to use Python's asyncio and aiohttp to write a simple crawler that fetches the info for every item in the Guild Wars 2 database.
I wrote a short program. It works, but it behaves strangely, and I suspect there's something I don't understand about composing coroutines.
First, I made a request with the Postman app. The response headers include X-Rate-Limit-Limit: 600, so I guess requests are limited to 600 per minute?
Here are my questions.
1. After the program finished, I checked some of the JSON files and they all have the same content:
[{"name": "Endless Fractal Challenge Mote Tonic", "description": "Transform into a Challenge Mote for 15 minutes or until hit. You cannot move while transformed."......
which means the request got a bad response, but I don't know why.
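One thing I noticed while narrowing this down, though I'm not sure it is the cause: request_item does req_param_item = req_param, which is plain assignment and not a copy, so every task writes its 'ids' into the same dict. A minimal sketch of that effect on its own, with no network involved (build_params_shared and build_params_copied are made-up names for illustration, not part of my program):

```python
base_params = {'access_token': 'xxxxxxxx'}  # placeholder token, as in my code

def build_params_shared(item_id):
    # Plain assignment binds a second name to the SAME dict object.
    params = base_params
    params['ids'] = item_id
    return params

def build_params_copied(item_id):
    # Building a new dict leaves base_params untouched.
    return {**base_params, 'ids': item_id}

shared_a = build_params_shared(1)
shared_b = build_params_shared(2)
copied_a = build_params_copied(1)
copied_b = build_params_copied(2)

print(shared_a is shared_b, shared_a['ids'])  # same object, last id wins
print(copied_a['ids'], copied_b['ids'])       # each copy keeps its own id
```

If that matters here, a retry inside the for loop could be sent with whatever id another task wrote last, which might explain files for different items holding the same content.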
2. I tried asyncio.Semaphore, but even when I limit concurrency to 5, the number of requests goes beyond 600 very soon. So I tried to control the timing by adding time.sleep(0.2) at the end of the request_item function. I guess time.sleep(0.2) suspends the whole Python process for 0.2 seconds, and it actually worked, but after running for some time the program hangs for a long while and then reports a lot of failed attempts. Every automatic retry fails as well. I'm confused by this behavior.
async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    # retry up to 3 times when an exception occurs.
    for i in range(3):
        try:
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue
    time.sleep(0.2)
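To check my guess about time.sleep, here is a minimal sketch that compares it with asyncio.sleep inside coroutines, separate from the crawler. The names blocking_worker, cooperative_worker and timed are made up for this illustration, and the 0.05 second delay is arbitrary:

```python
import asyncio
import time

async def blocking_worker(delay):
    # time.sleep never yields to the event loop, so no other task can run.
    time.sleep(delay)

async def cooperative_worker(delay):
    # asyncio.sleep suspends only this coroutine; other tasks keep running.
    await asyncio.sleep(delay)

async def timed(worker, count=5, delay=0.05):
    """Run `count` workers concurrently and return the elapsed seconds."""
    started = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(count)))
    return time.perf_counter() - started

blocking_elapsed = asyncio.run(timed(blocking_worker))
cooperative_elapsed = asyncio.run(timed(cooperative_worker))
print(f'time.sleep:    {blocking_elapsed:.2f}s')
print(f'asyncio.sleep: {cooperative_elapsed:.2f}s')
```

As far as I understand it, the time.sleep version takes about count * delay because the sleeps run one after another on the single event-loop thread, while the asyncio.sleep version takes about one delay. If so, time.sleep in request_item would also stall every in-flight response while it sleeps.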
When I move time.sleep(0.2) inside the for loop in the request_item function, the whole program hangs. I have no idea what is happening.
async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    for i in range(3):
        try:
            time.sleep(0.2)
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue
Could anyone explain this a little? And is there a better solution? I have thought of some possible approaches, but I can't test them. For example, I could read loop.time() and suspend the whole event loop after every 600 requests. Or I could add 600 requests to task_list, gather them as a group, and once that is done, call asyncio.run(get_item(req_ids)) again with the next 600 requests.
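To make the second idea concrete, this is a rough, untested-against-the-API sketch of what I have in mind. It assumes the limit really is 600 requests per fixed 60 second window, which is only my guess from the header. chunked, run_in_batches and fake_fetch are hypothetical names, and fake_fetch stands in for the real HTTP request:

```python
import asyncio
import time

def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

async def run_in_batches(ids, worker, batch_size=600, window=60.0):
    """Run worker(item_id) for every id, at most batch_size per window seconds."""
    results = []
    batches = list(chunked(ids, batch_size))
    for index, batch in enumerate(batches):
        started = time.monotonic()
        results.extend(await asyncio.gather(*(worker(i) for i in batch)))
        remaining = window - (time.monotonic() - started)
        # Wait out the rest of the window, except after the final batch.
        if index < len(batches) - 1 and remaining > 0:
            await asyncio.sleep(remaining)
    return results

async def fake_fetch(item_id):
    # Stand-in for the real request (hypothetical); yields once to the loop.
    await asyncio.sleep(0)
    return item_id * 10

# Small numbers so the demo finishes quickly: 7 ids, 3 per 0.05 s window.
demo = asyncio.run(run_in_batches(list(range(7)), fake_fetch, batch_size=3, window=0.05))
print(demo)
```

This uses a single asyncio.run call and waits with asyncio.sleep between groups, instead of calling asyncio.run once per group. I don't know whether the server counts requests in fixed windows or some other way, so a fixed-window wait like this might still trip the limit.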
Here's all of my code.
import aiohttp
import asyncio
import httpx
import json
import math
import os
import time

tk = 'xxxxxxxx'
url_template = 'https://api.guildwars2.com/v2/items'

# get items list
req_param = {'access_token': tk}
item_list_resp = httpx.get(url_template, params=req_param)
items = item_list_resp.json()

async def request_item(session, item_id):
    req_param_item = req_param
    req_param_item['ids'] = item_id
    for i in range(3):
        try:
            async with session.get(url_template, params=req_param_item) as response:
                result = await response.json()
                with open(f'item_info/{item_id}.json', 'w') as f:
                    json.dump(result, f)
                print(item_id, 'done')
            break
        except Exception as e:
            print(item_id, i, 'failed')
            continue
    # since the game API limits requests, I think it's ok to suspend the program for a while
    time.sleep(0.2)

async def get_item(item_ids: list):
    task_list = []
    async with aiohttp.ClientSession() as session:
        for item_id in item_ids:
            req = request_item(session, item_id)
            task = asyncio.create_task(req)
            task_list.append(task)
        await asyncio.gather(*task_list)

asyncio.run(get_item(req_ids))