0
votes

I need to do big queries from AWS Athena (500K+). If my query result have X rows and I use fetchmany to get Y rows at a time until I get all of my rows is it connecting to the db for every use of fetchmany? I ask because in Athena every query cost money and i prefer to do one big query instead of multiple smaller ones. Thanks.

def get_df
    while sum_fetch < limit
        batch_result = pd.DataFrame(cursor.fetchmany(FETCH_SIZE), columns=COLUMN_NAMES)
        df = df.append(pd.DataFrame(batch_result))
        sum_fetch += FETCH_SIZE
    return df

if someone has a nicer way to do it it's also be great (i thought of maybe using a decorator to do the sum)

1
Show us your codeRoman Konoval
@RomanKonoval Here you gonivsk98

1 Answers

1
votes

SQL engines have a true notion of cursor. When they recieve a query, they do their internal housekeeping, and as soon as they are ready to return rows, they return a cursor.

The client then uses that cursor to retrieve rows on the same query, and the engine uses again some internal houskeeping to remember what has already been sent, and what is still to send.

In the end, the client part has sent one single query and a number of fetch to retrieve the results. This is quite different than sending multiple queries.

If you are charged by queries, then you should be charged for one single query, if there is a part about the volume then that part would grow with the number of rows. But not the query part.