
Using Google's BigQuery Python API, is it possible to fetch data from a BigQuery table (GCP) in batches without repetition (i.e., downloading a large dataset in small batches rather than all at once)?

For example, if I have a table with 10 million rows, can I run 10 fetch iterations where each iteration downloads 1 million new rows, so that no row is fetched more than once across the 10 iterations?
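
For illustration, the all-at-once version I would like to avoid looks roughly like this (project, dataset, and table names below are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project='my-project')
# A single query job that materializes all 10 million rows into one DataFrame
df = client.query('SELECT * FROM `my_dataset.my_table`').to_dataframe()

What I am after is a way to call something repeatedly so that each call returns the next 1 million rows.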

I can see this doc might help you fetch the distinct data from a set of records. - Mahboob
@Mahboob, my primary goal is to download the entire dataset in batches rather than all at once. - thereandhere1

1 Answer


I use pandas for that kind of thing:

import pandas as pd
from google.oauth2 import service_account
import pandas_gbq  # pd.read_gbq is backed by pandas-gbq, so it must be installed

# Authenticate with a service account key file
credentials = service_account.Credentials.from_service_account_file('yourkey.json')

# Query that pulls the full 10M-row table (dataset.table, backticked because the name starts with a digit)
query = 'SELECT * FROM `yourdataset.10MrowTable`'
dataframe = pd.read_gbq(query, project_id="yourgcpprojectname", dialect='standard', credentials=credentials)
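
Note that read_gbq pulls the whole result in one call. To fetch the same table in fixed-size batches without repetition, one option is the google-cloud-bigquery client's list_rows, paging with start_index/max_results; a sketch, assuming the same placeholder project, dataset, and table names:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('yourkey.json')
client = bigquery.Client(project='yourgcpprojectname', credentials=credentials)

table = client.get_table('yourdataset.10MrowTable')
batch_size = 1_000_000

# Each iteration reads a distinct slice of rows, so no row is downloaded twice
for start in range(0, table.num_rows, batch_size):
    rows = client.list_rows(table, start_index=start, max_results=batch_size)
    batch_df = rows.to_dataframe()
    # ... process batch_df here ...

rows.to_dataframe() needs pandas (and ideally pyarrow) installed, and list_rows reads the stored table directly rather than running a query, so it does not incur query costs.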