
I am running the following Cloud Function. It runs successfully and indicates that data was loaded to the table, but when I query BigQuery no data has been added. I get no errors and no indication that it isn't working.

from google.cloud import bigquery
import pandas as pd


def download_data(event, context):

     df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

     # Create an empty list 
     Row_list =[] 

     # Iterate over each row 
     for index, rows in df.iterrows(): 
          # Create list for the current row 
          my_list =[rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths] 
          #print(my_list)     
     # append the list to the final list 
     Row_list.append(my_list) 


     ## Get Biq Query Set up
     client = bigquery.Client()
     table_id = "<project_name>.raw.daily_load"
     table = client.get_table(table_id)

     print(client)
     print(table_id)
     print(table)


     errors = client.insert_rows(table, Row_list)  # Make an API request.
     if errors == []:
          print("New rows have been added.")

Attempted so far:

  1. Check data was being pulled -> PASSED, I printed out Row_list and the data is there
  2. Run locally from my machine -> PASSED, data appeared when I ran it from a python terminal
  3. Print out the table details -> PASSED, see attached screenshot it all appears in the logs
  4. Confirm it is able to find the table -> PASSED, I changed the name of the table to one that didn't exist and it failed

Not sure what is next, any advice would be greatly appreciated

[Screenshot: log file indicating success]

Comments:

  "Is the service account used to run this Python script the same one used to query the data?" – Claudio
  "I am querying from the GUI, so with my own credentials; the Cloud Function runs as the App Engine default service account." – Cameron Wasilewsky
  "How do you query BigQuery? Do you perform a 'preview' on the table, or do you run a real SQL query?" – guillaume blaquiere

1 Answer


This assumes that the App Engine default service account has the default Editor role assigned, and that the BigQuery table has a very simple schema. For example:

Field name      Type          Mode
date            STRING        NULLABLE
location        STRING        NULLABLE
new_cases       INTEGER       NULLABLE
new_deaths      INTEGER       NULLABLE
total_cases     INTEGER       NULLABLE
total_deaths    INTEGER       NULLABLE
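
For reference, the same schema expressed as a JSON schema file (a sketch; the field names come from the question, while the `schema.json` filename is an arbitrary choice):

```json
[
  {"name": "date",         "type": "STRING",  "mode": "NULLABLE"},
  {"name": "location",     "type": "STRING",  "mode": "NULLABLE"},
  {"name": "new_cases",    "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "new_deaths",   "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "total_cases",  "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "total_deaths", "type": "INTEGER", "mode": "NULLABLE"}
]
```

A file like this can be passed to the `bq` CLI when creating the table, e.g. `bq mk --table [PROJECT-ID]:raw.daily_load schema.json`.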

The following modification of your code should work for an HTTP-triggered function. Notice that Row_list.append(my_list) was not inside the for loop, so only the last row ever made it into your list, and that, according to the samples in the documentation, you should be passing a list of tuples:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()
table_id = "[PROJECT-ID].[DATASET].[TABLE]"

def download_data(request):

     df = pd.read_csv('https://covid.ourworldindata.org/data/ecdc/full_data.csv')

     # Create an empty list 
     Row_list =[] 

     # Iterate over each row 
     for index, rows in df.iterrows(): 
          # Create list for the current row 
          my_list =(rows.date, rows.location, rows.new_cases, rows.new_deaths, rows.total_cases, rows.total_deaths)  
          # append the list to the final list 
          Row_list.append(my_list) 


     ## Get BigQuery set up
     table = client.get_table(table_id)

     errors = client.insert_rows(table, Row_list)  # Make an API request.
     if errors == []:
          print("New rows have been added.")
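
The misplaced append is easy to reproduce without BigQuery at all; a minimal sketch with made-up rows showing the difference:

```python
rows = [
    ("2020-03-01", "Chile", 3),
    ("2020-03-02", "Chile", 1),
    ("2020-03-03", "Chile", 4),
]

# append OUTSIDE the loop: my_list only holds the last row by the
# time the single append runs
outside = []
for row in rows:
    my_list = row
outside.append(my_list)

# append INSIDE the loop: every row is collected
inside = []
for row in rows:
    inside.append(row)

print(len(outside))  # 1 -- only the last row survived
print(len(inside))   # 3 -- all rows
```

This is why the original function reported success (the insert itself did not error) while the table ended up with almost none of the data.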

With the very simple requirements.txt file:

# Function dependencies, for example:
# package>=version
pandas
google-cloud-bigquery
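
For completeness, a deploy command sketch for the HTTP-triggered version (the function name matches the code above; the runtime and region are assumptions you should adjust for your project):

```shell
gcloud functions deploy download_data \
  --runtime python39 \
  --trigger-http \
  --region us-central1
```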