
It is the first time I am using a Cloud Function, and this Cloud Function does just one job: every time a file is uploaded to a GCS bucket, it runs and copies that file (.csv) to a BigQuery table without any transformations. What would be the most efficient way to test (unit, not integration) the gcs_to_bq method?

import re

from google.cloud import bigquery


def get_bq_table_name(file_name):
    if re.search('car', file_name):
        return 'car'
    return 'bike'

def gcs_to_bq(event, context):

    # Construct a BigQuery client object.
    client = bigquery.Client()

    bq_table = get_bq_table_name(event['name'])

    table_id = f'xxx.yyy.{bq_table}'
    
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("datetime", "STRING"),
            bigquery.SchemaField("name", "STRING"),
            bigquery.SchemaField("id", "STRING"),

        ],
        skip_leading_rows=1,
        # The source format defaults to CSV, so the line below is optional.
        source_format=bigquery.SourceFormat.CSV,
    )
    
    uri = "gs://" + event['bucket'] + '/' + event['name'] 

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.

    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    print("Loaded {} rows.".format(destination_table.num_rows))

1 Answer


I think you would need three things for unit testing:

  1. Create a fake .csv file and upload it to a staging/dev GCS bucket.
  2. Create a staging dataset in BigQuery.
  3. Create a fake event object that represents (1).

Your unit test then calls gcs_to_bq() with (3) and checks whether the table is created correctly in (2).
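For example, a minimal sketch of such a test, assuming the function lives in a module called main.py, the fake file from (1) is named cars.csv in a hypothetical staging bucket, and the xxx.yyy placeholder in table_id points at the staging dataset from (2):

# test_main.py -- sketch only; bucket and file names below are made up.
from google.cloud import bigquery

import main


def test_gcs_to_bq_loads_car_table():
    # Fake event object shaped like the GCS "finalize" event payload.
    event = {
        "bucket": "my-staging-bucket",  # hypothetical staging bucket from (1)
        "name": "cars.csv",             # the fake file uploaded in (1)
    }

    main.gcs_to_bq(event, context=None)

    # Verify the load in the staging dataset from (2).
    client = bigquery.Client()
    table = client.get_table("xxx.yyy.car")  # same placeholder as in the question
    assert table.num_rows > 0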

As you can see, even though it is a unit test, it requires setting up cloud resources. If you want to stub/mock GCS completely locally, there is a GCS emulator that might help, though I have never tried it: https://github.com/fsouza/fake-gcs-server
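
If you only want to check the function's own logic (table name selection, URI construction) without touching GCS or BigQuery at all, another option is to patch bigquery.Client with unittest.mock and assert on the arguments the function passes to it. This is just a sketch of that idea, again assuming the function lives in main.py:

# test_main_mocked.py -- sketch; no real GCS or BigQuery access happens.
from unittest import mock

import main


def test_gcs_to_bq_builds_expected_load_job():
    event = {"bucket": "my-bucket", "name": "bikes.csv"}  # hypothetical names

    # Replace the BigQuery client used inside main with a mock.
    with mock.patch.object(main.bigquery, "Client") as mock_client_cls:
        mock_client = mock_client_cls.return_value
        main.gcs_to_bq(event, context=None)

    # "bikes.csv" does not contain 'car', so the 'bike' table should be used.
    args, kwargs = mock_client.load_table_from_uri.call_args
    assert args == ("gs://my-bucket/bikes.csv", "xxx.yyy.bike")
    assert isinstance(kwargs["job_config"], main.bigquery.LoadJobConfig)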