
Context: We are loading CSV data into GCP BigQuery using GCP Dataflow (Apache Beam). As part of this, the BigQuery tables are created through the BigQueryIO API the first time each table is written. One customer requirement is that the data on GCP must be encrypted using customer-supplied/managed encryption keys.

Problem Statement: We cannot find a way to specify a custom encryption key through the API when creating tables. The GCP documentation explains how to specify a custom encryption key through the BigQuery console, but we could not find anything on specifying it through the API from within Dataflow code.

Code Snippet:

String tableSpec = new StringBuilder().append(PipelineConstants.PROJECT_ID).append(":")
    .append(dataValue.getKey().target_dataset).append(".").append(dataValue.getKey().target_table_name)
    .toString();

ValueProvider<String> valueProvider = StaticValueProvider.of("gs://bucket/folder/");

dataValue.getValue().apply(Count.globally()).apply(ParDo.of(new RowCount(dataValue.getKey())))
    .apply(ParDo.of(new SourceAudit(runId)));

dataValue.getValue().apply(ParDo.of(new PreProcessing(dataValue.getKey())))
    .apply(ParDo.of(new FixedToDelimited(dataValue.getKey())))
    .apply(ParDo.of(new CreateTableRow(dataValue.getKey(), runId, timeStamp)))
    .apply(BigQueryIO.writeTableRows().to(tableSpec)
        .withSchema(CreateTableRow.getSchema(dataValue.getKey()))
        .withCustomGcsTempLocation(valueProvider)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

Query: Could anybody let us know:

  • Is it possible to provide an encryption key through the Beam API?
  • If it is not possible with the current version, what would be a possible workaround?

Please let us know if additional information is required.

This is a really new feature (still in beta). The BigQuery client libraries don't even support it, so Beam/Dataflow will not be able to support it yet. Will the tables be known in advance, or are they dynamic? - Graham Polley
@Graham Polley, thanks for the response! The tables will be dynamic. As you mentioned, the GCP documentation notes that the BigQuery client libraries don't support encryption settings. - Para_Conscious
@glytching, thanks for sharing the JIRA! - Para_Conscious
Would you need a separate encryption key per dynamic table, or one for all of them? - Reuven Lax

1 Answer


Customer-supplied encryption keys are a new feature, and not all libraries have been updated to support them yet.

If you know the table name in advance, you can create the table through the UI, CLI, or API, then run your normal flow to load data into that table. That might be a workaround for you.

https://cloud.google.com/bigquery/docs/customer-managed-encryption#create_table

API to create table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert

You need to set this section on the table object: "encryptionConfiguration": { "kmsKeyName": string }. More details on the table resource: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource
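
As a rough illustration of the API route, here is a minimal sketch in plain Java that builds the tables.insert URL and a request body carrying the encryptionConfiguration section. The class and method names (KmsTableRequest, insertUri, insertBody) and all sample IDs are hypothetical, and the sketch assumes the IDs are plain names that need no URL encoding. It does not send the request: a real call also needs an OAuth access token, and you would normally add the table schema to the same body.

```java
import java.net.URI;

// Hypothetical helper: builds the pieces of a BigQuery tables.insert REST call
// that creates a table protected by a customer-managed Cloud KMS key.
final class KmsTableRequest {

    // URL of the tables.insert method (HTTP POST) for the given dataset.
    static URI insertUri(String projectId, String datasetId) {
        return URI.create("https://bigquery.googleapis.com/bigquery/v2/projects/"
                + projectId + "/datasets/" + datasetId + "/tables");
    }

    // JSON body: a table resource with only tableReference and
    // encryptionConfiguration set. The schema is left out of this sketch.
    static String insertBody(String projectId, String datasetId,
                             String tableId, String kmsKeyName) {
        return "{"
                + "\"tableReference\":{"
                + "\"projectId\":" + quote(projectId) + ","
                + "\"datasetId\":" + quote(datasetId) + ","
                + "\"tableId\":" + quote(tableId) + "},"
                + "\"encryptionConfiguration\":{"
                + "\"kmsKeyName\":" + quote(kmsKeyName) + "}"
                + "}";
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Placeholder values; the key name follows the Cloud KMS resource format.
        String kmsKey = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key";
        System.out.println("POST " + insertUri("my-project", "my_dataset"));
        System.out.println(insertBody("my-project", "my_dataset", "my_table", kmsKey));
    }
}
```

Once the table exists with the key attached, the pipeline from the question can keep CREATE_IF_NEEDED and WRITE_APPEND and simply append to it.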