When I try to read a GCS object encrypted with a customer-supplied encryption key (CSEK), Dataflow fails with this error:

[ERROR] The target object is encrypted by a customer-supplied encryption key

I was going to AES-decrypt the data on the Dataflow side, but it seems I can't even fetch the file without passing an encryption key.

Is there another way to load CSEK-encrypted Google Cloud Storage files from within Dataflow? For example, could I use the Google Cloud Storage API to get a stream handle and then pass that to Dataflow?

    // Fails with the CSEK error above
    p.apply("Read from source", TextIO.read().from("gs://my_bucket/myfile")).apply(..);

1 Answer

According to the documentation, Cloud Dataflow does not currently support objects encrypted with customer-supplied encryption keys. I have opened a feature request to add this.

Note that you cannot read an object in Cloud Storage that was uploaded with a customer-supplied encryption key (CSEK) without supplying that same key.

From the documentation:

If you use customer-supplied encryption keys or client-side encryption, you must securely manage your keys and ensure that they are not lost. If you lose your keys, you are no longer able to read your data, and you continue to be charged for storage of your objects until you delete them.
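Because everything depends on holding the exact key, a small sanity check can help. The sketch below assumes, as the Cloud Storage documentation describes, that a CSEK is a base64-encoded 256-bit AES key and that Cloud Storage identifies it by the base64-encoded SHA-256 hash of the raw key bytes; that hash is what the object metadata reports as `customerEncryption.keySha256`. `CsekKeyCheck` and its methods are hypothetical helper names, not part of any Google library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

/**
 * Hypothetical helpers for handling a customer-supplied encryption key (CSEK).
 * Assumption: a CSEK is a base64-encoded 32-byte (AES-256) key, identified by
 * the base64-encoded SHA-256 hash of its raw bytes.
 */
final class CsekKeyCheck {

    private CsekKeyCheck() {}

    /** Generates a random 256-bit key, base64-encoded. Losing it means losing the data. */
    static String generateKey() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return Base64.getEncoder().encodeToString(key);
    }

    /** Decodes a base64 key and rejects anything that is not exactly 32 bytes. */
    static byte[] decodeKey(String base64Key) {
        byte[] key = Base64.getDecoder().decode(base64Key);
        if (key.length != 32) {
            throw new IllegalArgumentException("CSEK must be 32 bytes, got " + key.length);
        }
        return key;
    }

    /** Base64-encoded SHA-256 of the raw key bytes. */
    static String keySha256(String base64Key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(decodeKey(base64Key));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True if the key we hold hashes to the keySha256 reported for the object. */
    static boolean matchesObject(String base64Key, String objectKeySha256) {
        return keySha256(base64Key).equals(objectKeySha256);
    }
}
```

Comparing hashes this way tells you whether a stored key is the right one before you attempt a read, without ever sending or logging the key itself.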

If you still have the CSEK, sample Java code to read the object with the Cloud Storage client library is:

    // decryptionKey: the base64-encoded AES-256 key the object was uploaded with
    byte[] content = storage.readAllBytes(
        bucketName, blobName, BlobSourceOption.decryptionKey(decryptionKey));
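`readAllBytes` buffers the whole object in memory. For the stream handle the question asks about, here is a sketch that needs only the JDK (Java 11+): it calls the Cloud Storage JSON API download endpoint (`?alt=media`) with the three CSEK request headers and returns the decrypted body as an `InputStream`. The class and method names are hypothetical, and the OAuth 2.0 access token is assumed to be obtained elsewhere. This does not make `TextIO` CSEK-aware; any use inside a pipeline would be your own code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch: stream a CSEK-encrypted object via the JSON API. */
final class CsekDownload {

    private CsekDownload() {}

    /** Builds GET .../b/{bucket}/o/{object}?alt=media with the CSEK headers. */
    static HttpRequest buildRequest(String bucket, String object,
                                    String base64Key, String accessToken) {
        String uri = "https://storage.googleapis.com/storage/v1/b/"
                + encodePathSegment(bucket) + "/o/" + encodePathSegment(object)
                + "?alt=media";
        byte[] rawKey = Base64.getDecoder().decode(base64Key);
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Authorization", "Bearer " + accessToken)
                .header("x-goog-encryption-algorithm", "AES256")
                .header("x-goog-encryption-key", base64Key)
                .header("x-goog-encryption-key-sha256", sha256Base64(rawKey))
                .GET()
                .build();
    }

    /** Sends the request and returns the object contents as a stream. */
    static InputStream open(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() != 200) {
            response.body().close();
            throw new IOException("Cloud Storage returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    /** Percent-encodes one path segment, so '/' in object names becomes %2F. */
    static String encodePathSegment(String s) {
        // URLEncoder uses form encoding (space -> '+'); paths need %20 instead.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Base64-encoded SHA-256 of the raw key bytes. */
    static String sha256Base64(byte[] keyBytes) {
        try {
            return Base64.getEncoder().encodeToString(
                    MessageDigest.getInstance("SHA-256").digest(keyBytes));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If you prefer the client library, `storage.reader(bucketName, blobName, BlobSourceOption.decryptionKey(decryptionKey))` returns a `ReadChannel`, which `java.nio.channels.Channels.newInputStream` can wrap as a stream.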

Other ways of reading an object encrypted with a CSEK are described in the Cloud Storage documentation on customer-supplied encryption keys.