How does apache beam access bigtable data?

Question

If BigtableIO.Read is run in Dataflow, is the data accessed via a Bigtable node, or does the read go directly to the Bigtable tablets?

The Bigtable architecture documentation says:

client requests go through a front-end server before they are sent to a Cloud Bigtable node

and goes on to say:

A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets to help balance the workload of queries... Tablets are stored on Colossus, Google's file system, in SSTable format

(The concern is whether a Dataflow job running at the same time as users are making individual requests, which definitely go through the nodes, will cause a small or large amount of contention. I would guess that if the Dataflow job went through the nodes there would be significantly more contention than if it hit the tablets directly.)

chamikara · Accepted Answer · 2021-01-25T17:54:31

The Beam Bigtable connector uses Cloud Bigtable's public API, so its requests go through the Bigtable front-end servers and nodes rather than reading the tablets directly.

See here for a bit more detail on how the Beam connector uses the Bigtable client API.
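For context, here is a minimal sketch of how a BigtableIO.Read is typically set up in the Beam Java SDK. The project, instance, and table IDs are hypothetical placeholders, and the class name is just for illustration. The read is configured only with the same project/instance/table coordinates any other API client would use, which fits with it going through the public API.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class BigtableReadSketch {

  // Builds the read transform. The IDs below are hypothetical placeholders;
  // replace them with your own project, instance, and table.
  static BigtableIO.Read buildRead() {
    return BigtableIO.read()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table");
  }

  public static void main(String[] args) {
    // Runner and credentials come from the command-line pipeline options.
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Each element is one Bigtable row returned by the Bigtable API.
    PCollection<Row> rows = pipeline.apply("ReadFromBigtable", buildRead());

    pipeline.run().waitUntilFinish();
  }
}
```

Running this needs the Beam Google Cloud Platform IO module on the classpath and valid credentials. Nothing contacts Bigtable until the pipeline is run.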
