My team is considering exporting/importing bigtable data, but just a subset of column families (not all) for technical issues.

These docs seem to suggest that export/import is only possible for the entire table:

https://cloud.google.com/bigtable/docs/exporting-sequence-files https://cloud.google.com/bigtable/docs/importing-sequence-files

We want to avoid exporting the entire table due to its size (and we only need specific column families to be backed up/restored).

Perhaps there is an easy solution to this that we are not aware of, so I thought I would ask here.

Another approach we thought of was to use Dataflow to scan the specific column families, dump the data to GCS (effectively keeping a snapshot there), and restore it when needed. But if we can avoid Dataflow altogether, that would be a better solution for us.
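For reference, this is roughly what we had in mind for the Dataflow route: an Apache Beam pipeline that applies a server-side family filter when reading from Bigtable and dumps the serialized rows to GCS. The project, instance, table, and bucket names and the "cf1|cf2" family regex below are placeholders, and the Base64-over-TextIO output is only to keep the sketch small.

```java
import com.google.bigtable.v2.Row;
import com.google.bigtable.v2.RowFilter;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

import java.util.Base64;

public class ExportSelectedFamilies {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Server-side filter: Bigtable only returns cells from the listed column
    // families, so the pipeline never reads the rest of the (large) table.
    RowFilter familiesOnly = RowFilter.newBuilder()
        .setFamilyNameRegexFilter("cf1|cf2")   // placeholder family names
        .build();

    p.apply("ReadFilteredRows", BigtableIO.read()
            .withProjectId("my-project")       // placeholder
            .withInstanceId("my-instance")     // placeholder
            .withTableId("my-table")           // placeholder
            .withRowFilter(familiesOnly))
        // Serialize each Row proto; Base64 keeps the TextIO sink simple for this sketch.
        .apply("SerializeRows", MapElements.into(TypeDescriptors.strings())
            .via((Row row) -> Base64.getEncoder().encodeToString(row.toByteArray())))
        .apply("WriteToGCS", TextIO.write().to("gs://my-bucket/bt-backup/rows"));

    p.run();
  }
}
```

The important part is the RowFilter: the filtering happens inside Bigtable, so only the families being backed up are scanned and shipped to the workers.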

Any help and suggestions would be really appreciated!


1 Answer


You can currently set a single column family via -Dhbase.mapreduce.scan.column.family=<your_family> on the command line.

We're currently working on improving our import/export process. If you'd like, you can file an enhancement request in our GitHub issues, and we should be able to get to it fairly quickly. The following should be relatively easy to implement, but is not currently supported:

-Dhbase.mapreduce.scan.column.families=<your families separated by commas>
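To illustrate what these properties amount to: the existing single-family property presumably restricts the export job's underlying HBase Scan to that one family, and a comma-separated version would just call addFamily once per listed family. A rough sketch with placeholder family names (not the tool's actual code):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyScanSketch {
  public static void main(String[] args) {
    // Roughly what -Dhbase.mapreduce.scan.column.family=my_family configures:
    // the export's Scan is limited to a single column family.
    Scan singleFamily = new Scan();
    singleFamily.addFamily(Bytes.toBytes("my_family"));

    // What the proposed comma-separated property would amount to:
    // one addFamily call per requested family.
    Scan multiFamily = new Scan();
    for (String family : "cf1,cf2".split(",")) {
      multiFamily.addFamily(Bytes.toBytes(family));
    }
  }
}
```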