I am exporting a table from Cloud Bigtable to Cloud Storage by following this guide: https://cloud.google.com/bigtable/docs/exporting-sequence-files#exporting_sequence_files_2
The Bigtable table is ~300 GB, and the Dataflow pipeline fails with this error:
An OutOfMemoryException occurred. Consider specifying higher memory instances in PipelineOptions.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)...
The error suggests increasing the memory of the instance type used for the Dataflow job. I also received a warning saying:
Worker machine type has insufficient disk (25 GB) to support this type of Dataflow job. Please increase the disk size given by the diskSizeGb/disk_size_gb execution parameter.
I re-checked the command for running the pipeline here (https://github.com/googleapis/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import) and looked for a command-line option that would let me set a custom machine type or persistent disk size for the workers, but couldn't find any.
By default, the worker machine type is n1-standard-1 and the PD size is 25 GB.
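For reference, the export command I'm running follows the linked guide and looks roughly like this (the jar version and the bracketed values are placeholders for my setup):

```
java -jar bigtable-beam-import-[VERSION]-shaded.jar export \
    --runner=dataflow \
    --project=[PROJECT_ID] \
    --bigtableInstanceId=[INSTANCE_ID] \
    --bigtableTableId=[TABLE_ID] \
    --destinationPath=gs://[BUCKET]/[EXPORT_PATH] \
    --tempLocation=gs://[BUCKET]/[TEMP_PATH] \
    --maxNumWorkers=[MAX_NUM_WORKERS] \
    --zone=[ZONE]
```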
Are there any parameters I can pass during job creation that would help me avoid this error? If so, what are they?