I am exporting a table from Cloud Bigtable to Cloud Storage by following this guide: https://cloud.google.com/bigtable/docs/exporting-sequence-files#exporting_sequence_files_2
The Bigtable table is ~300 GB, and the Dataflow pipeline fails with this error:
An OutOfMemoryException occurred. Consider specifying higher memory instances in PipelineOptions.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)...
The error suggests increasing the memory of the instance type used for the Dataflow job. I also received a warning saying:
Worker machine type has insufficient disk (25 GB) to support this type of Dataflow job. Please increase the disk size given by the diskSizeGb/disk_size_gb execution parameter.
I re-checked the command for running the pipeline here (https://github.com/googleapis/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import) and looked for a command-line option that would let me set a custom machine type or persistent disk size for the workers, but couldn't find any.
By default, the worker machine type is n1-standard-1 and the PD size is 25 GB.
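For reference, the export command I'm running follows the linked guide and looks roughly like this (the jar version and the bracketed values are placeholders for my setup):

```
java -jar bigtable-beam-import-[VERSION]-shaded.jar export \
    --runner=dataflow \
    --project=[PROJECT_ID] \
    --bigtableInstanceId=[INSTANCE_ID] \
    --bigtableTableId=[TABLE_ID] \
    --destinationPath=gs://[BUCKET]/[EXPORT_PATH] \
    --tempLocation=gs://[BUCKET]/[TEMP_PATH] \
    --maxNumWorkers=[MAX_NUM_WORKERS] \
    --zone=[ZONE]
```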
Are there any parameters I can pass during job creation that would help me avoid this error? If so, what are they?