I am running a Google Dataflow job with 150 workers. The job reads input from Google Pub/Sub, applies a few enrichment steps, and writes the results to Google BigQuery.
For a few records I see the following error in the Dataflow logs:
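For context, the pipeline looks roughly like this. The enrichment DoFn, project, topic, table, and schema below are placeholders rather than my real code, and it is written against the Dataflow Java SDK 1.x that appears in the stack trace:

    import java.util.Arrays;

    import com.google.api.services.bigquery.model.TableFieldSchema;
    import com.google.api.services.bigquery.model.TableRow;
    import com.google.api.services.bigquery.model.TableSchema;
    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.io.PubsubIO;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;

    public class EnrichmentPipeline {

      // Placeholder for the real enrichment logic (a few lookups per message)
      static class EnrichFn extends DoFn<String, TableRow> {
        @Override
        public void processElement(ProcessContext c) {
          c.output(new TableRow().set("payload", c.element()));
        }
      }

      public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);
        options.setStreaming(true);
        options.setNumWorkers(150);

        // Placeholder schema; the real table has more columns
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("payload").setType("STRING")));

        Pipeline pipeline = Pipeline.create(options);
        pipeline
            // ~75K messages/sec, each ~1.5 KB
            .apply(PubsubIO.Read.topic("projects/my-project/topics/my-topic"))
            // Enrich each message and convert it to a BigQuery row
            .apply(ParDo.of(new EnrichFn()))
            // Stream the enriched rows into BigQuery
            .apply(BigQueryIO.Write
                .to("my-project:my_dataset.my_table")
                .withSchema(schema));
        pipeline.run();
      }
    }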
(787b51f314078308): Exception: java.lang.OutOfMemoryError: Java heap space
java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
...
...
...
com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:188)
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
Stack trace truncated. Please see Cloud Logging for the entire trace.
I am using 150 workers to process ~75K messages per second, and each message is ~1.5 KB. Should I further increase the number of workers, or should I increase the memory of each worker? If the latter, how do I increase the memory of each worker?
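I am not currently setting a machine type, so I assume the workers are using the default. My guess is that per-worker memory is controlled by the worker machine type option, something like the snippet below (this is an assumption on my part, not something I have verified; please correct me if this is not the intended way):

    // Continuing from the options object in the sketch above.
    // Assumption: a machine type with more RAM gives each worker a larger heap.
    options.setWorkerMachineType("n1-highmem-4");
    options.setNumWorkers(150);

    // Or, equivalently, when launching the job from the command line:
    //   --workerMachineType=n1-highmem-4 --numWorkers=150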