
While executing Dataflow pipelines, we occasionally see the exceptions below. Is there anything we can do about them? We have a fairly simple flow that reads data from a BigQuery query and writes it to Bigtable.

Also, what happens to the data inside the pipeline when this occurs? Is it reprocessed, or is it lost in transit to Bigtable?

    CloudBigtableIO.initializeForWrite(p);

    p.apply(BigQueryIO.Read.fromQuery(getQuery()))
     .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
         public void processElement(ProcessContext c) {
             Mutation output = convertDataToRow(c.element());
             c.output(output);
         }
     }))
     .apply(CloudBigtableIO.writeToTable(config));


    private static Mutation convertDataToRow(TableRow element) {
        LOG.info("element: " + element);
        LOG.info("BASM_segment_id: " + element.get("BASM_segment_id"));
        if (element.get("BASM_AID") != null) {
            Put obj = new Put(getRowKey(element).getBytes())
                    .addColumn(SEGMENT_FAMILY, SEGMENT_COLUMN_NAME, ((String) element.get("BAS_category")).getBytes());
            obj.addColumn(USER_FAMILY, "AID".getBytes(), ((String) element.get("BASM_AID")).getBytes());
            if (element.get("BASM_segment_id") != null) {
                obj.addColumn(SEGMENT_FAMILY, "segment_id".getBytes(), ((String) element.get("BASM_segment_id")).getBytes());
            }
            if (element.get("BAS_sub_category") != null) {
                obj.addColumn(SEGMENT_FAMILY, "sub_category".getBytes(), ((String) element.get("BAS_sub_category")).getBytes());
            }
            if (element.get("BAS_name") != null) {
                obj.addColumn(SEGMENT_FAMILY, "name".getBytes(), ((String) element.get("BAS_name")).getBytes());
            }
            if (element.get("BAS_description") != null) {
                obj.addColumn(SEGMENT_FAMILY, "description".getBytes(), ((String) element.get("BAS_description")).getBytes());
            }
            if (element.get("BAS_last_compute_day") != null) {
                obj.addColumn(USER_FAMILY, "Krux_User_id".getBytes(), ((String) element.get("BASM_krux_user_id")).getBytes());
                obj.addColumn(SEGMENT_FAMILY, "last_compute_day".getBytes(), ((String) element.get("BAS_last_compute_day")).getBytes());
            }
            if (element.get("BAS_type") != null) {
                obj.addColumn(SEGMENT_FAMILY, "type".getBytes(), ((String) element.get("BAS_type")).getBytes());
            }
            if (element.get("BASM_REGID") != null) {
                obj.addColumn(USER_FAMILY, "REGID".getBytes(), ((String) element.get("BASM_REGID")).getBytes());
            }
            return obj;
        } else {
            return null;
        }
    }

Following is the exception we are getting:

2016-08-22T21:47:33.469Z: Error: (84707221e08b977b): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449) at com.nytimes.adtech.dataflow.pipelines.BigTableSegmentData$2.processElement(BigTableSegmentData.java:70) Caused by: com.google.cloud.dataflow.sdk.util.UserCodeException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.dataflow.sdk.util.UserCodeException.wrap(UserCodeException.java:35) at com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:40) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:368) at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:51) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190) at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47) at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53) at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449) at com.nytimes.adtech.dataflow.pipelines.BigTableSegmentData$2.processElement(BigTableSegmentData.java:70) at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190) at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47) at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53) at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:226) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:167) at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:71) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

Caused by:

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:389) at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:274) at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableBufferedWriteFn.processElement(CloudBigtableIO.java:966)

Exception copied from the Dataflow console:

(7e75740160102c05): java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449) at com.nytimes.adtech.dataflow.pipelines.BigTableSegmentData$2.processElement(BigTableSegmentData.java:70) Caused by: com.google.cloud.dataflow.sdk.util.UserCodeException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.dataflow.sdk.util.UserCodeException.wrap(UserCodeException.java:35) at com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:40) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:368) at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:51) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190) at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47) at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53) at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:160) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:287) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:449) at com.nytimes.adtech.dataflow.pipelines.BigTableSegmentData$2.processElement(BigTableSegmentData.java:70) at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49) at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:138) at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190) at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47) at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:53) at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:226) at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:167) at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:71) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:288) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:221) at 
com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173) at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: StatusRuntimeException: 1 time, at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:389) at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:274) at com.google.cloud.bigtable.dataflow.CloudBigtableIO$CloudBigtableSingleTableBufferedWriteFn.processElement(CloudBigtableIO.java:966)

2016-08-23 (13:17:54) java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: org.apache.hadoop.... (this same truncated entry is repeated many times at the same timestamp)

Thanks in advance.

What version of the client are you using? 0.9.1? (Les Vogel - Google DevRel)
@LesVogel-GoogleDevRel Yes, we are using version 0.9.1 of bigtable-hbase-dataflow. (Amandeep)
I've asked someone in engineering to comment; it should be later today. (Les Vogel - Google DevRel)
Can you search your logs for "exceptions occured during a bulk operation" (sic)? That should give more informative logging about the real problem. RetriesExhaustedWithDetailsException is too generic. (Solomon Duskis)
You may want to consider: if (output != null) { c.output(output); }. This type of exception can occur with null values. (Solomon Duskis)
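To illustrate the null-value point from the last comment: convertDataToRow returns null when BASM_AID is missing, and emitting that null from the DoFn will fail downstream. A minimal sketch of the guarded ParDo, using the same types as in the question, with only the null check added:

    .apply(ParDo.of(new DoFn<TableRow, Mutation>() {
        @Override
        public void processElement(ProcessContext c) {
            Mutation output = convertDataToRow(c.element());
            // Skip rows that could not be converted (e.g. missing BASM_AID)
            // instead of emitting null, which the Bigtable writer cannot handle.
            if (output != null) {
                c.output(output);
            }
        }
    }))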

1 Answer


We spoke offline. The problem here is that you have too many Dataflow workers compared to the number of Cloud Bigtable nodes in your cluster. You need to change that ratio by either reducing Dataflow workers or contacting our team to increase your Cloud Bigtable resources.
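If reducing the number of Dataflow workers is the route you take, here is a minimal sketch of capping the worker pool (assuming the Dataflow 1.x SDK used in the question; the three-workers-per-node ratio is only an illustrative placeholder, not an official recommendation):

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);
    // Hypothetical sizing rule: keep the worker count proportional to the
    // number of Cloud Bigtable nodes so the cluster is not overwhelmed.
    int bigtableNodes = 3;                       // assumed cluster size
    options.setNumWorkers(bigtableNodes * 3);
    options.setMaxNumWorkers(bigtableNodes * 3); // also caps autoscaling
    Pipeline p = Pipeline.create(options);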

Bigtable was performing admirably relative to the number of Cloud Bigtable nodes you had, but the load from Dataflow was too high for it to handle reliably.

You can view your usage in the "CPU Usage" graph in the Google Cloud console. Anything over 80% of your capacity is likely to cause problems. If you get more Bigtable quota, you can increase the number of nodes before you run the Dataflow job and reduce it again after the job is done. Scio, for example, does this.
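As a rough sketch of that scale-up/scale-down pattern, using the present-day google-cloud-bigtable admin client rather than anything shipped with bigtable-hbase-dataflow at the time (all IDs and node counts are placeholders):

    import java.io.IOException;
    import com.google.cloud.bigtable.admin.v2.BigtableInstanceAdminClient;

    public class ResizeForBatchJob {
        public static void main(String[] args) throws IOException {
            // Hypothetical project, instance, and cluster IDs; substitute your own.
            try (BigtableInstanceAdminClient admin = BigtableInstanceAdminClient.create("my-project")) {
                admin.resizeCluster("my-instance", "my-cluster", 10); // scale up before the Dataflow job
                // ... launch the Dataflow job here and block until it finishes ...
                admin.resizeCluster("my-instance", "my-cluster", 3);  // scale back down afterwards
            }
        }
    }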

==

Regarding "what happens to the data inside the pipeline? Is it reprocessed, or is it lost in transit to Bigtable?":

Dataflow retries sending the data to Bigtable, so for temporary issues the retry mechanism will correct the problem on its own.

Unfortunately, when the underlying cause is Cloud Bigtable overload, those retries send even more traffic to Bigtable and only exacerbate the problem.