5
votes

I am running a streaming Apache beam pipeline in Google Dataflow. It's reading data from Kafka and streaming insert to Bigquery.

But in the bigquery streaming insert step it's throwing a large number of warning -

    java.lang.RuntimeException: ManagedChannel allocation site
at io.grpc.internal.ManagedChannelOrphanWrapper$ManagedChannelReference.<init> (ManagedChannelOrphanWrapper.java:93)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:53)
at io.grpc.internal.ManagedChannelOrphanWrapper.<init> (ManagedChannelOrphanWrapper.java:44)
at io.grpc.internal.ManagedChannelImplBuilder.build (ManagedChannelImplBuilder.java:612)
at io.grpc.internal.AbstractManagedChannelImplBuilder.build (AbstractManagedChannelImplBuilder.java:261)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createSingleChannel (InstantiatingGrpcChannelProvider.java:340)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.access$1600 (InstantiatingGrpcChannelProvider.java:73)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider$1.createSingleChannel (InstantiatingGrpcChannelProvider.java:214)
at com.google.api.gax.grpc.ChannelPool.create (ChannelPool.java:72)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.createChannel (InstantiatingGrpcChannelProvider.java:221)
at com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.getTransportChannel (InstantiatingGrpcChannelProvider.java:204)
at com.google.api.gax.rpc.ClientContext.create (ClientContext.java:169)
at com.google.cloud.bigquery.storage.v1beta2.stub.GrpcBigQueryWriteStub.create (GrpcBigQueryWriteStub.java:138)
at com.google.cloud.bigquery.storage.v1beta2.stub.BigQueryWriteStubSettings.createStub (BigQueryWriteStubSettings.java:145)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.<init> (BigQueryWriteClient.java:128)
at com.google.cloud.bigquery.storage.v1beta2.BigQueryWriteClient.create (BigQueryWriteClient.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.newBigQueryWriteClient (BigQueryServicesImpl.java:1255)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.access$800 (BigQueryServicesImpl.java:135)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:521)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.<init> (BigQueryServicesImpl.java:449)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.getDatasetService (BigQueryServicesImpl.java:169)
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.flushRows (BatchedStreamingWrite.java:374)
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite.access$800 (BatchedStreamingWrite.java:69)
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements.finishBundle (BatchedStreamingWrite.java:271)
at org.apache.beam.sdk.io.gcp.bigquery.BatchedStreamingWrite$BatchAndInsertElements$DoFnInvoker.invokeFinishBundle (Unknown Source)
at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle (SimpleDoFnRunner.java:242)
at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.finishBundle (SimpleParDoFn.java:432)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish (ParDoOperation.java:56)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute (MapTaskExecutor.java:103)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process (StreamingDataflowWorker.java:1430)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100 (StreamingDataflowWorker.java:165)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run (StreamingDataflowWorker.java:1109)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
at java.lang.Thread.run (Thread.java:748)

I am using apache beam java sdk 2.29.0.

Any clue what is causing the issue?

2
This seems likely to be a bug in either Beam or one of its dependencies, and does not seem like a user error. Here's a similar bug report on github, for example. Reporting this as a bug on the Apache Beam Jira may be helpful, especially if you have instructions for replicating the issue.Daniel Oliveira

2 Answers

6
votes

I had the same problem with an Apache Beam pipeline reading from Pub/Sub and streaming to BigQuery. I was able to "solve" it by downgrading to version 2.28.0 of the Apache Beam java SDK. It seems like the problem was introduced in version 2.29.0 of the SDK and is still present in 2.30.0.

0
votes

A fix for this problem was just merged in to Beam, and should hopefully be released in the next Beam version.