
I'm using Apache Beam SDK version 0.2.0-incubating-SNAPSHOT and trying to write data to Bigtable with the Dataflow runner. Unfortunately, I'm getting a NullPointerException when executing my Dataflow pipeline, where I'm using BigtableIO.Write as my sink. I've already checked my BigtableOptions and the parameters are fine, according to my needs.

Basically, I create my pipeline, and at some point it has a step that writes a PCollection<KV<ByteString, Iterable<Mutation>>> to my desired Bigtable table:

final BigtableOptions.Builder optionsBuilder =
    new BigtableOptions.Builder().setProjectId(System.getProperty("PROJECT_ID"))
        .setInstanceId(System.getProperty("BT_INSTANCE_ID"));

// do intermediary steps and create PCollection<KV<ByteString, Iterable<Mutation>>> 
// to write to bigtable

// modifiedHits is a PCollection<KV<ByteString, Iterable<Mutation>>>
modifiedHits.apply("writting to big table", BigtableIO.write()
    .withBigtableOptions(optionsBuilder).withTableId(System.getProperty("BT_TABLENAME")));

p.run();

When executing the pipeline, I get the NullPointerException, pointing exactly at the BigtableIO class's public void processElement(ProcessContext c) method:

(6e0ccd8407eed08b): java.lang.NullPointerException at org.apache.beam.sdk.io.gcp.bigtable.BigtableIO$Write$BigtableWriterFn.processElement(BigtableIO.java:532)

I checked that this method processes all elements before writing to Bigtable, but I'm not sure why I'm getting this exception every time I execute the pipeline. According to the code below, the method uses the bigtableWriter attribute to process each c.element(), but I can't even set a breakpoint to debug where exactly the null is. Any advice or suggestions on how to solve this issue?

@ProcessElement
public void processElement(ProcessContext c) throws Exception {
  checkForFailures();
  Futures.addCallback(
      bigtableWriter.writeRecord(c.element()), new WriteExceptionCallback(c.element()));
  ++recordsWritten;
}

Thanks.

Could you clarify a few things: 1) What version of the SDK are you using? 2) What runner are you using (direct runner, Spark, Flink, Dataflow)? If it's Dataflow, could you give the job ID? – jkff

@jkff thanks for the comment. Yes, I've just edited my question to include the versions. So, yes, I'm using the Dataflow runner. Its job ID is 2016-09-13_08_29_14-14276852956124203982. – Saulo Ricci

I looked up the job and its classpath, and if I'm not mistaken it looks like you're using version 0.3.0-incubating-SNAPSHOT of beam-sdks-java-{core,io}, but version 0.2.0-incubating-SNAPSHOT of google-cloud-dataflow-java. I believe the issue is because of this - you have to use the same version (more details: BigtableIO in version 0.3.0 uses @Setup and @Teardown methods, but runner 0.2.0 does not support them yet). – jkff

@jkff exactly, that was the problem; just fixed it here. Thank you. – Saulo Ricci

@jkff – please move your comment into an answer so that it can be accepted and this question can be marked closed. Thank you for solving the problem! – Misha Brukman

1 Answer


I looked up the job and its classpath, and if I'm not mistaken it looks like you're using version 0.3.0-incubating-SNAPSHOT of beam-sdks-java-{core,io}, but version 0.2.0-incubating-SNAPSHOT of google-cloud-dataflow-java.

I believe the issue is caused by this mismatch: you have to use the same version of both (more details: BigtableIO in version 0.3.0 uses @Setup and @Teardown methods, but runner 0.2.0 does not support them yet).
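
To make the failure mode concrete: in the @Setup/@Teardown pattern, the writer is opened in a lifecycle method, so a runner that never calls @Setup leaves the field null, and the first element then dereferences it inside processElement. Here is a minimal sketch of that pattern; the Writer interface and openWriter() helper are invented stand-ins for illustration, and this is not the actual BigtableIO source:

import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical sketch of a DoFn that opens its client in @Setup, the pattern
// BigtableIO's writer uses as of 0.3.0. Writer and openWriter() are invented
// placeholders, not real Beam or Bigtable APIs.
class SketchedWriterFn<T> extends DoFn<T, Void> {
  private transient Writer writer;

  @Setup
  public void setup() {
    // A 0.2.0 runner never invokes @Setup, so this assignment never happens...
    writer = openWriter();
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    // ...and the first element then dereferences a null writer, producing
    // exactly the NullPointerException at processElement seen in the question.
    writer.write(c.element());
  }

  @Teardown
  public void teardown() throws Exception {
    if (writer != null) {
      writer.close();
    }
  }

  // Invented placeholder type so the sketch is self-contained.
  interface Writer {
    void write(Object record) throws Exception;
    void close() throws Exception;
  }

  private Writer openWriter() {
    // Stub standing in for the real connection setup.
    return new Writer() {
      @Override public void write(Object record) {}
      @Override public void close() {}
    };
  }
}

The practical fix is to pin the Beam SDK modules and the runner artifact to the same version in your build file, so the SDK never relies on lifecycle annotations that the runner does not yet support.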