2
votes

I'm unit testing a ParDo function with Apache Beam that has 1 main output and 1 sideoutput:

public class GetPubsubMessageDoFn extends DoFn<PubsubMessage, PubsubPayload.PubsubPayloadDTO> {

  @ProcessElement
  public void processContext(ProcessContext processContext) {
    PubsubPayload pubsubPayload = new PubsubPayload(processContext.element());
    processContext.output(pubsubPayload.getPayload()); //main output
    processContext.output(ORIGIN_PATH_TUPLE_TAG, GCSUtils.toGSURL(pubsubPayload.getPayload().bucket, pubsubPayload.getPayload().name)); //side output
  }
}

I set up a unit test class for testing the main - and side outputs:

 public class GetPubsubMessageDoFnTest {

      private DoFnTester<PubsubMessage, PubsubPayloadDTO> getPubsubMessageDoFn;   
      private Injector injector;
      private final TupleTagList tags = TupleTagList.of(PUBSUB_PAYLOAD_DTO_TUPLE_TAG).and(ORIGIN_PATH_TUPLE_TAG);


      @Before   
      public void setup() {
        injector = Guice.createInjector(new GetPubsubMessageTestModule());
        this.getPubsubMessageDoFn = DoFnTester.of(injector.getInstance(GetPubsubMessageDoFn.class));
        this.getPubsubMessageDoFn.setOutputTags(tags); //Does not compile
      }

  //Tests

According to the documentation I should be able to set the side output using setOutputTags(tags) only that function does not exist on the DoFnTester class. I'm using the Google Cloud Dataflow dependency version 2.1.0, which does use a subset of Apache Beam's features, but even looking at the Apache Beam reference documentation for DoFnTester setOutputTags isn't listed (even though it's mentioned again in the intro).

1

1 Answers

0
votes

These methods are not available in 2.1.0. In fact, DoFnTester is being deprecated, see https://issues.apache.org/jira/browse/BEAM-3159.

The advice is to use TestPipeline with the DirectRunner to test a ParDo on their DoFn. You can carefully control the flow of input with TestStream. See a nice blog on this topic.