9 votes

I am trying to create a JUnit test for a Flink streaming job that writes data to a Kafka topic and reads data from the same Kafka topic, using FlinkKafkaProducer09 and FlinkKafkaConsumer09 respectively. I am passing test data to the producer:

DataStream<String> stream = env.fromElements("tom", "jerry", "bill");

And checking whether the same data comes back from the consumer:

List<String> expected = Arrays.asList("tom", "jerry", "bill");
List<String> result =  resultSink.getResult();
assertEquals(expected, result);

using a TestListResultSink.

By printing the stream I can see the data coming from the consumer as expected, but I could not get the JUnit test result, because the consumer keeps running even after the messages are finished, so execution never reaches the assertion part.

Is there any way in Flink or FlinkKafkaConsumer09 to stop the process, or to make it run only for a specific time?


4 Answers

9 votes

The underlying problem is that streaming programs are usually not finite and run indefinitely.

The best way, at least for the moment, is to insert a special control message into your stream which lets the source terminate properly (it simply stops reading more data by leaving the reading loop). That way Flink will tell all downstream operators that they can stop after they have consumed all data.
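A minimal sketch of such a source, assuming a hypothetical "END" sentinel as the control message and a hypothetical readNextRecord() helper (neither is from the answer itself):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class TerminatingSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String record = readNextRecord(); // hypothetical helper that fetches the next input record
            if ("END".equals(record)) {       // the special control message
                break; // leave the reading loop; Flink then lets downstream operators finish
            }
            ctx.collect(record);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    // Hypothetical stand-in: a real test would poll Kafka or an in-memory queue here.
    private String readNextRecord() {
        return "END";
    }
}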

Alternatively, you can throw a special exception in your source (e.g. after some time) such that you can distinguish a "proper" termination from a failure case (by checking the error cause). Throwing an exception in the source will fail the program.
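A minimal sketch of that idea, assuming a hypothetical SuccessException marker class and unwrapping the cause chain inside the test method:

// Hypothetical marker exception thrown by the source once all test data is emitted.
public class SuccessException extends Exception {}

// In the test, unwrap the failure cause to tell deliberate termination from a real error:
try {
    env.execute();
} catch (Exception e) {
    Throwable cause = e;
    while (cause != null && !(cause instanceof SuccessException)) {
        cause = cause.getCause(); // Flink may wrap the marker in job execution exceptions
    }
    if (cause == null) {
        throw e; // not our marker, so this is a genuine failure
    }
}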

1 vote

Can you not override isEndOfStream within the deserializer to stop fetching from Kafka? If I read correctly, Flink's Kafka09Fetcher has the following code in its run method, which breaks the event loop:

if (deserializer.isEndOfStream(value)) {
    // end of stream signaled
    running = false;
    break;
}

My thought was to use Till Rohrmann's idea of a control message in conjunction with this isEndOfStream method to tell the KafkaConsumer to stop reading.

Any reason that will not work? Or maybe some corner cases I'm overlooking?

https://github.com/apache/flink/blob/07de86559d64f375d4a2df46d320fc0f5791b562/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java#L146
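For illustration, a deserializer along those lines might look like the following sketch, assuming a hypothetical "END" sentinel written as the last test record:

import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class TerminatingDeserializationSchema implements DeserializationSchema<String> {

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Returning true makes the fetcher leave its event loop (see snippet above).
        return "END".equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}

The schema would then be passed to the consumer, e.g. new FlinkKafkaConsumer09<>(topic, new TerminatingDeserializationSchema(), properties).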

0 votes

Following @TillRohrmann

You can combine the special-exception method with an EmbeddedKafka instance: handle the exception in your unit test, then read the records back off the EmbeddedKafka topic and assert on the consumed values.

I found https://github.com/asmaier/mini-kafka/blob/master/src/test/java/de/am/KafkaProducerIT.java to be extremely useful in this regard.

The only problem is that you will lose the element that triggers the exception, but you can always adjust your test data to account for that.
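For the read-and-assert step, a minimal sketch using a plain KafkaConsumer, assuming an embedded broker reachable at localhost:9092 (adjust to your EmbeddedKafka setup):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Reads values off the test topic until expectedCount records have been seen.
private static List<String> readAll(String topic, int expectedCount) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumption: embedded broker address
    props.put("group.id", "test-reader");
    props.put("auto.offset.reset", "earliest");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    List<String> values = new ArrayList<>();
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topic));
        while (values.size() < expectedCount) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                values.add(record.value());
            }
        }
    }
    return values;
}

The returned list can then be compared against the expected elements with assertEquals, minus the element that triggered the exception.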

0 votes

In your test you can start the job execution in a separate thread, wait some time to allow it to process the data, cancel the thread (which interrupts the job), and then make the assertions.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run the job on a separate thread. A plain Future is used instead of
// CompletableFuture.runAsync(), since CompletableFuture.cancel(true) does not
// actually interrupt the running task.
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<?> handle = executor.submit(() -> {
    try {
        environment.execute(jobName);
    } catch (Exception e) {
        e.printStackTrace();
    }
});
try {
    handle.get(30, TimeUnit.SECONDS); // give the job time to process the test data; adjust as needed
} catch (TimeoutException e) {
    handle.cancel(true); // interrupts the job execution thread, cancelling and closing the job
}
executor.shutdownNow();

// Make assertions here