0
votes

I have pulling scenario,

HTTP -> Kafka -> Flink -> some output

If im not wrong i can use kafka consumer on stream only ?

Therefor i need to "block" the stream in order to sum/count the data im receiving from the HTTP call .

The easiest way to "block" is to add window/.

What is the best approach for this pulling scenario .

UPDATE

I want to prevent from the collector to sum each value

SingleOutputStreamOperator<Tuple2<String, Integer>> t = 
        in.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> 
            collector) throws Exception {
                ObjectMapper mapper = new ObjectMapper();
                JsonNode node = mapper.readTree(s);
                node.elements().forEachRemaining(v -> {
                    collector.collect(new Tuple2<>(v.textValue(), 1));
                });

            }
        }).keyBy(0).sum(1);
1
What do you mean with block the stream? Do you want to block your HTTP requests? - twalthr
@twalthr updated the question - MIkCode
Did you take a look into ProcessFunction? You can e.g. collect records in Flink's state and set a timer when to aggregate the data and finally emit it to the next operator. - twalthr

1 Answers

0
votes

If I understand correctly I think what you may want to use is a session window. This will continue to collect messages into the window and will only process the contents of the window when an event hasn't been received after a certain amount of time. See the documentation on session windows here: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html