0
votes

I have a DataStream<Tuple2<String, Double>> one and DataStream<Tuple2<String, Double>> second, where the first one has much more elements from another and they have different keys. Moreover, Datastream "two" has basically one key-value pair. So, I want to connect these streams in order to divide the values of the first datastream with the constant value of second datastream. How can this be done in Apache Flink? Does this be done with connected datastreams or is an another way?

1
Datastream "two" has basically one key-value pair, do You mean that this is basically one element stream ?Dominik Wosiński
Yes, exactly. For example, the second datastream is only the key-value pair, ("gamma", 1.23).ChackM
Okay, and what is the assumption for the connect? Do You simply want to enrich elements of one stream with elements from the other ?Dominik Wosiński
Yes, because I want to operate a devition, between the second and the fourth tuple of the new connected stream. I suppose that after the connection i have a Tuple4<String, Double, String, Double>. Is that right?ChackM

1 Answers

1
votes

In the described case the best idea is to simply use the broadcast state pattern. The second stream with few elements would become a broadcast stream and the first one with more elements would be then enriched with elements of the second one. So, You would have something like:

//define broadcast state here

firstStream.keyBy([someKey])
.connect(secondStream.broadcast([mapStateDescriptor])
.process([YourProcessFunction])

And then in Your process function for the process element You could do the enrichment to produce the expected tuple.

More on the broadcast pattern can be found here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html