1
votes

With the stream API, I can write a RichCoFlatMapFunction that accept a control stream and a data stream, the control stream contains the elements for start or stop or change parameter of the calculation, I know I can store the current control settings in states, and check the value when process data stream.

But what's the way to do the similar thing with Flink SQL? I cannot use join as data stream and control stream is not able to join together.

The solution we come up with is to store the control settings by application itself. The idea is:

  1. Broadcast the control stream to a map operator, and store the control settings to a java singleton objects in its map() method, as the map operator will run with the default parallelism, we assume that it will run on all JVMs for that job, so that we make sure every JVM will initialize and keep updating the control settings in the singleton object.

  2. With SQL, for every UDAF or UDF we can access the control settings through accessing the java singleton objects.

But I am not sure if my assumption is correct and this is a feasible solution.

1

1 Answers

1
votes

I don't think that is a good idea. SQL was not designed for such use cases. Instead a SQL query is optimized and executed as specified. Changing the behavior of a query is not intended. Besides the design perspective, it would also not perform well because you would need do remote state look-ups to distributed queryable state for each record that you process. This adds of course latency.

To me your use case sounds more like an application than SQL query. For that the DataStream API would be the right choice. What you can do, is to embed SQL (or Table API) queries into an application, i.e., do the pre and post processing with SQL and have an operator with an control/data stream pattern in the middle.