
I need help with a Flink application deployment on Kubernetes (K8s).

We have 3 sources that send trigger conditions in the form of SQL queries. In total there are ~3-6k queries, which is effectively a heavy load for a single Flink instance. When I tried to execute them, it was very slow and took a long time to start.

Because of the high volume of queries, we decided to create a separate Flink application instance per source, so effectively each Flink instance will execute only ~1-2k queries.

Example: the SQL query sources are A, B, and C.

Flink instances:

App A --> responsible for handling source A queries only

App B --> responsible for handling source B queries only

App C --> responsible for handling source C queries only

I want to deploy these instances on Kubernetes.

Questions:

a) Is it possible to deploy a standalone Flink jar with the built-in MiniCluster? I.e., just start the main method: java -cp mainMethod (with sourceName as a command-line argument, A/B/C). Roughly what I mean is sketched below, after (b).

b) If one Kubernetes pod or Flink instance goes down, how can we handle its work in another pod or another Flink instance? Is it possible to hand the work over to another pod or Flink instance?
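For (a), this is just a sketch of what I have in mind (TriggerQueryJob and the placeholder pipeline are made-up names); as far as I know, createLocalEnvironment() runs an embedded MiniCluster inside the same JVM when the main method is started directly:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TriggerQueryJob {

    public static void main(String[] args) throws Exception {
        // sourceName passed on the command line, e.g. A, B or C
        String sourceName = args.length > 0 ? args[0] : "A";

        // A local environment starts an embedded MiniCluster in this JVM,
        // so the jar can be launched directly with java -cp ... TriggerQueryJob A
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment();

        // Placeholder pipeline: in the real job this is where the ~1-2k
        // queries belonging to sourceName would be loaded and executed.
        env.fromElements(sourceName).print();

        env.execute("trigger-query-job-" + sourceName);
    }
}
```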

Sorry if I mixed up two or more things together :(

I appreciate your help. Thanks!

Normally you'd handle this "heavy load" issue by increasing the parallelism of the source function, and (as needed) figuring out how to partition the queries so that each sub-source has effectively the same load. Why isn't that an option for you? – kkrugler
Thanks for your suggestions. Let's say I partition my rules (SQL queries) based on "source" to distribute the load across tasks. If there is any change in a SQL query, that would mean restarting the complete Flink instance, right? If yes, then it means downtime for the other sources, which is exactly what we are trying to avoid by creating multiple instances based on "source". – Ashutosh

1 Answer


Leaving aside issues of exactly-once semantics, one way to handle this would be to have a parallel source function that emits the SQL queries (one per sub-task), and a downstream FlatMapFunction that executes the query (one per sub-task). Your source could then send out updates to the query without forcing you to restart the workflow.
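A minimal sketch of that pattern, assuming the queries live in some external store that each source sub-task polls periodically (QuerySource, QueryExecutor, and fetchQueriesFor are hypothetical names, and the actual lookup/partitioning is left as a placeholder):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.apache.flink.util.Collector;

import java.util.Collections;
import java.util.List;

public class QueryWorkflowSketch {

    // Parallel source: each sub-task emits the SQL queries assigned to it and
    // periodically re-emits them, so query changes reach the executor without
    // restarting the job.
    public static class QuerySource extends RichParallelSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            int subtask = getRuntimeContext().getIndexOfThisSubtask();
            int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
            while (running) {
                // Hypothetical lookup: fetch the current queries for this sub-task
                // from wherever they live (DB, config service, Kafka topic, ...).
                for (String query : fetchQueriesFor(subtask, parallelism)) {
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(query);
                    }
                }
                Thread.sleep(60_000); // poll for query updates once a minute
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private List<String> fetchQueriesFor(int subtask, int parallelism) {
            // Placeholder for your partitioning scheme (e.g. hash(queryId) % parallelism).
            return Collections.emptyList();
        }
    }

    // Downstream operator: receives the (possibly updated) queries and executes them.
    public static class QueryExecutor implements FlatMapFunction<String, String> {
        @Override
        public void flatMap(String query, Collector<String> out) {
            // Placeholder: run the query against your data and emit the results.
            out.collect("executed: " + query);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new QuerySource())
           .flatMap(new QueryExecutor())
           .print();
        env.execute("query-workflow-sketch");
    }
}
```

How you keep the "current" version of each query inside the executor (e.g. in operator state keyed by query id) is up to you; the point is that updates arrive as ordinary stream elements instead of requiring a redeploy.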