I have recently started learning stream processing and am trying my hand at Apache Flink. I'm writing a job that reads events from a Kafka topic, optionally applies some stateless chained transformations, and makes a REST call to POST each transformed event to another application. For example, my main method could look like this -
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceToRestSinkJob {

    public static void main(String[] args) throws Exception {
        String configPath = args[0];
        // Read configuration for the job (like Kafka properties, the REST URI for the sink, possibly operators to invoke)
        ...

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> dataStream = env.addSource(
                new FlinkKafkaConsumer<>(topic, new SimpleStringSchema(), kafkaProps));

        // Chain some operators depending on some parameters in the config file
        ...

        // Custom sink function extending org.apache.flink.streaming.api.functions.sink.SinkFunction
        dataStream.addSink(new RestSinkFunction<>(restUri));

        env.execute("Confused Job");
    }
}
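For context, my RestSinkFunction would be roughly along these lines - a simplified sketch where I extend RichSinkFunction (which still implements SinkFunction) so I can build the HTTP client in open(); I'm assuming Java 11's java.net.http.HttpClient, passing the REST URI through the constructor, and discarding the response since I don't need it -

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class RestSinkFunction<T> extends RichSinkFunction<T> {

    private final String restUri;
    private transient HttpClient httpClient; // HttpClient is not serializable, so build it in open()

    public RestSinkFunction(String restUri) {
        this.restUri = restUri;
    }

    @Override
    public void open(Configuration parameters) {
        httpClient = HttpClient.newHttpClient();
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(restUri))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(value.toString()))
                .build();
        // Blocking POST per element; the response is discarded since I don't care about it
        httpClient.send(request, HttpResponse.BodyHandlers.discarding());
    }
}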
My aim is to have a common jar artifact for multiple jobs that share the same type of source and sink. If a job needs to perform transformations A, B & C (whose implementations would be present in the jar), I can list them in the config file and pass the file's path in the program args.
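Concretely, for the dynamic part I'm picturing something like this, where operator names from the config are looked up in a registry of implementations bundled in the jar (the registry, the sample transformations, and the hard-coded source and list here are just for illustration) -

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DynamicChainingSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Registry of the transformations bundled in the jar.
        // MapFunction extends Serializable, so these lambdas can be shipped to the task managers.
        Map<String, MapFunction<String, String>> registry = new HashMap<>();
        registry.put("A", value -> value.toUpperCase());
        registry.put("B", value -> value.trim());
        registry.put("C", value -> value + "!");

        // In the real job this list would come from the config file in args[0]
        List<String> configuredOps = Arrays.asList("A", "C");

        DataStream<String> stream = env.fromElements("hello", " world ");

        // Chain the configured operators in order, failing fast on unknown names
        for (String name : configuredOps) {
            MapFunction<String, String> op = registry.get(name);
            if (op == null) {
                throw new IllegalArgumentException("Unknown operator: " + name);
            }
            stream = stream.map(op);
        }

        stream.print();
        env.execute("Dynamically chained job");
    }
}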
Now here are my questions -
- Is it possible to dynamically invoke operators?
- I know that making a REST call in the sink may cause some unwanted latency, but in my application, it is tolerable. I also do not care about the response. Keeping this in mind, is there a reason why I should avoid a REST sink?
- Overall, am I going horribly wrong?
Thank you!