I am building a simple data pipeline for learning purposes. I have real-time data coming from Kafka, and I would like to do some transformations using Flink.
Unfortunately, I'm not sure I understand the deployment options correctly. In the Flink docs I found a section about Docker Compose and Application Mode. It says that I can deploy only one job to the cluster:
A Flink Application cluster is a dedicated cluster which runs a single job. In this case, you deploy the cluster with the job as one step, thus, there is no extra job submission needed.
The job artifacts are included into the class path of Flink's JVM process within the container and consist of:
- your job jar, which you would normally submit to a Session cluster and
- all other necessary dependencies or resources, not included into Flink.
To deploy a cluster for a single job with Docker, you need to
- make job artifacts available locally in all containers under /opt/flink/usrlib,
- start a JobManager container in the Application cluster mode
- start the required number of TaskManager containers.
On the other hand, I found examples on GitHub that use the flink-java artifact directly, without running any Docker image.
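To make the second option concrete, this is roughly the kind of code I mean (a minimal sketch; the class name is mine and I have replaced the Kafka source with fromElements, assuming flink-streaming-java is on the classpath):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalPipelineSketch {

    public static void main(String[] args) throws Exception {
        // When run from an IDE or with plain `java`, getExecutionEnvironment()
        // falls back to an embedded local cluster (MiniCluster), so no Docker
        // containers or separate JobManager/TaskManager processes are involved.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")   // stand-in for the real Kafka source
           .map(s -> s.toUpperCase())
           .returns(Types.STRING)         // help Flink's type extraction for the lambda
           .print();

        env.execute("local-sketch");
    }
}
```

As far as I can tell, this runs the whole pipeline inside a single JVM, which is what confuses me compared to the containerized setup described above.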
What is the difference, and why is the second option not mentioned in the Flink docs?
Also, is it possible to deploy a Flink job as a separate Docker image?