I'm trying to upgrade my project from Flink 1.4 to Flink 1.9. On 1.4 I was building a fat jar that included all of my Hadoop 2.9.2 dependencies, which I then submitted to the Flink cluster on k8s. I did not set up Hadoop on the cluster itself.
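
The relevant part of my build.gradle looks roughly like this (simplified, and the exact set of Hadoop artifacts is from memory, so treat the coordinates as approximate):

    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '2.9.2'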

After upgrading the project to 1.9 and upgrading the cluster as well, I'm unable to run the code on the cluster, although it runs just fine in my IntelliJ IDE. The exception is:

java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
...

Since I'm not relying on Flink to provide any Hadoop dependency, I was under the assumption that it should still work, as all the dependencies are packaged up into the fat jar; that is exactly how it works on 1.4.

I've tried adding the dependency on shaded-hadoop2, which doesn't fix the issue:

    compile group: 'org.apache.flink', name: 'flink-shaded-hadoop2-uber', version: '2.4.1-1.8.2'

I'm guessing that setting the Hadoop path for Flink might fix it, but I've been struggling to understand how exactly I should do that in my Dockerfile. Do I need to untar the Hadoop 2 binary, or create some jars and add them to /flink/lib?
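
One approach I'm considering is pulling the pre-bundled Hadoop uber jar straight into Flink's lib directory from the Dockerfile. This is only a sketch; the artifact version (2.8.3-7.0) is my guess and would need to match an actual flink-shaded release:

    # Sketch: fetch the pre-bundled Hadoop uber jar into Flink's lib directory,
    # after the distribution has been unpacked there. The version 2.8.3-7.0 is
    # an assumption -- pick whichever flink-shaded release matches your Hadoop line.
    ADD https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/flink-shaded-hadoop-2-uber-2.8.3-7.0.jar ${FLINK_LIB_DIR}/

If the jar just needs to be on Flink's classpath, something like this might be all that's required, but I'm not sure.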

My Dockerfile looks like this at the moment:

FROM openjdk:8-jre
MAINTAINER User "[email protected]"
LABEL version="v1.9.0"

ENV FLINK_HOME=/flink
ENV FLINK_CONF_DIR=/flink/conf
ENV FLINK_APPS_DIR=/flink/apps
ENV FLINK_LIB_DIR=/flink/lib

RUN mkdir -p ${FLINK_HOME}
RUN mkdir -p ${FLINK_CONF_DIR}
RUN mkdir -p ${FLINK_APPS_DIR}
RUN mkdir -p ${FLINK_LIB_DIR}

ENV PATH=$FLINK_HOME/bin:$PATH
ENV CLASSPATH=.:$FLINK_APPS_DIR:$FLINK_LIB_DIR

COPY dist/flink-1.9.0-bin-scala_2.11.tgz ${FLINK_HOME}/flink.tgz
WORKDIR ${FLINK_HOME}

COPY prepare-deployment.sh /
RUN chmod +x /prepare-deployment.sh
RUN /prepare-deployment.sh
RUN rm -rf /prepare-deployment.sh


COPY Tools/netstat /bin/netstat
COPY Tools/ttyd-static-amd64 /bin/ttyd
COPY Tools/jq /bin/jq
COPY Tools/checktm /bin/checktm
COPY Tools/checktm_log /bin/checktm_log

COPY docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh
RUN chmod -R 755 /bin
RUN chmod -R 777 /flink
RUN chmod -R 777 /etc
EXPOSE 6122 6123 6124 6125 6126 6127 8080 8081
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["--help"]

1 Answer


In the Flink sources you will find a flink-container directory that contains a build.sh script for building a Docker image, a Dockerfile, etc. They are set up to help you get these details right, and are parameterized so that the Hadoop libraries you need can be included.
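
As a sketch of how that can look (the exact flag names differ between releases, so run ./build.sh --help first to confirm; the job jar path and image name below are placeholders):

    # Run from the Flink source checkout; confirm flags via ./build.sh --help
    cd flink-container/docker
    ./build.sh \
      --from-release \
      --flink-version 1.9.0 \
      --scala-version 2.11 \
      --hadoop-version 2.8 \
      --job-artifacts /path/to/your-fat-job.jar \
      --image-name my-flink-job:1.9.0

The resulting image lays out the Flink distribution, the chosen Hadoop libraries, and your job jar the way the cluster entrypoint expects, which avoids hand-maintaining those paths in your own Dockerfile.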