I have recently been unable to upload any jars to my Flink cluster, which runs under YARN on AWS EMR. A long-running streaming application has been running on it for 26 days. It looks as though the temp directory has been deleted, but I am fairly sure I did not delete it.
From the jobmanager.log:
2019-02-12 22:02:05,156 WARN org.apache.flink.runtime.webmonitor.handlers.JarListHandler - Jar upload dir /tmp/flink-web-94fee1e8-35b9-409f-be97-d86c0f021459/flink-web-upload does not exist, or had been deleted externally. Previously uploaded jars are no longer available.
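To confirm what the warning reports, the directory can be inspected directly on the JobManager host. The sketch below assumes, based only on the wording of the warning, that the jar list is built by scanning the upload directory for `*.jar` files, so a missing directory simply yields an empty list. The `/tmp/flink-web-*/flink-web-upload` pattern follows the path in the log line; the function names are illustrative and not part of Flink.

```python
from __future__ import annotations

import glob
import os
from pathlib import Path


def find_upload_dirs(tmp_root: str = "/tmp") -> list[Path]:
    """Return every flink-web-*/flink-web-upload directory under tmp_root."""
    pattern = os.path.join(tmp_root, "flink-web-*", "flink-web-upload")
    return [Path(p) for p in glob.glob(pattern)]


def list_uploaded_jars(upload_dir: Path) -> list[str]:
    """List *.jar file names in upload_dir, sorted.

    A missing directory returns [] instead of raising, which is the
    behaviour the warning suggests ("Previously uploaded jars are no
    longer available"). This is an assumption, not Flink's code.
    """
    if not upload_dir.is_dir():
        return []
    return sorted(p.name for p in upload_dir.glob("*.jar"))
```

For example, `list_uploaded_jars(Path("/tmp/flink-web-94fee1e8-35b9-409f-be97-d86c0f021459/flink-web-upload"))` returns `[]` while the directory is gone.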
The instance has plenty of space for storing the jar.
Here is the YARN app status:
Application-Id : application_1547758510009_0001
Application-Name : Flink session cluster
Application-Type : Apache Flink
User : hadoop
Queue : default
Application Priority : 0
Start-Time : 1547758629234
Finish-Time : 0
Progress : 100%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://ip-cp1.ec2.internal:39975
RPC Port : 39975
AM Host : ip-cp1.ec2.internal
Aggregate Resource Allocation : 43765538005 MB-seconds, 4500338 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
Unmanaged Application : false
Application Node Label Expression : <Not set>
AM container Node Label Expression : <DEFAULT_PARTITION>
I have set neither jobmanager.web.upload.dir nor jobmanager.web.tmpdir. After recreating that directory, I can upload into it through cURL (and verify that the file arrives), but listing the jars afterwards shows nothing.
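The upload-then-list check can also be scripted against the JobManager's REST API. This sketch assumes the standard Flink REST endpoints `POST /jars/upload` (multipart form field `jarfile`, as in the documented `curl -F "jarfile=@..."` call) and `GET /jars`, whose response is assumed to carry a `files` array with a `name` per jar. The base URL is taken from the Tracking-URL in the YARN status and is also an assumption; the helper names are hypothetical. It only reproduces the symptom and does not fix it.

```python
from __future__ import annotations

import json
import urllib.request
import uuid
from pathlib import Path

# Assumption: the REST endpoint is served at the YARN Tracking-URL.
BASE_URL = "http://ip-cp1.ec2.internal:39975"


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_jar(base_url: str, jar_path: Path) -> dict:
    """POST the jar to /jars/upload under the form field 'jarfile'."""
    body, content_type = build_multipart("jarfile", jar_path.name, jar_path.read_bytes())
    req = urllib.request.Request(
        f"{base_url}/jars/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_jar_names(jars_response: dict) -> list[str]:
    """Extract jar names from a parsed GET /jars response body."""
    return [entry["name"] for entry in jars_response.get("files", [])]


def list_jars(base_url: str) -> list[str]:
    """GET /jars and return the names of the jars the cluster reports."""
    with urllib.request.urlopen(f"{base_url}/jars") as resp:
        return list_jar_names(json.load(resp))
```

Calling `upload_jar(BASE_URL, Path("my-job.jar"))` followed by `list_jars(BASE_URL)` (with `my-job.jar` as a placeholder for a real job jar) reproduces the problem if the second call returns an empty list even though the upload succeeded.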
Has anyone seen this before? Also, how do I correctly recreate the required upload directory?