I'm running Flink 1.4.2 on YARN and submitting the job to the YARN cluster with the Flink YARN client.
Suppose I have a TaskManager (TM) with 4 slots and I deploy a Flink job with parallelism = 4 using 2 containers: 1 JobManager (JM) and 1 TM. Each parallel instance is then deployed into its own task slot in the TM, so the entire job pipeline runs once per slot.
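To make the setup concrete, here is a rough sketch of the slot arithmetic I'm assuming. It is plain Java, not Flink API; the class and method names are made up for illustration, and it assumes default slot sharing, where a job needs as many slots as its highest operator parallelism.

```java
// Hypothetical helper, not part of Flink: models the slot arithmetic only.
final class SlotMath {

    // Assumption: with default slot sharing, the job needs as many slots
    // as its highest operator parallelism.
    static int slotsNeeded(int maxParallelism) {
        return maxParallelism;
    }

    // TaskManagers needed = ceil(slotsNeeded / slotsPerTaskManager).
    static int taskManagersNeeded(int maxParallelism, int slotsPerTaskManager) {
        int slots = slotsNeeded(maxParallelism);
        return (slots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        int taskManagers = taskManagersNeeded(4, 4);
        // One extra YARN container hosts the JobManager.
        int containers = taskManagers + 1;
        System.out.println("TaskManagers: " + taskManagers + ", containers: " + containers);
    }
}
```

With parallelism 4 and 4 slots per TM this gives 1 TM plus 1 JM container, which matches the 2 containers described above.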
My jobs do a join (a SQL time-windowed join on a non-keyed stream) and buffer the last 3 hours of data. According to the Flink docs, the separate threads running in different task slots share data sets and data structures, thus reducing the per-task overhead.
My question is: will the threads running in different task slots share the data buffered for the join? What data exactly is shared across these threads?
Edit
Sample query:
SELECT R.order_id, S.order.restaurant_id
FROM awz_s3_stream1 R
INNER JOIN awz_s3_stream2 S
ON CAST(R.order_id AS VARCHAR) = S.order_id
AND R.proctime BETWEEN S.proctime - INTERVAL '2' HOUR AND S.proctime + INTERVAL '2' HOUR
GROUP BY HOP(S.proctime, INTERVAL '2' MINUTE, INTERVAL '1' HOUR), S.order.restaurant_id