This is probably more of a "theoretical" or "good practices" question than a strictly practical one (there is no problem with the code or with the cluster configuration files).
So, given this simple scenario:
- submit, via REST (Apache Livy), say, 10 Spark jobs to a Spark-on-YARN cluster,
- due to the resource management configuration, 5 of them are running and 5 are accepted/pending,
would this result in 10 ApplicationMaster (AM) instances running concurrently on the master node (consuming a lot of RAM)?
If that's the case, is there any other approach? Consider the following:
- The job requests are fast,
- each time, the cluster would receive almost 1000 requests,
- each job takes approx. 15 seconds to complete (sometimes less, depending on the amount of data received to process in each call),
- limited amount of resources (3 workers with 6 GB and 4 cores each + master)
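
To make these numbers concrete, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under assumptions of mine: jobs run in fixed "waves", every job takes the full 15 seconds, there is no startup or scheduling overhead, and the concurrency of 5 is simply borrowed from the 10-job scenario above rather than measured. The function name `drain_time_seconds` is made up for illustration.

```python
import math


def drain_time_seconds(requests: int, concurrent_jobs: int, job_seconds: float) -> float:
    """Estimate how long a burst takes if jobs run in fixed-size waves."""
    waves = math.ceil(requests / concurrent_jobs)
    return waves * job_seconds


# Cluster totals from the description (workers only, master not counted).
workers = 3
total_memory_gb = workers * 6   # 18 GB across the workers
total_cores = workers * 4       # 12 cores across the workers

# Burst parameters from the description.
burst_requests = 1000           # "almost 1000 requests" per burst
job_seconds = 15                # upper bound per job
concurrent_jobs = 5             # assumption: same concurrency as in the scenario

total_seconds = drain_time_seconds(burst_requests, concurrent_jobs, job_seconds)
print(f"workers: {total_memory_gb} GB, {total_cores} cores")
print(f"estimated time to drain one burst: {total_seconds / 60:.0f} minutes")
```

With those assumptions, one burst would take about 50 minutes to drain, which is why I am asking whether one YARN application per request is a sensible approach at all.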