Refer to Hadoop Definitive Guide text book ... Apart from that there is lot of info in apache web site.
Container size is not fixed it is dynamically allocated based on requirement by Resource Manager.
From developer perspective same old map-reduce will work on YARN.
ResourceManager failures
In the initial versions of the YARN framework, ResourceManager failures meant a total cluster failure, as it was a single point of failure. The ResourceManager stores the state of
the cluster, such as the metadata of the submitted application, information on cluster
resource containers, information on the cluster’s general configurations, and so on.
Therefore, if the ResourceManager goes down because of some hardware failure, then
there is no way to avoid manually debugging the cluster and restarting the
ResourceManager. During the time the ResourceManager is down, the cluster is
unavailable, and once it gets restarted, all jobs would need a restart, so the half-completed jobs lose any data and need to be restarted again. In short, a restart of the ResourceManager used to restart all the running ApplicationMasters. The latest versions of YARN address this problem in two ways. One way is by creating an active-passive ResourceManager architecture, so that when one goes down, another becomes active and takes responsibility for the cluster. Another way is by using the Zookeeper ResourceManager quorum, so that the ResourceManager state is stored externally over the Zookeeper, and one
ResourceManager is in an active state and one or more ResourceManagers are in passive mode, waiting for something to happen that brings them to an active state.
ApplicationMaster failures
When the ApplicationMaster fails, the ResourceManager simply starts another container with a new ApplicationMaster running in it for another application attempt. It is the responsibility of the new ApplicationMaster
to recover the state of the older ApplicationMaster, and this is possible only when ApplicationMasters persist their states in the external location so that it can be used for future reference. ApplicatoinMaster will store their state to persisitant disk thus all the status till the failure can be recovered.
NodeManager Failures
If a Node Manager fails, the ResourceManager detects this failure using a time-out (that is, stops receiving the heartbeats from the NodeManager). The ResourceManager then removes the NodeManager from its pool of available NodeManagers. It also kills all the containers running on that node & reports the failure to all running AMs. AMs are then responsible for reacting to node failures, by redoing the work done by any containers running on that node during the fault.
Container Failures
Container failures will be reported by node manager to Resource manager and Resource manager informs the same to Application Master. Now Application will restart the container.