I have read the AWS documentation, but one point is still unclear.
Is S3 the primary storage for an EMR cluster, or does the data live on the EC2 instances, with S3 holding only a copy?
From the docs:
"HBase on Amazon EMR provides the ability to back up your HBase data directly to Amazon Simple Storage Service (Amazon S3)"
"Hadoop clusters running on Amazon EMR use EC2 instances as virtual Linux servers for the master and slave nodes, Amazon S3 for bulk storage of input..."
"provides the ability to launch a new cluster and populate it with data from a previous HBase backup"
My use case: use HBase to store terabytes of data, and update my tables only two or three times a month by starting an EMR cluster. The tables would be stored on S3.