Use case: HBase on EMR

Question

I read the AWS documentation, but one point is still unclear.

Is S3 the primary storage for an EMR cluster, or does the data live on the EC2 instances, with S3 holding only a copy?

From the docs:

"HBase on Amazon EMR provides the ability to back up your HBase data directly to Amazon Simple Storage Service (Amazon S3)"
"Hadoop clusters running on Amazon EMR use EC2 instances as virtual Linux servers for the master and slave nodes, Amazon S3 for bulk storage of input..."
"provides the ability to launch a new cluster and populate it with data from a previous HBase backup"

My use case: use HBase to store terabytes of data, and update my tables only two or three times a month by starting an EMR cluster. The tables would be stored on S3.

Sergei Rodionov · Accepted Answer · 2017-08-13T19:55:52

As of EMR 5.2.0 you can run HBase 1.3.0 and higher directly on AWS S3.

To do this, point hbase.rootdir in the hbase-site.xml file at an s3:// location instead of an hdfs:// one:

"hbase.rootdir": "s3://my-bucket/hbase"

No changes to HBase clients are required. The configuration simplifies operations by eliminating the need to manage HDFS NameNode and DataNodes.
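As a rough sketch of how this setting could be supplied when the cluster is launched, the snippet below builds the list of EMR configuration classifications as JSON. The function name `hbase_on_s3_configurations` is made up for illustration. The second classification (`hbase` with `hbase.emr.storageMode` set to `s3`) is not in the answer above; it is taken from the EMR documentation for HBase on S3 and included here as an assumption, so check it against the docs for your release.

```python
import json


def hbase_on_s3_configurations(root_dir: str) -> list:
    """Build EMR configuration classifications for HBase on S3.

    root_dir is the S3 location used as hbase.rootdir,
    e.g. "s3://my-bucket/hbase" (bucket name is a placeholder).
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")
    return [
        # From the answer: point the HBase root directory at S3.
        {
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
        # Assumption, not in the original answer: the EMR docs pair the
        # root directory with this storage mode setting.
        {
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file.
    print(json.dumps(hbase_on_s3_configurations("s3://my-bucket/hbase"), indent=2))
```

The printed JSON can be saved to a file and passed to `aws emr create-cluster` through its `--configurations` option, or the list can be passed as the `Configurations` argument of boto3's `run_job_flow`.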