I'm using Amazon EMR and can run most jobs fine. I run into a problem once I start loading and generating more data within the cluster: it runs out of storage space.
Each data node is a c1.medium instance. According to the links here and here, each data node should come with 350GB of instance storage. Through the ElasticMapReduce Slave security group, I've verified in my AWS Console that the c1.medium data nodes are running and are instance-store backed.
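What I can't tell is whether the 350GB ephemeral volume is actually attached but simply not mounted. My plan is to SSH into a data node and look for the extra block device, roughly like this (the device name /dev/sdb is a guess on my part; it may show up under a different name):

    # List every block device the kernel sees; the 350GB ephemeral
    # disk should appear here even if it isn't mounted anywhere.
    cat /proc/partitions

    # If the device is present, check whether it already has a
    # filesystem on it. /dev/sdb is my guess for the device name.
    sudo file -s /dev/sdb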
When I run hadoop dfsadmin -report on the namenode, each data node reports only ~10GB of storage. This is further confirmed by running df -h:
    hadoop@domU-xx-xx-xx-xx-xx:~$ df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda1             9.9G  2.6G  6.8G  28% /
    tmpfs                 859M     0  859M   0% /lib/init/rw
    udev                   10M   52K   10M   1% /dev
    tmpfs                 859M  4.0K  859M   1% /dev/shm
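This suggests HDFS is only using the 10GB root volume. To confirm, I checked which local directories the datanode is configured to use for block storage (the config path is an assumption on my part; on my cluster the Hadoop config appears to live under /home/hadoop/conf):

    # Show where the datanode stores its blocks. If this only points
    # at a path on /dev/sda1, that would explain the ~10GB capacity
    # reported by dfsadmin.
    grep -A 2 'dfs.data.dir' /home/hadoop/conf/hdfs-site.xml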
How can I configure my data nodes to launch with the full 350GB of instance storage? Is there a way to do this using a bootstrap action?
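If a bootstrap action is the right approach, I imagine it would look roughly like the sketch below, which formats and mounts the ephemeral volume before the datanode starts. The device name, mount point, and filesystem are all assumptions on my part, and I haven't figured out how to cleanly add the new directory to dfs.data.dir from the script, so corrections are welcome:

    #!/bin/bash
    # Hypothetical bootstrap action: make the ephemeral volume usable
    # by HDFS. Assumes the 350GB instance store shows up as /dev/sdb.
    set -e

    DEVICE=/dev/sdb        # guessed ephemeral device name
    MOUNT=/mnt/ephemeral   # guessed mount point

    # Format and mount the instance store if it isn't mounted already.
    if ! grep -qs "$MOUNT" /proc/mounts; then
        sudo mkfs.ext3 -q "$DEVICE"
        sudo mkdir -p "$MOUNT"
        sudo mount "$DEVICE" "$MOUNT"
    fi

    # Create a block-storage directory owned by the hadoop user.
    sudo mkdir -p "$MOUNT/hdfs"
    sudo chown hadoop:hadoop "$MOUNT/hdfs"

    # Remaining step (unsolved): append $MOUNT/hdfs to dfs.data.dir in
    # hdfs-site.xml so the datanode actually uses the new volume.

Is something like this the intended mechanism, or does EMR have a built-in way to mount and use the full instance storage?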