6
votes

For a Java class final project, we need to set up Hadoop and implement an n-gram processor. I have found a number of 'Hadoop on AWS' tutorials, but am uncertain how to deploy Hadoop while staying in the free tier. I tried a while ago and received a bill for over $250 USD. Now I am too nervous to test, and would like help to ensure I do not rack up another bill.

From what I understand, these are the limits of the free tier:

- Allowable instances: t1.micro or t2.micro
- Maximum instance-hours per month: 750
- OS: Standard-priced versions of Linux or Windows on EC2
- Storage: 30 GB of EBS

Many of the Hadoop tutorials use instances other than the two shown above. The AWS tutorial here mentions it will cost about $1, stating it will run for one hour. I need to keep these instances active for over 2 weeks, but will only really use them for a few minutes at a time. I do not believe that I will exceed even one-tenth of those 750 hours.

We can get bonus points if we use "more than 1 machine". Can I do that within the free tier? Does the free tier have a limit on the number of instances it can spin up?

Does anyone have a tutorial that stays in the free tier? Or should I skip AWS and try a local Hadoop solution?


1 Answer

8
votes

If you limit your Hadoop cluster nodes to t2.micro instances and keep the total EBS volume size within 30 GB, then you can (in theory) run a Hadoop cluster within the free tier. Do note that the hardware on a t2.micro is meagre.

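The thing about the free tier on AWS is that you are allowed only t2.micro instances, for a total of 750 instance-hours per month shared across all of them. Hours accrue whenever an instance is running, not only when you are actively using it. That means you can run, for example, 10 nodes for 75 hours each in a month for free, after which you would be billed.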

Here is a post that does exactly what you want with 4 nodes: "Spinning Up a Free Hadoop Cluster: Step by Step". With 4 nodes, the 750 hours work out to about 187 hours per node, so you should be able to run this cluster within the free limit for around 1 week.
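The arithmetic in this answer can be checked with a small Java sketch. It assumes the 750 instance-hours are pooled across every running instance, as described above; the class and method names (`FreeTierBudget`, `hoursPerNode`, `poolHoursUsed`) are made up for illustration and are not part of any AWS SDK.

```java
public class FreeTierBudget {
    // Assumption: 750 free instance-hours per month, shared by all running instances.
    static final double FREE_HOURS_PER_MONTH = 750.0;

    // Free hours available to each node if every node runs for the same length of time.
    public static double hoursPerNode(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return FREE_HOURS_PER_MONTH / nodeCount;
    }

    // The same budget expressed as days of continuous uptime per node.
    public static double daysPerNode(int nodeCount) {
        return hoursPerNode(nodeCount) / 24.0;
    }

    // Instance-hours consumed when nodeCount nodes are left running for the given number of days.
    public static double poolHoursUsed(int nodeCount, int daysRunning) {
        return nodeCount * daysRunning * 24.0;
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {1, 4, 10}) {
            System.out.printf("%d node(s): %.1f hours each, %.1f days of continuous uptime%n",
                    nodes, hoursPerNode(nodes), daysPerNode(nodes));
        }
        // The question's scenario: instances left running for 2 weeks.
        for (int nodes : new int[] {1, 2, 3}) {
            double used = poolHoursUsed(nodes, 14);
            System.out.printf("%d node(s) left on for 14 days: %.0f hours, within budget: %b%n",
                    nodes, used, used <= FREE_HOURS_PER_MONTH);
        }
    }
}
```

Under that assumption, two nodes left running for two weeks use 672 hours and still fit, while three nodes use 1008 hours and do not. Stopping the instances between work sessions is what keeps a larger cluster inside the budget.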