0
votes

I have taken a cloud computing course and created a MapReduce project for weather analysis. For this purpose, I installed the software below on my laptop.

  • Oracle VirtualBox
  • Hortonworks Sandbox

I selected Red Hat Linux and allocated 2 GB as the main memory. I took the Hadoop image from the Sandbox site and loaded it using VirtualBox. If my understanding is correct, the 2 GB is allocated from my system, and the MapReduce job runs on my local machine itself. Am I correct on this point?

I created my MapReduce program and ran it in the sandbox. It worked fine and I got the desired output.

  • For my job, the job tracker showed that 8 mappers were used, and one reducer was used for the reduce phase. So the 8 mappers were basically 8 splits of my 2 GB main memory, which were used as mappers for processing the data.

If the above statement is correct, why do I see only one reducer being used?

If the mappers come from the sandbox, does it have servers behind it like Amazon's EMR?

I recommend you read "HBase: The Definitive Guide". It will answer your questions. – zsxwing
@zsxwing: He's nowhere talking about HBase. Do you mean "Hadoop: The Definitive Guide"? – Tariq
Sorry for my mistake. Yes, I mean "Hadoop: The Definitive Guide". For Hadoop newbies, I highly recommend this book. – zsxwing

2 Answers

0
votes

I'm sorry, but I didn't quite get what exactly you are trying to ask. Your question reads more like a post's title than a question.

Mappers and reducers are components of the MapReduce framework. Hortonworks is just one of the vendors that provide custom Hadoop distributions. There are certain differences among these different flavors of Hadoop, but the way mappers and reducers are created is the same everywhere. A small sketch follows below to make this concrete.
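To illustrate, here is a minimal word-count-style sketch of the two components (the class names are illustrative, not taken from your project); the same Mapper/Reducer API works on any Hadoop distribution, including the Hortonworks sandbox:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Each map task runs one Mapper instance over one input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // The Reducer receives all values for a key and aggregates them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);     // emit (word, total count)
        }
    }
}
```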

If my understanding is correct, the 2 GB is allocated from my system, and the MapReduce job runs on my local machine itself. Am I correct on this point?

First of all, the recommended memory is 4 GB.

Yes, it'll be allocated from the machine where VirtualBox is running. Where else would you get resources from when VirtualBox is running on "this" machine? And what does the allocated memory have to do with where MR jobs run? When you are using the sandbox, they run in the sandbox.

For my job, the job tracker showed that 8 mappers were used, and one reducer was used for the reduce phase. So the 8 mappers were basically 8 splits of my 2 GB main memory, which were used as mappers for processing the data.

The 8 mappers were 8 instances of your Mapper code, each processing one of the 8 splits of your input data, NOT 8 splits of memory or anything else. The sketch below shows where those numbers come from.
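In other words, the number of map tasks is derived from the input splits of the files you feed the job (roughly one per HDFS block), while the number of reduce tasks defaults to 1 unless you ask for more, which is why the job tracker showed a single reducer. A hedged driver sketch, assuming the word-count classes from the earlier example and hypothetical input/output paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "weather analysis");
        job.setJarByClass(WeatherDriver.class);

        // Mapper/Reducer classes from the word-count sketch above (illustrative names).
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // You do NOT set the number of mappers directly; it is derived from
        // the number of input splits of the file(s) added here.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The reducer count defaults to 1; request more explicitly if needed.
        job.setNumReduceTasks(4);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```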

If the mappers come from the sandbox, does it have servers behind it like Amazon's EMR?

This one is a total bouncer for me (it went straight over my head). Please rephrase it.

0
votes

The identity mapper is used when you want to read the data and pass it through as it is, without any change.
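As a minimal sketch (assuming the new `org.apache.hadoop.mapreduce` API): the base Mapper class already behaves as an identity mapper, because its default map() simply writes the input key/value pair back out unchanged, so you can use it directly without overriding anything. The helper class and method name below are illustrative:

```java
// The default map() in org.apache.hadoop.mapreduce.Mapper is effectively:
//     protected void map(KEYIN key, VALUEIN value, Context context) {
//         context.write(key, value);   // pass the record through unchanged
//     }
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class IdentityExample {
    // Hypothetical helper: configure a job so the map phase is a pass-through.
    public static void configureIdentityMap(Job job) {
        // Using the base Mapper class directly makes each (offset, line)
        // record from a text input file come out exactly as it was read.
        job.setMapperClass(Mapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
    }
}
```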