I am taking a cloud computing course and have created a MapReduce project for weather analysis. For this purpose, I installed the software below on my laptop:
- Oracle VirtualBox
- Hortonworks Sandbox
I selected Red Hat Linux and allocated 2 GB as the VM's main memory. I used the Hadoop image from the Sandbox site and loaded it in VirtualBox. If my understanding is correct, the 2 GB is allocated from my system's RAM and the MapReduce job runs on my local machine itself. Am I correct on this point?
I created my MapReduce program and ran it in the sandbox. It worked fine and I got the desired output.
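I won't paste my full program, but it is essentially of this shape (a simplified sketch only, not my exact code, assuming input lines like `station,date,temperature`; the class and field names here are just illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static class TempMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split "station,date,temperature" and emit (station, temperature).
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {
                context.write(new Text(fields[0]),
                              new IntWritable(Integer.parseInt(fields[2].trim())));
            }
        }
    }

    public static class MaxReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Keep the maximum temperature seen for each station.
            int max = Integer.MIN_VALUE;
            for (IntWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "max temperature");
        job.setJarByClass(MaxTemperature.class);
        job.setMapperClass(TempMapper.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```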
- For my job, the job tracker showed that 8 mappers and one reducer were used. So the 8 mappers were basically 8 splits of my 2 GB of main memory, which were used as mappers to process the data.
If the above statement is correct, why do I see only one reducer being used?
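For what it's worth, I never set a reducer count anywhere in my driver. As far as I know, it would have to be requested explicitly, roughly like this (`setNumReduceTasks` is the standard Hadoop Job API; the value 4 is only an illustration, not something I used):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "reducer count example");
        // I did NOT make this call in my actual driver, so the job fell
        // back to the configured default number of reduce tasks.
        job.setNumReduceTasks(4); // 4 is only an example value
    }
}
```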
If the mappers come from the sandbox, are they backed by servers somewhere, the way Amazon's EMR is?