I am new to Hadoop and Hive world.
I have written a Hive query which is processing 189 Million rows (40 GB file). While I am executing query. Hive query is executing in single machine and generating many map and reduce tasks. Is that expected behavior?
I have read in many articles Hadoop is distributed processing framework. What I was understanding Hadoop will split your job in multiple tasks and distribute those tasks in different nodes and once tasks finish reducer will join the output. Please correct me if I am wrong.
I have 1 master and 2 slave nodes. I am using Hadoop 2.2.0 and Hive 0.12.0.