Hadoop/Hive cluster: only one node is utilized

Question

I have a small Hadoop/Hive cluster (6 nodes in total). Using "hadoop dfsadmin -report" I can see that all datanodes are working well and connected. Additionally, when I add data to a Hive table, I can see that the data is being distributed across the nodes. (Easy to check, as the disk space used on each node increases.)

I am trying to create some indexes on one table. In the JobTracker HTTP interface, I see only one node available. I tried running multiple queries (I use MySQL for the metadata), but they appear to run only on the node where Hive is installed.

Basically, my question is how to make the JobTracker utilize the other nodes as well.

David Gruzman · Accepted Answer · 2012-09-17T06:52:26

From what you describe, it looks like:
DataNodes are running properly on all nodes and are able to communicate with the NameNode.
TaskTrackers are running on only one node, or the others are not able to communicate with the JobTracker for some reason.
First check that the TaskTrackers are indeed running on every node. If they are, read their logs to find out why they cannot communicate with the JobTracker.
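To see which nodes the JobTracker actually knows about, one option is to compare the expected hosts against the output of "hadoop job -list-active-trackers" (available in Hadoop 1.x). The snippet below is only a rough sketch: it assumes each output line looks like "tracker_<host>:<rest>", and the host names used with it are hypothetical, so adjust both to your cluster.

```python
import subprocess


def parse_tracker_hosts(report):
    """Collect host names from lines shaped like 'tracker_<host>:<rest>'.

    The line format is an assumption about the CLI output; verify it
    against what your Hadoop version actually prints.
    """
    hosts = set()
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("tracker_"):
            # Keep the text between the 'tracker_' prefix and the first colon.
            hosts.add(line[len("tracker_"):].split(":", 1)[0])
    return hosts


def missing_trackers(expected_hosts, report):
    """Return the expected hosts that have no active TaskTracker, sorted."""
    return sorted(set(expected_hosts) - parse_tracker_hosts(report))


def fetch_active_trackers():
    """Run the Hadoop 1.x CLI and return its raw output as text."""
    result = subprocess.run(
        ["hadoop", "job", "-list-active-trackers"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the JobTracker machine, calling missing_trackers(["node1", "node2"], fetch_active_trackers()) with your own host names lists the nodes to investigate. On each of those nodes, "jps" shows whether a TaskTracker process is running at all; if it is, its log is the next place to look.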

