I am new to Hadoop and Hive. I am using Hive integrated with Hadoop to execute queries. Whenever I submit a query, the following log messages appear on the console:
Hive history file=/tmp/root/hive_job_log_root_28058@hadoop2_201203062232_1076893031.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201203062223_0004, Tracking URL = http://:50030/jobdetails.jsp?jobid=job_201203062223_0004
Kill Command = //opt/hadoop_installation/hadoop-0.20.2/bin/../bin/hadoop job -kill job_201203062223_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-03-06 22:32:26,707 Stage-1 map = 0%, reduce = 0%
2012-03-06 22:32:29,716 Stage-1 map = 100%, reduce = 0%
2012-03-06 22:32:38,748 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203062223_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 8107686 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
I believe the text in bold is what starts the Hadoop job. Starting the job takes a long time, but once this line has executed, the MapReduce operations themselves run quickly. My questions are:
- Is there any way to make the launch of the Hadoop job faster? Is it possible to skip this phase?
- Where does the value of 'Kill Command' (in the bold text) come from?
Please let me know if any further information is required.