hadoop cluster: hadoop streaming map task run only on one master machine not in slaves

Question

I have a hadoop cluster of three machines :

One master (ResourceManager, NameNode, SecondaryNameNode)
And two slaves (DataNode, NodeManager)

I run a c ++ program with hadoop streaming which :

Accepts in input a text file that contains the names of the videos, stored under HDFS

input.txt:
```
video0001.avi

Video0002.avi
```
...
After reading each line (as a key) by the mapper, it must copy the video whose name is input from hdfs and store it on the slave machine, then the program runs opencv and ffmpeg on the video and then switches to video 2 To do the same thing
mapper return name of video as key and Some parameters of video as value
I have the programs on all cluster machines
the cluster setting is good, I can copy the files to you
When I run the program on single node it works well, but When I run the program on single node it works fine but when I run it on a cluster of three machines it only works on master without using the slaves
i run this command on master machine :

hadoop jar /usr/local/lib/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -input /user/root/input -output /user/root/output -mapper signature -file signature

•   12/20 02:43:51 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
•   16/12/20 02:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
•   16/12/20 02:43:51 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
•   16/12/20 02:43:52 INFO mapred.FileInputFormat: Total input paths to process : 1
•   16/12/20 02:43:52 INFO mapreduce.JobSubmitter: number of splits:1
•   16/12/20 02:43:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local815523916_0001
•   16/12/20 02:43:54 INFO mapred.LocalDistributedCacheManager: Localized file:/home/master/Desktop/Extract_signature/Prog/signature as file:/app/hadoop/tmp/mapred/local/1482230633565/signature
•   16/12/20 02:43:54 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
•   16/12/20 02:43:54 INFO mapreduce.Job: Running job: job_local815523916_0001
•   16/12/20 02:43:54 INFO mapred.LocalJobRunner: OutputCommitter set in config null
•   16/12/20 02:43:54 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
•   16/12/20 02:43:54 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
•   16/12/20 02:43:55 INFO mapred.LocalJobRunner: Waiting for map tasks
•   16/12/20 02:43:55 INFO mapred.LocalJobRunner: Starting task: attempt_local815523916_0001_m_000000_0
•   16/12/20 02:43:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
•   16/12/20 02:43:55 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
•   16/12/20 02:43:55 INFO mapred.MapTask: Processing split: hdfs://Hadoop:54310/user/root/input/input.txt:0+33
•   16/12/20 02:43:55 INFO mapred.MapTask: numReduceTasks: 1
•   16/12/20 02:43:55 INFO mapreduce.Job: Job job_local815523916_0001 running in uber mode : false
•   16/12/20 02:43:55 INFO mapreduce.Job:  map 0% reduce 0%
•   16/12/20 02:44:48 INFO mapred.LocalJobRunner: hdfs://Hadoop:54310/user/root/input/input.txt:0+33 > map
•   16/12/20 02:44:48 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
•   16/12/20 02:44:48 INFO streaming.PipeMapRed: PipeMapRed exec [/home/master/Desktop/Extract_signature/Prog/./signature]
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
•   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
•   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
•   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
•   16/12/20 02:44:48 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
•   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
•   16/12/20 02:44:49 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:1=1/1 [rec/s] out:0=0/1 [rec/s]

•   16/12/20 02:44:54 INFO mapred.LocalJobRunner: hdfs://Hadoop:54310/user/root/input/input.txt:0+33 > map
•   16/12/20 02:44:54 INFO mapreduce.Job:  map 67% reduce 0%

•   There were 11 warnings (use warnings() to see them)
•   16/12/20 02:47:48 INFO streaming.PipeMapRed: Records R/W=2/2
•   16/12/20 02:47:48 INFO streaming.PipeMapRed: MRErrorThread done
•   16/12/20 02:47:48 INFO streaming.PipeMapRed: mapRedFinished
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Records R/W=2/1 > map
•   16/12/20 02:47:48 INFO mapred.MapTask: Starting flush of map output
•   16/12/20 02:47:48 INFO mapred.MapTask: Spilling map output
•   16/12/20 02:47:48 INFO mapred.MapTask: bufstart = 0; bufend = 40; bufvoid = 104857600
•   16/12/20 02:47:48 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
•   16/12/20 02:47:48 INFO mapred.MapTask: Finished spill 0
•   16/12/20 02:47:48 INFO mapred.Task: Task:attempt_local1256877917_0001_m_000000_0 is done. And is in the process of committing
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Records R/W=2/2
•   16/12/20 02:47:48 INFO mapred.Task: Task 'attempt_local1256877917_0001_m_000000_0' done.
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local1256877917_0001_m_000000_0
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: map task executor complete.
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Waiting for reduce tasks
•   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Starting task: attempt_local1256877917_0001_r_000000_0
•   16/12/20 02:47:48 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
•   16/12/20 02:47:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
•   16/12/20 02:47:49 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@71589312
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
•   16/12/20 02:47:49 INFO reduce.EventFetcher: attempt_local1256877917_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
•   16/12/20 02:47:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1256877917_0001_m_000000_0 decomp: 46 len: 50 to MEMORY
•   16/12/20 02:47:49 INFO reduce.InMemoryMapOutput: Read 46 bytes from map-output for attempt_local1256877917_0001_m_000000_0
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 46, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->46
•   16/12/20 02:47:49 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
•   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
•   16/12/20 02:47:49 INFO mapred.Merger: Merging 1 sorted segments
•   16/12/20 02:47:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25 bytes
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merged 1 segments, 46 bytes to disk to satisfy reduce memory limit
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merging 1 files, 50 bytes from disk
•   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
•   16/12/20 02:47:49 INFO mapred.Merger: Merging 1 sorted segments
•   16/12/20 02:47:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25 bytes
•   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
•   16/12/20 02:47:49 INFO mapred.Task: Task:attempt_local1256877917_0001_r_000000_0 is done. And is in the process of committing
•   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
•   16/12/20 02:47:49 INFO mapred.Task: Task attempt_local1256877917_0001_r_000000_0 is allowed to commit now
•   16/12/20 02:47:49 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1256877917_0001_r_000000_0' to hdfs://Hadoop:54310/user/root/output/_temporary/0/task_local1256877917_0001_r_000000

•   16/12/20 02:47:49 INFO mapred.Task: Task 'attempt_local1256877917_0001_r_000000_0' done.
•   16/12/20 02:47:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local1256877917_0001_r_000000_0
•   16/12/20 02:47:49 INFO mapred.LocalJobRunner: reduce task executor complete.
•   16/12/20 02:47:49 INFO mapreduce.Job:  map 100% reduce 100%
•   16/12/20 02:47:49 INFO mapreduce.Job: Job job_local1256877917_0001 completed successfully
•   16/12/20 02:47:50 INFO mapreduce.Job: Counters: 35

•   16/12/20 02:47:50 INFO streaming.StreamJob: Output directory: /user/root/output

AdamSkywalker AdamSkywalker · Accepted Answer · 2016-12-22T17:23:16

According to your logs

•   16/12/20 02:43:52 INFO mapred.FileInputFormat: Total input paths to process : 1
•   16/12/20 02:43:52 INFO mapreduce.JobSubmitter: number of splits:1

hadoop used the whole file as a single split.

Try using NLineInputFormat to split the input across mappers on several machines

hadoop cluster: hadoop streaming map task run only on one master machine not in slaves

1 Answers