I'm running an example from Apache Mahout 0.9 (org.apache.mahout.classifier.df.mapreduce.BuildForest) using the PartialBuilder implementation on Hadoop, but I'm getting an error no matter what I try.
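For reference, I'm launching the job along these lines (the jar name, input paths, and parameter values here are placeholders rather than my exact ones):

    hadoop jar mahout-core-0.9-job.jar \
        org.apache.mahout.classifier.df.mapreduce.BuildForest \
        -d /input/data.csv \
        -ds /input/data.info \
        -sl 5 -p -t 100 \
        -o /output/forest

The -p flag is what selects the partial-data (PartialBuilder) implementation.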
The error is:
14/12/10 10:58:36 INFO mapred.JobClient: Running job: job_201412091528_0004
14/12/10 10:58:37 INFO mapred.JobClient: map 0% reduce 0%
14/12/10 10:58:50 INFO mapred.JobClient: map 10% reduce 0%
14/12/10 10:59:44 INFO mapred.JobClient: map 20% reduce 0%
14/12/10 11:32:23 INFO mapred.JobClient: Task Id : attempt_201412091528_0004_m_000000_0, Status : FAILED
java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3290)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2783)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2987)
There are no glaring errors in the datanode log file:
2014-12-10 11:32:19,157 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:62662, bytes: 549024, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_0_1249767243_1, offset: 0, srvID: DS-957555695-10.0.1.9-50010-1418164764736, blockid: blk_-7548306051547444322_12804, duration: 2012297405000
2014-12-10 11:32:25,511 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:64109, bytes: 1329, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_1_-1362169490_1, offset: 0, srvID: DS-957555695-10.0.1.9-50010-1418164764736, blockid: blk_4126779969492093101_12817, duration: 285000
2014-12-10 11:32:29,496 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:50010, dest: /127.0.0.1:64110, bytes: 67633152, op: HDFS_READ, cliID: DFSClient_attempt_201412091528_0004_m_000000_1_-1362169490_1, offset: 0, srvID: DS-9575556
... or in the namenode log. The jobtracker log just repeats the error from the datanode log. The one error that precedes the failure by several minutes is an EOFException, which may or may not be relevant to PartialBuilder:
2014-12-10 12:12:22,060 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-957555695-10.0.1.9-50010-1418164764736, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:296)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:340)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:404)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:582)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:404)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
    at java.lang.Thread.run(Thread.java:695)
I am able to read and write files to DFS directly (a quick sanity check is shown below), and I can even run this job successfully on a small subset of the data, but the full Map/Reduce job fails every time. Any idea what I'm doing wrong?
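For example, simple round-trips like these complete without any problem (file names are just placeholders):

    hadoop fs -put local-sample.csv /tmp/local-sample.csv
    hadoop fs -cat /tmp/local-sample.csv | head
    hadoop fs -rm /tmp/local-sample.csv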
Notes about my setup:
- Everything is running locally
- Apache Mahout version 0.9
- Hadoop version 1.2.1
- Java version 1.6.0_65
hdfs-site.xml settings (the corresponding XML is sketched after this list):
- dfs.replication = 4
- dfs.permissions = false
- dfs.name.dir = /Users/juliuss/Dev/hdfs/name/
- dfs.data.dir = /Users/juliuss/Dev/hdfs/data/
- mapred.child.java.opts = -Xmx4096m
- dfs.datanode.socket.write.timeout = 0
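For completeness, the relevant part of my hdfs-site.xml looks roughly like this (only the properties listed above):

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>4</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.name.dir</name>
        <value>/Users/juliuss/Dev/hdfs/name/</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/Users/juliuss/Dev/hdfs/data/</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx4096m</value>
      </property>
      <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
      </property>
    </configuration>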