I am getting the following exception when I try to process a file that is bigger than my HDFS block size (64 MB):

    2013-05-31 01:49:46,252 WARN org.apache.hadoop.mapred.Child: Error running child
    java.io.IOException: Can't seek!
    at org.apache.hadoop.hdfs.HftpFileSystem$3.seek(HftpFileSystem.java:359)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:76)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I am running the job with only one path (one file) as input. The interesting thing is that I tried splitting the file into two smaller parts, each smaller than the block size, and the job worked that way. Then I concatenated the parts again and ran the job on the concatenated file, and it failed once more.

I suspect a configuration issue, but I don't know where. I am running HBase on top of Hadoop, and HBase doesn't seem to have any problems.

I would appreciate any ideas/thoughts about this. Thanks in advance!

1 Answer

As stated in HDFS-2457 and HDFS-2396, the hftp scheme does not support the seek operation, so this error is expected. HFTP is essentially a protocol for reading from HDFS over HTTP; I am not sure why you are using it, but you should switch to the hdfs scheme instead, which does support seek, and the error should go away. This also explains why the smaller files worked: each of them fits in a single input split starting at offset 0, so the record reader never has to seek, whereas a file larger than the block size produces a second split whose reader must seek to its start offset.
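
For illustration, here is a minimal sketch of pointing the job at an hdfs:// URI instead of an hftp:// one. The class name, job name, namenode host/port, and file path below are placeholders, not taken from your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class HdfsInputExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "hdfs-input-example");

            // An hdfs:// path (or a plain path that resolves against fs.default.name)
            // supports seek, so LineRecordReader can jump to its split offset.
            // An hftp:// path does not, which is what triggers "Can't seek!".
            FileInputFormat.addInputPath(job,
                new Path("hdfs://namenode-host:8020/user/me/input/bigfile"));

            // ... set mapper/reducer classes and the output path as usual, then submit ...
        }
    }

It is also worth checking that fs.default.name in core-site.xml, or whatever input path you pass on the command line, is not pointing at an hftp:// URI.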