I am getting the following exception when I try to process a file that is bigger than my HDFS block size (64 MB):

    2013-05-31 01:49:46,252 WARN org.apache.hadoop.mapred.Child: Error running child
    java.io.IOException: Can't seek!
    at org.apache.hadoop.hdfs.HftpFileSystem$3.seek(HftpFileSystem.java:359)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:76)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I am running the job with only one path (one file) as input. The interesting thing is that I tried splitting the file into two smaller parts, each smaller than the block size, and the job worked that way. Then I concatenated the parts again and ran the job on the concatenated file, and it failed once more.

I suspect a configuration issue, but I don't know where. I am running HBase on top of Hadoop, and HBase doesn't seem to have any problems.

I would appreciate any ideas/thoughts about this. Thanks in advance!

1 Answer

As stated in HDFS-2457 and HDFS-2396, the hftp scheme does not support the seek operation, so this error is expected. HFTP is essentially a protocol for reading from HDFS over HTTP; I am not sure why you are using it, but you should switch to the hdfs scheme instead, which does support seek, and the error should go away. This also explains why the smaller files worked: each of them fits in a single input split starting at offset 0, so the record reader never has to seek, whereas a file larger than the block size produces a second split whose reader must seek to its start offset.
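
For illustration, here is a minimal sketch of pointing the job at an hdfs:// URI instead of an hftp:// one. The class name, job name, namenode host/port, and file path below are placeholders, not taken from your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class HdfsInputExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "hdfs-input-example");

            // An hdfs:// path (or a plain path that resolves against fs.default.name)
            // supports seek, so LineRecordReader can jump to its split offset.
            // An hftp:// path does not, which is what triggers "Can't seek!".
            FileInputFormat.addInputPath(job,
                new Path("hdfs://namenode-host:8020/user/me/input/bigfile"));

            // ... set mapper/reducer classes and the output path as usual, then submit ...
        }
    }

It is also worth checking that fs.default.name in core-site.xml, or whatever input path you pass on the command line, is not pointing at an hftp:// URI.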