10
votes

The Sun Documentation for DataInput.skipBytes states that it "makes an attempt to skip over n bytes of data from the input stream, discarding the skipped bytes. However, it may skip over some smaller number of bytes, possibly zero. This may result from any of a number of conditions; reaching end of file before n bytes have been skipped is only one possibility."

  1. Other than reaching end of file, why might skipBytes() not skip the right number of bytes? (The DataInputStream I am using will either be wrapping a FileInputStream or a PipedInputStream.)

  2. If I definitely want to skip n bytes and throw an EOFException if this causes me to go to the end of the file, should I use readFully() and ignore the resulting byte array? Or is there a better way?


5 Answers

5
votes

1) There might not be that much data available to read (the other end of the pipe might not have sent that much data yet), and the implementing class might be non-blocking (i.e. it will just return what it can, rather than waiting for enough data to fulfil the request).

I don't know whether any implementations actually behave this way, but the interface is designed to permit it.

Another option is simply that the file gets closed part-way through the read.

2) Either use readFully() (which will always wait for enough input or else fail) or call skipBytes() in a loop. I think the former is probably better, unless the array would be truly vast.
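For the former, a rough sketch (assuming in is your DataInputStream and n is the number of bytes to discard; readFully() itself throws EOFException if the stream ends first):

byte[] discard = new byte[n]; // throwaway buffer for the bytes being skipped
in.readFully(discard);        // blocks until n bytes have been read, or throws EOFException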

3
votes

I came across this problem today. I was reading from a network connection on a virtual machine, so I imagine there could be a number of reasons for this happening. I solved it by simply forcing the input stream to keep skipping bytes until it had skipped the number of bytes I wanted it to:

int byteOffsetX = someNumber; // n bytes to skip
int nSkipped = in.skipBytes(byteOffsetX);
while (nSkipped < byteOffsetX) {
    int n = in.skipBytes(byteOffsetX - nSkipped);
    if (n <= 0) {
        throw new EOFException(); // stream ended before enough bytes were skipped
    }
    nSkipped += n;
}
2
votes

It turns out that readFully() adds more performance overhead than I was willing to put up with.

In the end I compromised: I call skipBytes() once, and if that returns fewer than the right number of bytes, I call readFully() for the remaining bytes.
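Roughly like this (names are illustrative; in is the DataInputStream and toSkip the byte count):

int skipped = in.skipBytes(toSkip);           // fast path: may skip fewer than toSkip bytes
if (skipped < toSkip) {
    byte[] rest = new byte[toSkip - skipped]; // discard buffer for whatever skipBytes missed
    in.readFully(rest);                       // throws EOFException if the stream ends early
}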

1
votes

Josh Bloch has publicised this recently. It is consistent in that InputStream.read is likewise not guaranteed to read as many bytes as it could. However, skipBytes() is utterly pointless as an API method. InputStream should probably also have a readFully().
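Such a helper is easy to write yourself; a rough sketch (not part of the standard API, name and signature chosen for illustration):

// Read exactly buf.length bytes from a plain InputStream or throw EOFException.
static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n < 0) {
            throw new EOFException(); // stream ended before the buffer was filled
        }
        off += n;
    }
}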

0
votes

According to the docs, readFully() is the only method that both works and is guaranteed to work.

The actual Oracle implementation is... confusing:

public final int skipBytes(int n) throws IOException {
    int total = 0;
    int cur = 0;

    while ((total<n) && ((cur = (int) in.skip(n-total)) > 0)) {
        total += cur;
    }

    return total;
}

Why call skip() in a loop when skipBytes() has essentially the same contract as skip()? It would make perfect sense to implement it this way if both of the following were true: skipBytes() were guaranteed to skip fewer bytes than requested only on EOF, and skip() were guaranteed to skip at least one byte if at all possible (just like read() does).

What's even worse is that skip() is actually implemented using read(), which means it actually does the job. It just doesn't promise to do it, which means other implementations may fail to do so (and even the Oracle one could change in a future release).

To be completely safe: call skipBytes(), and if it doesn't skip enough, allocate a temp buffer and call readFully() for the rest (use a loop here if the ability to skip over arbitrarily large amounts of data is needed).
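A sketch of that approach (the helper name and buffer size are arbitrary; the fixed-size buffer keeps memory bounded even for very large skips):

// Skip exactly n bytes or throw EOFException, falling back to readFully().
static void safeSkip(DataInputStream in, long n) throws IOException {
    long remaining = n - in.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
    if (remaining > 0) {
        byte[] tmp = new byte[8192];             // temp buffer, reused across iterations
        while (remaining > 0) {
            int chunk = (int) Math.min(tmp.length, remaining);
            in.readFully(tmp, 0, chunk);         // throws EOFException if the stream ends early
            remaining -= chunk;
        }
    }
}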

On the other hand, calling skipBytes() in a loop is pointless, at least with the Oracle implementation, because it is already a loop internally.