2
votes

We are processing a byte[] as shown below (the file is POST'ed to a web server, this code is running in Glassfish) and have found that some files have a byte-order mark (BOM, a three-byte sequence 0xEF,0xBB,0xBF, see: http://en.wikipedia.org/wiki/Byte_order_mark) at the beginning, and we want to remove this BOM. How would we detect and remove a BOM in this code? Thanks.

  private final void serializePayloadToFile(File file, byte[] payload) throws IOException {

    FileOutputStream fos;
    DataOutputStream dos;

    fos = new FileOutputStream(file, true); // true for append
    dos = new DataOutputStream(fos);

    dos.write(payload);
    dos.flush();
    dos.close();
    fos.close();

    return;
  }  

3 Answers

2
votes

How would we detect [...]

There is no way to tell for certain whether those three bytes are a BOM or ordinary data that happens to start with the same values.

You could check if the array starts with 0xEF, 0xBB, 0xBF and in that case skip them.

[...] and remove a BOM in this code?

Something like this should do:

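// The (byte) casts are required: Java bytes are signed, so
// payload[0] == 0xEF compares -17 with 239 and is never true.
int off = payload.length >= 3
       && payload[0] == (byte) 0xEF
       && payload[1] == (byte) 0xBB
       && payload[2] == (byte) 0xBF ? 3 : 0;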

dos.write(payload, off, payload.length - off);
1
votes

DataOutputStream has a write() overload that takes an offset and a length:

public void write(byte[] b, int off, int len) throws IOException;

So test for the byte order mark and set off (and len) appropriately.
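
As a rough sketch of that idea, the check can be pulled into a small helper that returns the offset to start writing from. The class name BomStripper and the method bomLength are made up for illustration, and the DataOutputStream is dropped here on the assumption that only raw bytes are written.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper class; the names are illustrative only.
final class BomStripper {
    // UTF-8 BOM. The casts matter because Java bytes are signed.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns 3 if payload starts with the UTF-8 BOM, otherwise 0. */
    static int bomLength(byte[] payload) {
        if (payload.length < UTF8_BOM.length) {
            return 0;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (payload[i] != UTF8_BOM[i]) {
                return 0;
            }
        }
        return UTF8_BOM.length;
    }

    /** Appends payload to file, skipping a leading BOM if one is present. */
    static void serializePayloadToFile(File file, byte[] payload) throws IOException {
        int off = bomLength(payload);
        // try-with-resources closes the stream even if write() throws.
        try (OutputStream out = new FileOutputStream(file, true)) { // true for append
            out.write(payload, off, payload.length - off);
        }
    }
}
```

Because the file is opened in append mode, this only looks at the start of each payload; if one upload arrives as several payloads, the check is presumably only meaningful for the first one.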

0
votes

The simplest solution seems to be adding another OutputStream implementation between dos and fos and buffering the first few bytes there, before actually committing them to fos. You might or might not want to throw them away, depending on their values.
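
One way this might look, as a minimal sketch rather than a tested implementation: a FilterOutputStream subclass that holds back bytes while they still match the BOM, drops them if all three match, and otherwise releases them unchanged. The class name BomSkippingOutputStream and its fields are invented for this example.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper: discards a leading UTF-8 BOM, passes everything else through.
final class BomSkippingOutputStream extends FilterOutputStream {
    private static final int[] BOM = {0xEF, 0xBB, 0xBF};

    private final byte[] held = new byte[BOM.length]; // bytes held back so far
    private int heldCount = 0;
    private boolean decided = false; // true once we know whether a BOM was present

    BomSkippingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        if (decided) {
            out.write(b);
            return;
        }
        if ((b & 0xFF) == BOM[heldCount]) {
            held[heldCount++] = (byte) b;
            if (heldCount == BOM.length) {
                // Complete BOM seen: throw the held bytes away.
                heldCount = 0;
                decided = true;
            }
        } else {
            // Not a BOM after all: release what was held, then this byte.
            releaseHeld();
            out.write(b);
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        int i = off;
        int end = off + len;
        // Go byte by byte only until the BOM question is settled.
        while (!decided && i < end) {
            write(b[i++]);
        }
        if (i < end) {
            out.write(b, i, end - i);
        }
    }

    @Override
    public void close() throws IOException {
        // A stream shorter than the BOM may still have held bytes pending.
        releaseHeld();
        super.close();
    }

    private void releaseHeld() throws IOException {
        decided = true;
        if (heldCount > 0) {
            out.write(held, 0, heldCount);
            heldCount = 0;
        }
    }
}
```

In the question's code it would sit between the two streams, for example new DataOutputStream(new BomSkippingOutputStream(fos)). Note that flush() does not release held bytes in this sketch; they are only written once a non-matching byte arrives or the stream is closed.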