I went through a few posts whose answers (for example, "FileReader reads the file as a character stream" and "can be treated as whitespace if the document is handed as a stream of characters") say that the cause is an input source that is actually a char stream, not a byte stream.
However, the suggested solution from [1] does not seem to apply to UTF-16LE. Although I use this code:
    // Open the file as a raw byte stream so the parser can detect the encoding itself.
    try (final InputStream is = Files.newInputStream(filename.toPath(), StandardOpenOption.READ)) {
        DOMParser parser = new org.apache.xerces.parsers.DOMParser();
        parser.parse(new InputSource(is));
        return parser.getDocument();
    } catch (final SAXParseException saxEx) {
        LOG.debug("Unable to open [{}] as InputSource.", absolutePath, saxEx);
    }
I still get org.xml.sax.SAXParseException: Content is not allowed in prolog.
I looked at Files.newInputStream, and it indeed uses a ChannelInputStream, which will hand over bytes, not chars. I also tried to set the encoding of the InputSource object, but with no luck.
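For reference, here is a minimal sketch of the opposite approach: decoding the bytes as UTF-16LE myself and handing the parser a character stream. It uses the JDK's DocumentBuilder instead of the Xerces DOMParser so that it is self-contained, and the class and method names (Utf16LeReaderSketch, parseUtf16Le) are made up for illustration. It assumes the file really is UTF-16LE with an optional leading BOM; it is not a confirmed fix for the error above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf16LeReaderSketch {

    // Hypothetical helper: decode as UTF-16LE and drop a leading BOM (U+FEFF),
    // because Java's UTF-16LE decoder passes the BOM through as a character.
    static Document parseUtf16Le(InputStream in) throws Exception {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_16LE), 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM, so give the character back
        }
        // The parser now receives characters and does no byte-level detection.
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for a UTF-16LE file that starts with a BOM (FF FE).
        byte[] bytes = "\uFEFF<?xml version=\"1.0\"?><root>ok</root>"
                .getBytes(StandardCharsets.UTF_16LE);
        Document doc = parseUtf16Le(new ByteArrayInputStream(bytes));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

If this variant parses a file that the byte-stream variant rejects, that would at least narrow the problem down to encoding detection rather than the document content.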
I also checked that there are no extra characters (except the BOM) before the <?xml part.
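A check like this is easiest to trust when it is done on the raw bytes rather than in an editor, which may hide or reinterpret the BOM. The following is a rough sketch of such a check; the class and method names (LeadingBytesSketch, hexPrefix, describeBom) are hypothetical, and it only distinguishes the UTF-8 and UTF-16 BOMs, not UTF-32.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class LeadingBytesSketch {

    // Formats up to `count` leading bytes as space-separated upper-case hex.
    static String hexPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(count, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // Names the BOM at the start of the data, if it is one of the common ones.
    static String describeBom(byte[] d) {
        if (d.length >= 3 && (d[0] & 0xFF) == 0xEF && (d[1] & 0xFF) == 0xBB
                && (d[2] & 0xFF) == 0xBF) return "UTF-8 BOM";
        if (d.length >= 2 && (d[0] & 0xFF) == 0xFF && (d[1] & 0xFF) == 0xFE) return "UTF-16LE BOM";
        if (d.length >= 2 && (d[0] & 0xFF) == 0xFE && (d[1] & 0xFF) == 0xFF) return "UTF-16BE BOM";
        return "no recognized BOM";
    }

    public static void main(String[] args) throws IOException {
        byte[] head;
        if (args.length > 0) {
            // Read at most the first 8 bytes of the file named on the command line.
            byte[] buf = new byte[8];
            int total = 0;
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int n;
                while (total < buf.length && (n = in.read(buf, total, buf.length - total)) != -1) {
                    total += n;
                }
            }
            head = Arrays.copyOf(buf, total);
        } else {
            // Without an argument, use a sample: UTF-16LE BOM followed by "<?".
            head = new byte[] { (byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x3F, 0x00 };
        }
        System.out.println(hexPrefix(head, 8) + " -> " + describeBom(head));
    }
}
```

For a UTF-16LE file with a BOM, I would expect the dump to start with FF FE followed by 3C 00 for the opening `<`.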
I also want to mention that this code works just fine with UTF-8.
// Edit: I also tried DocumentBuilderFactory.newInstance().newDocumentBuilder().parse() and XmlInputStreamReader.next(), same results.
// Edit 2: Tried using a buffered reader. Same results: Unexpected character '뿯' (code 49135 / 0xbfef) in prolog; expected '<'
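One way to read that error message: the character 0xbfef is what the two bytes EF BF become when they are decoded as UTF-16LE. Those are not the UTF-16LE BOM bytes (FF FE), but they are the first two bytes of U+FFFD (the replacement character) encoded as UTF-8. The sketch below only demonstrates that arithmetic; the class and method names are hypothetical, and whether the data reaching the parser really starts with EF BF is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class PrologCharSketch {

    // Decodes the bytes as UTF-16LE and returns the code of the first character.
    static int firstCharUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE).charAt(0);
    }

    public static void main(String[] args) {
        // U+FFFD encoded as UTF-8 is the three bytes EF BF BD.
        byte[] replacementUtf8 = "\uFFFD".getBytes(StandardCharsets.UTF_8);

        // Decode only its first two bytes (EF BF) as a single UTF-16LE code unit.
        int code = firstCharUtf16Le(new byte[] { replacementUtf8[0], replacementUtf8[1] });

        System.out.printf("0x%04x / %d%n", code, code); // prints "0xbfef / 49135"
    }
}
```

If that reading is right, it would suggest the content was transcoded somewhere before it reached the parser, which is something a byte-level check of the file could confirm or rule out.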
Thanks in advance.
... { is.read(); is.read(); – Joop Eggen
<?xml encoding=...?>. I have heard that in rare cases a BOM gave such a problem. But I do not remember specifics. – Joop Eggen