
I have a pretty interesting topic - at least for me. Given a ByteArrayOutputStream with bytes in, for example, UTF-8, I need a function that can "translate" those bytes into another, new ByteArrayOutputStream in, for example, UTF-16, ASCII, or you name it. My naive approach would have been to use an InputStreamReader and give it the desired encoding, but that didn't work because it reads into a char[] and I can only write byte[] to the new BAOS.

public byte[] convertStream(Charset encoding) {
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);
    ByteArrayOutputStream converted = new ByteArrayOutputStream();

    int readCount;
    char[] buffer = new char[4096];
    while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1)
        converted.write(buffer, 0, readCount);

    return converted.toByteArray();
}

Now, this obviously doesn't work (it doesn't even compile, because ByteArrayOutputStream.write takes a byte[], not a char[]), and I'm looking for a way to make this scenario possible without building a String out of the byte[].

@Edit: Since it seems rather hard to read the obvious things:
1) raw: a ByteArrayOutputStream containing the bytes of a BINARY object sent to us by clients. The bytes usually come in UTF-8 as part of an HTTP message.
2) The goal here is to forward this BINARY data to an internal system that's not flexible - well, it is an internal system - and it accepts such attachments in UTF-16. I don't know why (don't even ask), but it does.

So to justify my question: is there a way to convert a byte array from charset A to charset B, or to an encoding of your choice? Once again, building a String is NOT what I'm after.

Thank you, and I hope that clears up the questionable parts :).

What is raw? You've only given us part of the information. I'd expect to just convert the bytes to a string, and then convert back from a string to a byte array. No need to use streams at all. - Jon Skeet
Well, raw is obviously a ByteArrayOutputStream containing the bytes of binary data, in whatever encoding our client used. We have to transfer this data to our system in UTF-8 format, so we need to convert from whatever that encoding is to UTF-8 (or whatever else). I hope that clears it up. Building a string is out of the question right now. - Display name
Why is building a string out of the question? If the most obvious approach is inappropriate, you need to explain why that's the case. And the benefit of a short but complete example is that what you consider "obvious" is spelled out in the code. Far too often I've made assumptions that seem "obvious" to me, but turn out not to be... and when you're now adding restrictions as to what is feasible and what isn't, that adds to the confusion. - Jon Skeet
But the answer that builds a string does answer your original question. There was nothing in that original question to explain why you wouldn't want to do that. You still haven't said why you refuse to create a string. And being rude to people trying to help you is a really, really bad idea. - Jon Skeet

Answer


As mentioned in comments, I'd just convert to a string:

String text = new String(raw.toByteArray(), encoding);
byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
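Wrapped up as a self-contained helper with both charsets as parameters, the string-based route might look like the sketch below. The class name StringTranscode and the method name transcode are hypothetical. Note that new String(byte[], Charset) and String.getBytes(Charset) silently substitute replacement characters for malformed or unmappable input rather than throwing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringTranscode {
    // Decode the bytes with the source charset, then re-encode the String with the
    // target charset. Both steps substitute replacement characters for malformed
    // or unmappable input instead of throwing.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        // StandardCharsets.UTF_16 writes the big-endian byte-order mark 0xFE 0xFF first
        System.out.println(Arrays.toString(utf16));
        // prints [-2, -1, 0, 104, 0, -23, 0, 108, 0, 108, 0, 111]
    }
}
```

To get UTF-16 without a byte-order mark, StandardCharsets.UTF_16BE or StandardCharsets.UTF_16LE can be passed instead; the question doesn't say which form the internal system expects.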

However, if that's not feasible (for some unspecified reason...), what you've got now is nearly there - you just need to add an OutputStreamWriter into the mix:

// Nothing here should throw IOException in reality - work out what you want to do.
public byte[] convertStream(Charset encoding) throws IOException {       
    ByteArrayInputStream original = new ByteArrayInputStream(raw.toByteArray());
    InputStreamReader contentReader = new InputStreamReader(original, encoding);

    int readCount;
    char[] buffer = new char[4096];
    try (ByteArrayOutputStream converted = new ByteArrayOutputStream()) {
        try (Writer writer = new OutputStreamWriter(converted, StandardCharsets.UTF_8)) {
            while ((readCount = contentReader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        }
        return converted.toByteArray();
    }
}
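The method above relies on the raw field from the question. A self-contained sketch of the same Reader/Writer loop, assuming the source bytes are passed in as a parameter and the caller picks both charsets, could look like this (StreamTranscode and convert are made-up names). Rethrowing IOException as UncheckedIOException is just one way to act on the "work out what you want to do" comment. Like the String version, an InputStreamReader or OutputStreamWriter constructed with a Charset replaces malformed or unmappable input rather than failing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamTranscode {
    // Same Reader/Writer loop as above, but the source bytes are a parameter
    // (instead of the raw field) and the caller chooses both charsets.
    public static byte[] convert(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream converted = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(converted, to)) {
            char[] buffer = new char[4096];
            int readCount;
            while ((readCount = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, readCount);
            }
        } catch (IOException e) {
            // In-memory streams shouldn't fail; rethrowing unchecked is one possible choice.
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) by this point.
        return converted.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] utf16le = convert(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16LE);
        System.out.println(utf16le.length); // prints 10: 5 chars x 2 bytes, no BOM
    }
}
```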

Note that you're still creating an extra temporary copy of the data in memory, admittedly in UTF-8 rather than UTF-16... but fundamentally this is hardly any more efficient than creating a string.

If memory efficiency is a particular concern, you could perform multiple passes in order to work out how many bytes will be required, create a byte array of the right length, and then adjust the code to write straight into that byte array.
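A rough sketch of that two-pass idea follows. It swaps the Reader/Writer pair for CharsetDecoder and CharsetEncoder used directly (the answer doesn't name a specific API, so this choice is an assumption), and the class and method names are hypothetical. Pass 1 decodes and encodes in fixed-size chunks and only counts the output bytes; pass 2 repeats the work into a byte[] of exactly that size, so apart from the input and the result only small fixed buffers are allocated, at the cost of decoding everything twice. Unlike the approaches above, fresh coders report malformed or unmappable input, which the sketch rethrows as IllegalArgumentException. Keep in mind that ByteArrayOutputStream.toByteArray() itself returns a copy of the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class TwoPassTranscode {
    private static final int CHUNK = 4096; // chars per decode step / bytes of scratch space

    // Pass 1 counts the encoded size; pass 2 encodes into an array of exactly that size.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        try {
            int size = Math.toIntExact(pass(input, from, to, null));
            byte[] result = new byte[size];
            pass(input, from, to, ByteBuffer.wrap(result));
            return result;
        } catch (CharacterCodingException e) {
            // Fresh coders REPORT malformed/unmappable input instead of replacing it.
            throw new IllegalArgumentException("cannot transcode " + from + " to " + to, e);
        }
    }

    // One streaming decode/encode pass. With target == null the encoded bytes go to a
    // scratch buffer that is cleared whenever it fills up (count only); otherwise they
    // are written into target. Returns the number of bytes produced.
    private static long pass(byte[] input, Charset from, Charset to, ByteBuffer target)
            throws CharacterCodingException {
        CharsetDecoder decoder = from.newDecoder();
        CharsetEncoder encoder = to.newEncoder();
        ByteBuffer in = ByteBuffer.wrap(input);
        CharBuffer chars = CharBuffer.allocate(CHUNK);
        ByteBuffer out = (target != null) ? target : ByteBuffer.allocate(CHUNK);
        long count = 0;
        boolean decoded = false;  // decoder has consumed all input bytes
        boolean finished = false; // ...and has also been flushed
        while (!finished) {
            // All input is already in memory, so endOfInput is always true.
            CoderResult r = decoded ? decoder.flush(chars) : decoder.decode(in, chars, true);
            if (r.isError()) r.throwException();
            if (r.isUnderflow()) {
                if (decoded) finished = true; else decoded = true;
            }
            chars.flip();
            count += encode(encoder, chars, out, finished, target == null);
            chars.compact(); // keeps e.g. a dangling high surrogate for the next step
        }
        return count + encode(encoder, null, out, true, target == null);
    }

    // Encodes chars (or flushes the encoder when chars == null), returning the bytes written.
    private static long encode(CharsetEncoder encoder, CharBuffer chars, ByteBuffer out,
                               boolean last, boolean counting) throws CharacterCodingException {
        long produced = 0;
        while (true) {
            int before = out.position();
            CoderResult r = (chars == null) ? encoder.flush(out) : encoder.encode(chars, out, last);
            produced += out.position() - before;
            if (r.isError()) r.throwException();
            if (r.isUnderflow()) return produced;
            // OVERFLOW: the output buffer is full.
            if (!counting) throw new IllegalStateException("pass 2 produced more bytes than pass 1");
            out.clear(); // counting only: these bytes are already counted, reuse the space
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16);
        System.out.println(utf16.length); // prints 12: 2-byte BOM + 5 chars x 2 bytes
    }
}
```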