I understand the need to specify an encoding when converting a byte[] to a String in Java (using an appropriate format, e.g. hex or Base64), because the default encoding may differ between platforms. But I am not sure I understand the same need when converting a String to bytes. So this question is an attempt to wrap my head around why a character set must be specified when transferring Strings over the web.

Consider the following Java code.

Note: the String in the example below is not read from a file or another resource; it is created in code.

1: String message = "a good message";

2: byte[] encryptedMsgBytes = encrypt(key,,message.getBytes());

3: String base64EncodedMessage = new String (Base64.encodeBase64(encryptedMsgBytes));

I need to send this over the web using an HTTP POST, and it will be received and processed (Base64-decoded, decrypted, etc.) at the other end.

Based on what I have read, the recommended practice is to specify the charset on line 2, i.e. message.getBytes("UTF-8"),

and a similar approach is recommended at the other end to process the data, as shown on line 7 below:

4: String base64EncodedMsg =

5: byte[] base64EncodedMsgBytes = Base64.encodeBase64(base64EncodedMsg);

6: byte[] decryptedMsgBytes = decrypt(aesKey, "AES", Base64.decodeBase64(base64EncodedMessage));

7: String originalMsg = new String(decryptedMsgBytes, "UTF-8");

Java's internal in-memory String representation is UTF-16 (setting aside the UTF-8 used during serialization and when saving files). Given that, do we really need this if the decryption is also done in Java? (This is not a practical assumption; it is just for the sake of discussion, to understand the need to specify the encoding.) Since the String 'message' on line 1 is represented in the JVM as UTF-16, wouldn't the .getBytes() method without a specified encoding always return the UTF-16 bytes? Or is that incorrect, and .getBytes() without a specified encoding always returns raw bytes? Since the internal representation is UTF-16, why would the default character encoding of a particular JVM matter?

If it does indeed return UTF-16, is there any need to use new String(decryptedMsgBytes, "UTF-8") at the other end?

1 Answer

Wouldn't the .getBytes() method without specifying the encoding always return the UTF-16 bytes?

This is incorrect. Per the Javadoc, this uses the platform's default charset:

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.
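
To make the difference concrete, here is a minimal sketch of how the same String turns into different bytes depending on the charset, and why naming the charset on both ends makes the round trip predictable. The class name CharsetDemo and the helpers toWire and fromWire are made up for illustration. The encryption step is left out because encrypt and decrypt are the asker's own methods, and the JDK's java.util.Base64 (Java 8+) stands in for the Base64 class used in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CharsetDemo {

    // Sender side: text -> bytes with an explicit charset -> Base64 text.
    // (Hypothetical helper; the encryption step from the question is omitted.)
    static String toWire(String message) {
        byte[] utf8Bytes = message.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8Bytes);
    }

    // Receiver side: Base64 text -> bytes -> text, using the SAME charset.
    static String fromWire(String wire) {
        byte[] utf8Bytes = Base64.getDecoder().decode(wire);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e9 is 'é'; a non-ASCII character makes the charsets visibly differ.
        String message = "a good m\u00e9ssage";

        // The no-argument getBytes() uses whatever this prints,
        // which can differ between machines and JVM versions.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + message.getBytes().length);

        // Same String, different byte sequences per charset.
        System.out.println("UTF-8 bytes:      " + message.getBytes(StandardCharsets.UTF_8).length);      // 15
        System.out.println("ISO-8859-1 bytes: " + message.getBytes(StandardCharsets.ISO_8859_1).length); // 14
        System.out.println("UTF-16 bytes:     " + message.getBytes(StandardCharsets.UTF_16).length);     // 30 (includes a 2-byte BOM)

        // Round trip with the charset named on both sides.
        String wire = toWire(message);
        System.out.println("round trip ok: " + message.equals(fromWire(wire)));
    }
}
```

The internal UTF-16 representation never leaks out through getBytes(): the method always encodes into some charset, and the only question is whether you pick it or the platform does. If the sender relies on its default and the receiver decodes with a different one, non-ASCII characters can come out garbled even when both ends are Java.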