I am trying to understand character encoding in Java. A Java char is 16 bits wide, and strings are stored using the UTF-16 encoding. Yet when I convert a string containing 6 characters to bytes, I get 6 bytes, as shown below. I was expecting 12. Is there a concept I am missing?
package learn.java;

public class CharacterTest {
    public static void main(String[] args) {
        String str = "Hadoop";
        // getBytes() with no argument encodes using the platform default charset
        byte[] bt = str.getBytes();
        System.out.println("the length of character array is " + bt.length);
    }
}
Output: the length of character array is 6
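A likely explanation for the 6, sketched here as an illustration rather than a definitive answer: getBytes() without arguments does not expose the internal UTF-16 storage, it encodes the string with the platform default charset. Assuming that default is UTF-8, ISO-8859-1, or another charset that uses one byte per ASCII letter, "Hadoop" comes out as 6 bytes. The class name DefaultCharsetDemo and the helpers charCount and byteCount are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Number of 16-bit UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes produced when the string is encoded with the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String str = "Hadoop";
        // The charset that the no-argument getBytes() falls back to on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("chars: " + charCount(str));
        System.out.println("default bytes: " + str.getBytes().length);
        System.out.println("UTF-8 bytes: " + byteCount(str, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + byteCount(str, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the charset explicitly, as byteCount does, keeps the result independent of the machine the code runs on.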
As per @Darshan's suggestion, I tried getting the bytes with the UTF-16 encoding, but the result is still not what I expected:
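package learn.java;

import java.io.UnsupportedEncodingException;

public class CharacterTest {
    public static void main(String[] args) {
        String str = "Hadoop";
        try {
            // Encode with an explicitly named charset; throws if the name is unknown
            byte[] bt = str.getBytes("UTF-16");
            System.out.println("the length of character array is " + bt.length);
        } catch (UnsupportedEncodingException e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}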
Output: the length of character array is 14
Try str.getBytes("UTF-16"); but I wonder why the output is 14. – Darshan Patel
Use utf-16le or utf-16be. Please refer to the following link for more details: rosettacode.org/wiki/String_length – Darshan Patel
The two extra bytes are 0xFE 0xFF, indicating that the following bytes use the (default) big-endian byte order instead of the (alternate) little-endian byte order. This kind of prefix is called a byte order mark (BOM). Without the BOM, there will be 12 bytes, two per char. – tucuxi
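The comments above can be checked with a small sketch. It encodes the same string with Java's UTF-16, UTF-16BE, and UTF-16LE charsets and prints the bytes in hex, so the byte order mark is visible. The class name Utf16BomDemo and the helper toHex are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    // Format a byte array as space-separated two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "Hadoop";
        // "UTF-16" encodes big-endian and prepends the BOM.
        byte[] withBom = str.getBytes(StandardCharsets.UTF_16);
        // The BE/LE variants fix the byte order and write no BOM.
        byte[] bigEndian = str.getBytes(StandardCharsets.UTF_16BE);
        byte[] littleEndian = str.getBytes(StandardCharsets.UTF_16LE);

        System.out.println("UTF-16   (" + withBom.length + " bytes): " + toHex(withBom));
        System.out.println("UTF-16BE (" + bigEndian.length + " bytes): " + toHex(bigEndian));
        System.out.println("UTF-16LE (" + littleEndian.length + " bytes): " + toHex(littleEndian));
    }
}
```

The UTF-16 line should report 14 bytes beginning with FE FF, while both explicit-endianness variants should report 12 bytes, two per char.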