getBytes() doesn't work for Cyrillic letters

Question

I found some answers, but none of them works for me. I want to generate a PDF file from HTML, but my HTML contains Cyrillic letters. From what I've read, the fix has something to do with this simple code:

String s = "Здраво Kris";

byte bytes[] = s.getBytes("UTF-8");

String value = new String(bytes, "ISO-8859-1");

// I tried with new String(bytes, "UTF-8") but it didn't work

Then I pass the value to my PDF generator function, but it outputs only the part of the string s that is not Cyrillic, i.e. Kris:

 htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + value + "</h1></body></html>", "kris");

Did you try with the String s? htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + s + "</h1></body></html>", "kris"); also CreatePDF looks like C# not Java. — Elliott Frisch
String objects in Java are always implicitly encoded in UTF-16. You can not change that encoding. — Abhishek

Bruno Lowagie · Accepted Answer · 2014-12-29T08:12:32

Please take a look at my answer to this question: Can't get Czech characters while generating a PDF

Several things can go wrong in your code.

This is a very bad idea:

String s = "Здраво Kris";

Suppose that you send your .java file including this code to somebody who saves it using a different encoding (for instance, a tool that reads the UTF-8 bytes as Windows-1252); then your source code will change into this:

String s = "Ð—Ð´Ñ€Ð°Ð²Ð¾ Kris";

I've also seen this happen when storing a document into a source control system.
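The corruption shown above is typical of UTF-8 bytes being decoded with a single-byte charset. Here is a minimal sketch of the effect, using only the JDK. The class and method names are made up for illustration, and ISO-8859-1 is used because it is the charset that appears in the question:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is not from the original post.
class MojibakeDemo {

    // Encode as UTF-8, then decode the same bytes with the wrong charset.
    static String decodeUtf8AsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset: the string survives unchanged.
    static String roundTripUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Escapes keep this file pure ASCII; the value is the string from the question.
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        String broken = decodeUtf8AsLatin1(s);
        System.out.println(s.length());                  // 11 characters
        System.out.println(broken.length());             // 17: one char per UTF-8 byte
        System.out.println(roundTripUtf8(s).equals(s));  // true
    }
}
```

Neither conversion is needed before handing text to a PDF library. A Java String is already Unicode, so s can be passed as it is, and re-encoding it can only make things worse.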

Bottom line: never rely on a special encoding when writing source code with hard-coded strings. Either store the strings in a file, using the right encoding to write and read them, or use Unicode escape notation (\uXXXX) if you insist on having hard-coded data in your source code.
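Both options could look roughly like this. It is a sketch using only the JDK; the class, constant, and method names are hypothetical, and UTF-8 is assumed as the file encoding:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class SafeStrings {

    // Option 1: Unicode escapes keep the source file pure ASCII,
    // so the editor and compiler encodings no longer matter.
    static final String GREETING = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Option 2: keep the text in a file and name the charset explicitly
    // on both the write and the read. A temp file stands in for a real resource.
    static String roundTripThroughTempFile(String text) {
        try {
            Path file = Files.createTempFile("greeting", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughTempFile(GREETING).equals(GREETING)); // true
    }
}
```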

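Even if you store the file containing this string correctly, you have to be very careful when compiling the code. If the compiler assumes a different encoding than the one the file was saved in, s will be corrupted too (javac has an -encoding option to state the source encoding explicitly).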

You also have to make sure that you're reading the data correctly when converting the HTML to PDF. I assume that you are using XML Worker (and not the obsolete HTMLWorker class). There are different places where you can indicate which encoding to use.

Finally, you have to make sure that you use a font that supports Cyrillic characters. For instance, if you use the default font, Helvetica, the Cyrillic characters will not be rendered.
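This would explain why only Kris shows up in the question's output. Here is a rough sketch of the effect, assuming the default Helvetica is limited to the Windows-1252 (WinAnsi) character set. The helper below is hypothetical and does not call iText; it only simulates which characters such a font could show:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of iText.
class WinAnsiCheck {

    // Keep only the characters that the Windows-1252 repertoire can represent.
    static String keepWinAnsi(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        StringBuilder kept = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (encoder.canEncode(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        // The string from the question, written with escapes.
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        System.out.println("[" + keepWinAnsi(s) + "]"); // prints "[ Kris]"
    }
}
```

A check like this can flag text that needs a different font before it reaches the PDF step. The actual fix is still to register and use a font that contains Cyrillic glyphs.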

You can also find this information in the free ebook The Best iText Questions on StackOverflow.
