0
votes

I found some answers, but none of them works for me. I want to make a PDF file from HTML, but the problem is that my HTML contains Cyrillic letters, and I found that it has something to do with this simple code:

String s = "Здраво Kris";

byte bytes[] = s.getBytes("UTF-8");

String value = new String(bytes, "ISO-8859-1");

// I tried with new String(bytes, "UTF-8") but it didn't work

Then I pass the value to my PDF generator function, but it outputs only the part of the string s that is not in Cyrillic, i.e. Kris

 htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + value + "</h1></body></html>", "kris");
2
Try this: byte[] bytes = s.getBytes("ISO-8859-1"); - Abhishek
Did you try with the String s? htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + s + "</h1></body></html>", "kris"); Also, CreatePDF looks like C#, not Java. - Elliott Frisch
I noticed I misplaced the brackets, but it still doesn't work. - Kristijan Iliev
String objects in Java are always implicitly encoded in UTF-16. You cannot change that encoding. - Abhishek
It's Java, and it doesn't work when passing just s. - Kristijan Iliev

2 Answers

2
votes

Please take a look at my answer to this question: Can't get Czech characters while generating a PDF

Several things can go wrong in your code.

This is a very bad idea:

String s = "Здраво Kris";

Suppose that you send the .java file containing this code to somebody who saves it as ASCII. Your source code will then change into this:

String s = "Здраво Kris";

I've also seen this happen when storing a document into a source control system.
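A minimal sketch of how this kind of corruption arises, assuming the bytes were written as UTF-8 and later read back as ISO-8859-1 (which is what the code in the question does). The class and method names are made up for illustration, and the string is spelled with Unicode escapes so the example itself stays pure ASCII:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // ASCII-only source: the Cyrillic word is spelled with Unicode escapes.
    static final String ORIGINAL = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Encode as UTF-8, then decode the same bytes with the wrong charset.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to exactly one char, so the bytes can be recovered.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble(ORIGINAL);
        // Each Cyrillic letter takes two bytes in UTF-8, so it turns into two chars.
        System.out.println(ORIGINAL.length() + " chars before, " + garbled.length() + " after");
        System.out.println("repaired equals original: " + repair(garbled).equals(ORIGINAL));
    }
}
```

The ASCII part (" Kris") comes through unchanged, which matches the symptom in the question: only the non-Cyrillic part of the string ends up in the PDF.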

Bottom line: never rely on a special encoding when writing source code with hard-coded strings. Either store the strings in a file, using the right encoding to write and read them, or use the Unicode escape notation if you insist on having hard-coded data in your source code.
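As a sketch of the Unicode notation, the string from the question could be written as below. The helper toEscapes is hypothetical and only there to show how such a literal can be generated from an existing string; it works on UTF-16 code units, which is what Java's escape notation expects:

```java
class UnicodeEscapeDemo {
    // The greeting from the question, written so that the source file is pure ASCII.
    static final String GREETING = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Hypothetical helper: rewrite every non-ASCII UTF-16 code unit as a 4-digit hex escape.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form that can be pasted into a string literal.
        System.out.println(toEscapes(GREETING));
    }
}
```

A file written this way survives being saved as ASCII or being passed through a source control system that mangles other encodings.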

Even if you store the file containing this string correctly, you have to be very careful when compiling the code. If the compiler uses a different encoding, s will be corrupted too.

You also have to make sure that you're reading the data correctly when converting the HTML to PDF. I assume that you are using XML Worker (and not the obsolete HTMLWorker class). There are different places where you can indicate which encoding to use.
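The general idea of reading the data correctly can be sketched with plain JDK classes: whoever turns the HTML string into bytes and whoever reads those bytes back must agree on the charset. This is not XML Worker code; the class and method names are assumptions made for illustration, and the same charset would then have to be passed to whatever parser you actually use:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HtmlEncodingDemo {
    // Turn the HTML into a byte stream using an explicit charset, never the platform default.
    static InputStream toStream(String html, Charset charset) {
        return new ByteArrayInputStream(html.getBytes(charset));
    }

    // Read the stream back into a String, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<h1>\u0417\u0434\u0440\u0430\u0432\u043e Kris</h1>";
        String same = readAll(toStream(html, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        String broken = readAll(toStream(html, StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println("matching charsets preserve the text: " + same.equals(html));
        System.out.println("mismatched charsets preserve the text: " + broken.equals(html));
    }
}
```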

Finally, you have to make sure that you use a font that supports Cyrillic characters. For instance, if you use the default font Helvetica, the Cyrillic characters will not be rendered.

You can also find this information in the free ebook The Best iText Questions on StackOverflow.

0
votes

One way to get around the inability (?) of CreatePDF to handle the full Unicode range of characters in Java (!) would be to inspect the string

String s = "Здраво Kris";

for characters outside the ASCII range (0x80 and above). These must be replaced by the corresponding numeric HTML entities.

You can easily verify this by setting the String s to these entities and seeing what happens when this string is embedded in the HTML.
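A minimal sketch of that replacement, assuming decimal numeric entities (such as &#1047;) are acceptable to the HTML parser behind CreatePDF, which I have not checked. The class and method names are hypothetical:

```java
class EntityEncoder {
    // Replace every code point outside ASCII with a decimal numeric HTML entity.
    // Iterating over code points (not chars) keeps supplementary characters intact.
    static String toNumericEntities(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The greeting from the question, spelled with Unicode escapes.
        String s = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";
        System.out.println(toNumericEntities(s));
    }
}
```

The result contains only ASCII characters, so it no longer depends on how the string is encoded on its way to the PDF generator. A font that contains the Cyrillic glyphs is still needed for them to show up.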