I have an HTML string containing Chinese and Korean characters, and I want to convert it to PDF using iText. I have read that the font must be embedded in the PDF for the Unicode characters to show up.

When I embed wts11.ttf (with encoding IDENTITY_H) or STSong-Light (with encoding UniGB-UCS2-H), I can see only the Chinese characters, not the Korean ones. I also tried arialuni.ttf (with encoding IDENTITY_H), but I still see only the Chinese characters and not the Korean ones.

Can someone please tell me which font I should use, or whether I am missing something?

Below is the code snippet:

// `message` holds the HTML string; `baos` collects the PDF bytes.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
Paragraph paragraph = new Paragraph();
PdfWriter.getInstance(document, baos);
document.open();
BaseFont bff = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.EMBEDDED);
Font f = new Font(bff);

// Alternative that was also tried (Arial Unicode MS):
// FontFactory.registerDirectories();
// Font f = FontFactory.getFont("Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

document.add(new Paragraph());
HTMLWorker htmlWorker = new HTMLWorker(document);

List<Element> objects = htmlWorker.parseToList(new StringReader(message), null);
paragraph.setFont(f);
for (Element elem : objects) {
    paragraph.add(elem);
}
document.add(paragraph);
document.close();
Please check other recent questions and you'll find out that HTMLWorker has been abandoned in favor of XML Worker. Please throw away your current code and replace it with code that uses XML Worker. - Bruno Lowagie
I am currently using an older version of iText and hence the HTMLWorker class. Can you please suggest whether this is possible using HTMLWorker? - user1661892
No, every minute you spend on the older version of iText is money down the drain. - Bruno Lowagie

2 Answers

1
votes

There are different ways to solve this problem if you upgrade to using XML Worker.

I reused the code from the official examples, more specifically the ParseHtmlAsian example, and I adapted the HTML that is used as the source for this example like this:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body>
    <p><span style="font-size:12.0pt; font-family:MS Mincho">長空</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Broken Sword),</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">秦王殘劍</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Flying Snow),</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">飛雪</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Moon), </span>
    <span style="font-size:12.0pt; font-family:MS Mincho">如月</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(the King), and</span>
    <span style="font-size:12.0pt; font-family:MS Mincho">秦王</span>
    <span style="font-size:12.0pt; font-family:Times New Roman,serif">(Sky).</span></p>
    <p style="font-size: 12.0pt; font-family:Batang">빈집</p>
    <p>Test</p>
    </body>
</html>

The result looks like this:

[screenshot of the resulting PDF]

As you can see, all the text is rendered correctly, so please do not spread incorrect messages such as "iText not rendering Chinese/Korean characters" ;-)

Please forward this answer to your management so that your CTO understands that investing time in an old iText version is more expensive than buying a license to use the new iText version.
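
The sample HTML above assigns a different font family to each script (MS Mincho for the Han characters, Batang for the Hangul text). As a rough aid for deciding which font families a given string needs, the following sketch lists the Unicode scripts that occur in it. It uses only the standard JDK; `ScriptScan` and `scriptsIn` are hypothetical names and not part of iText or XML Worker.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper (not an iText class): reports which Unicode scripts a string uses.
final class ScriptScan {

    static Set<UnicodeScript> scriptsIn(String text) {
        Set<UnicodeScript> found = EnumSet.noneOf(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            // COMMON and INHERITED cover spaces, punctuation and combining marks,
            // which do not tell us which font family is needed.
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                found.add(script);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Two Han characters, a Latin word and two Hangul syllables,
        // written as escapes so the source file stays ASCII-only.
        String sample = "\u9577\u7A7A (Sky), \uBE48\uC9D1";
        System.out.println(scriptsIn(sample));
    }
}
```

If the set contains both HAN and HANGUL, the HTML needs either one font that covers both scripts or a separate font family per script, as in the sample above.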

0
votes
  1. Download the Malgun-Gothic-Bold_29380.ttf font.
  2. Store it as assets/fonts/Malgun-Gothic-Bold_29380.ttf.
  3. This code will work for CJK, English and Vietnamese:

Font fontbold = FontFactory.getFont("assets/fonts/Malgun-Gothic-Bold_29380.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12);
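
A font can only render the characters it contains glyphs for, so it may be worth checking a font file against the text before embedding it. The sketch below does this with `java.awt.Font` from the standard JDK rather than with iText; it assumes that AWT's view of the font's coverage matches what iText will find in the same file. `GlyphCheck` and `missing` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical helper (not an iText class).
final class GlyphCheck {

    // Returns the distinct non-whitespace code points of `text`
    // for which `hasGlyph` reports false, in order of first appearance.
    static Set<Integer> missing(String text, IntPredicate hasGlyph) {
        Set<Integer> result = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .filter(cp -> !hasGlyph.test(cp))
            .forEach(cp -> result.add(cp));
        return result;
    }

    // Usage: java GlyphCheck <path-to-ttf> <text>
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: GlyphCheck <font.ttf> <text>");
            return;
        }
        java.awt.Font font = java.awt.Font.createFont(
                java.awt.Font.TRUETYPE_FONT, new java.io.File(args[0]));
        for (int cp : missing(args[1], cp -> font.canDisplay(cp))) {
            System.out.println(String.format("no glyph for U+%04X", cp));
        }
    }
}
```

If the check reports Hangul code points as missing for a given font, that font will not show the Korean text no matter which encoding is passed to `FontFactory.getFont`.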