I am writing a Java function which takes a String as a parameter and produces a PDF as output using PDFBox.
Everything works fine as long as I use Latin characters. However, I don't know in advance what the input will be, and it might contain English as well as Chinese or Japanese characters.
For non-Latin characters, here is the error I get:
Exception in thread "main" java.lang.IllegalArgumentException: U+3053 ('kohiragana') is not available in this font Helvetica encoding: WinAnsiEncoding
    at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:426)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:324)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showTextInternal(PDPageContentStream.java:509)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:471)
    at com.mylib.pdf.PDFBuilder.generatePdfFromString(PDFBuilder.java:122)
    at com.mylib.pdf.PDFBuilder.main(PDFBuilder.java:111)
If I understand correctly, I have to use one font for Japanese, another one for Chinese and so on, because the one I am using (Helvetica) doesn't cover all the required Unicode characters.
I could also use a single font which handles all these Unicode characters, such as Arial Unicode. However, that font is under a license that prevents me from using it, and I haven't found another one.
I found some projects that aim to solve this issue, like the Google Noto project. However, Noto provides multiple font files, so I would have to choose, at runtime, the correct file to load depending on the input I receive.
So I am facing two options, one of which I don't know how to implement properly:
1. Keep searching for a font that handles almost every Unicode character (where is this grail I am desperately seeking?!)
2. Try to detect which language is used and select a font depending on it. Apart from the fact that I don't know (yet) how to do that, I don't find it a clean implementation: the mapping between the input and the font file would be hardcoded, meaning I would have to hardcode every possible mapping.
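To make the second option more concrete, here is a minimal sketch of what I imagine the detection step could look like, using only the JDK's Character.UnicodeScript and no PDFBox calls. The class name ScriptRuns, the Run type, the fontFileFor helper and the font paths are all hypothetical placeholders, not real files or library APIs. It splits the input into runs of the same script, so each run could then be drawn with its own font. Note that it detects the script, not the language: Chinese and Japanese both use HAN characters, so this alone cannot tell them apart.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class ScriptRuns {

    /** A stretch of text whose script-bearing characters all share one Unicode script. */
    public static final class Run {
        public final UnicodeScript script;
        public final String text;

        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    /**
     * Splits text into runs by Unicode script. Characters classified as
     * COMMON or INHERITED (spaces, most punctuation, combining marks)
     * never start a new run; they stay attached to the current one.
     */
    public static List<Run> split(String text) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;

        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;

            // A change of script closes the current run.
            if (!neutral && currentScript != null && script != currentScript) {
                runs.add(new Run(currentScript, current.toString()));
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            // Text made only of neutral characters is reported as COMMON.
            UnicodeScript s = currentScript == null ? UnicodeScript.COMMON : currentScript;
            runs.add(new Run(s, current.toString()));
        }
        return runs;
    }

    /** Hypothetical script-to-font mapping; the paths are placeholders only. */
    public static String fontFileFor(UnicodeScript script) {
        switch (script) {
            case HAN:
            case HIRAGANA:
            case KATAKANA:
                return "fonts/cjk-placeholder.ttf";
            default:
                return "fonts/latin-placeholder.ttf";
        }
    }

    public static void main(String[] args) {
        for (Run run : split("PDF: これは日本語です")) {
            System.out.println(run.script + " -> " + fontFileFor(run.script) + " : " + run.text);
        }
    }
}
```

Even in this small form the hardcoded switch is exactly the part that bothers me: every script I want to support needs its own case and its own font file.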
Is there another solution?
Am I completely off track?
Thanks in advance for your help and guidance!
Here is the code I use to generate the PDF:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public static void main(String[] args) throws IOException {
    String latinText = "This is latin text";
    String japaneseText = "これは日本語です";
    // This works fine
    generatePdfFromString(latinText);
    // This throws the IllegalArgumentException shown above
    generatePdfFromString(japaneseText);
}

private static OutputStream generatePdfFromString(String content) throws IOException {
    PDPage page = new PDPage();
    try (PDDocument doc = new PDDocument()) {
        doc.addPage(page);
        // The content stream must be closed before the document is saved,
        // so it gets its own try-with-resources block.
        try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
            contentStream.setFont(PDType1Font.HELVETICA, 12);
            // Or load a specific font from a file
            // contentStream.setFont(PDType0Font.load(doc, new File("/fontPath.ttf")), 12);
            contentStream.beginText();
            contentStream.showText(content);
            contentStream.endText();
        }
        OutputStream os = new ByteArrayOutputStream();
        doc.save(os);
        return os;
    }
}