2
votes

I am using docx4j to create PDF files. In the docx output, text in the local language is rendered properly, but in the PDF the local-language strings are replaced with # characters.

In the documentation I saw:

When docx4j is used to create a PDF, it can only use fonts which are available to it. These fonts come from 2 sources:

->those installed on the computer

->those embedded in the document

Note that Word silently performs font substitution. When you open an existing document in Word and select text in a particular font, the actual font you see on the screen won't be the font reported in the ribbon if it is not installed on your computer or embedded in the document. To see whether Word 2007 is substituting a font, go into Word Options > Advanced > Show Document Content and press the "Font Substitution" button.

Word's font substitution information is not available to docx4j. As a developer, you have 3 options:

->ensure the font is installed or embedded

->tell docx4j which font to use instead, or

->allow docx4j to fallback to a default font
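The three options above can be pictured as a simple lookup order. The sketch below is a toy model in plain Java, not docx4j's implementation; the class name `FontResolutionSketch`, the `resolve` method, and its parameters are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.Set;

public class FontResolutionSketch {
    /**
     * Toy model only: returns the name of the font a converter would end up using.
     * This is an assumption about the decision order, not docx4j code.
     */
    public static String resolve(String requested, Set<String> available,
                                 Map<String, String> substitutions, String fallback) {
        // Option 1: the font is installed or embedded, so use it directly.
        if (available.contains(requested)) {
            return requested;
        }
        // Option 2: an explicit substitute was configured and is itself available.
        String substitute = substitutions.get(requested);
        if (substitute != null && available.contains(substitute)) {
            return substitute;
        }
        // Option 3: fall back to a default font.
        return fallback;
    }

    public static void main(String[] args) {
        Set<String> available = Set.of("Comic Sans MS", "Arial Unicode MS");
        Map<String, String> substitutions = Map.of("Algerian", "Comic Sans MS");
        // Algerian is not available, so the configured substitute is chosen.
        System.out.println(resolve("Algerian", available, substitutions, "Arial Unicode MS"));
        // No substitute is configured for this name, so the fallback is chosen.
        System.out.println(resolve("Unknown Font", available, substitutions, "Arial Unicode MS"));
    }
}
```

Whichever font is finally chosen also has to contain glyphs for the script in the document; a font without them is one plausible reason for text being rendered as # characters.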

To embed a font in a document, open it in Word on a computer which has the font installed (check no substitution is occurring), and go to Word Options > Save > Embed Fonts in File

But this doesn't seem to work.

Below is my code:

        // Start from the identity mapping, then substitute Comic Sans MS for Algerian.
        Mapper fontMapper = new IdentityPlusMapper();
        PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
        fontMapper.getFontMappings().put("Algerian", font);
        template.setFontMapper(fontMapper);

        // Convert the document to PDF via XSL-FO.
        PdfSettings pdfSettings = new PdfSettings();
        org.docx4j.convert.out.pdf.PdfConversion conversion =
                new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(template);
        OutputStream out = new FileOutputStream(f1);
        conversion.output(out, pdfSettings);

In the above code, the font is Algerian.

Any help will be much appreciated.

2
Please post the XML for the run of text in question, and the XML for the relevant styles. Alternatively, post the docx somewhere. - JasonPlutext

2 Answers

1
votes

I am posting this answer because I have seen this question raised many times in connection with UTF encoding, and I hope it helps. This piece of code solved the above problem:

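   File f = new File("/path/to/sample.docx");
   template.save(f);  // write the docx produced by docx4j to disk
   File f1 = new File("/path/to/sample.pdf");  // where doc2pdf is expected to write the PDF
   Runtime.getRuntime().exec("doc2pdf " + f);  // run the doc2pdf command on the saved docx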

If sample.docx is the input docx file and contains text in any international language, such as Chinese, it will be converted to a PDF with the same filename at the same path.

This works because the Runtime.getRuntime().exec call runs the terminal command doc2pdf from the Java program, with Ubuntu as the OS. For the doc2pdf command to be available, unoconv must first be installed from the terminal with sudo apt-get install unoconv.
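A slightly more defensive way to launch the same command is sketched below, assuming doc2pdf is on the PATH and, as described above, writes the PDF next to the input file. The class `Doc2PdfSketch` and its method names are hypothetical. Passing the arguments as a list keeps paths containing spaces intact, and waiting for the process means the PDF is not read before the conversion has finished.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

public class Doc2PdfSketch {
    /** Builds the command as separate arguments so paths with spaces survive. */
    public static List<String> buildCommand(File docx) {
        return List.of("doc2pdf", docx.getAbsolutePath());
    }

    /** Assumption: the PDF lands next to the input, with the same base name. */
    public static File expectedPdf(File docx) {
        String name = docx.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return new File(docx.getAbsoluteFile().getParentFile(), base + ".pdf");
    }

    /** Runs the conversion, blocks until it finishes, and returns the exit code. */
    public static int convert(File docx) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(docx))
                .inheritIO()   // show the tool's own output and errors on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Doc2PdfSketch <file.docx>");
            return;
        }
        File docx = new File(args[0]);
        int exitCode = convert(docx);
        System.out.println("doc2pdf exit code: " + exitCode);
        System.out.println("Expected output: " + expectedPdf(docx));
    }
}
```

A non-zero exit code is the first thing to check if the PDF does not appear.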

0
votes

Embedded fonts can be extracted and made available manually, like so:

    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    FontTablePart fontTablePart= wordMLPackage.getMainDocumentPart().getFontTablePart();
    fontTablePart.processEmbeddings();
    Set<String> fontsInUse = wordMLPackage.getMainDocumentPart().fontsInUse();
    // Make each font in use available to the font mapper, skipping any that could not be resolved.
    for (String s : fontsInUse) {
        PhysicalFont physicalFont = PhysicalFonts.get(s);
        if (physicalFont != null) {
            fontMapper.put(s, physicalFont);
        }
    }
    // Fonts such as 'Comic Sans MS' or 'Arial Unicode MS' can now be looked up; use one as the fallback.
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get(
            "Comic Sans MS");
    fontMapper.put(Mapper.FONT_FALLBACK, font);