2
votes

I am trying to convert a postscript file which contains some telugu Font (i.e Vani Bold). After converting the file into pdf I am not able to copy the text from generated pdf file .When I see the properties of pdf file in centos document viewer it is showing like below enter image description here

I am using below command to convert postscript file to pdf

bin/gs -dBATCH -sDEVICE=pdfwrite -sNOPAUSE -dQUITE -sOutputFile=/home/cloudera/Desktop/PrintTest/telugu.pdf /home/cloudera/Desktop/PrintTest/VirtualPrinter_27_09_2016_19_11_41_691.ps

I tried with ghostscript 9.19 and 9.20 as well,but no change.

Following is the link to my postscript file which I am trying to convert into pdf. click here for postscript file

I have been struggling with this since 10 days .Please provide some solution for this.

1

1 Answers

0
votes

I can tell you why you can't copy & paste the text, but I'm not sure I can provide an acceptable solution.

First, not all pdf viewers can deal with unicode characters (for example,xpdf can't, it just ignores them, while mudpf and qpdfview work).

Second, to be able to convert font glyphs to unicode characters, the font object in the PDF file must contain a /ToUnicode property. If you look at the generated PDF after decompression (mutool clean -d), you can see that the Vani font in object 8 0 doesn't have it, while both the Arial font in object 10 0 and the Calibri font in object 12 0 do.

So very likely the Vani font is missing this unicode translation information, you need to either add this information (e.g. with fontforge), or choose a different font that has this information.

Related question: