I am working in a PDF project, where I need to grab all text from the PDF. I've got some problem decoding Identity-H Font using toUnicode dictionary table provide from the PDF itself. the toUnicode provide character mapping to unicode hex, but didn't provide the uppercase CID character to unicode (in table).. So is there way that can lowercase the input unichar before process mapping to unicode using the table?
Can I using the offset between the <000C> <0042> to calculate the uppercase character?
toUnicode table .
57 beginbfchar
<0001> <0020>
<0002> <0021>
<0003> <0026>
<0004> <2019>
<0005> <002C>
<0006> <002D>
<0007> <002E>
<0008> <003A>
<0009> <003F>
<000A> <0040>
<000B> <0041>
<000C> <0042>
<000D> <0043>
<000E> <0044>
<000F> <0045>
<0010> <0046>
<0011> <0047>
<0012> <0048>
<0013> <0049>
<0014> <004A>
<0015> <004B>
<0016> <004C>
<0017> <004D>
<0018> <004F>
<0019> <0050>
<001A> <0052>
<001B> <0053>
<001C> <0054>
<001D> <0055>
<001E> <0057>
<001F> <0059>
<0020> <2018>
<0021> <0061>
<0022> <0062>
<0023> <0063>
<0024> <0064>
<0025> <0065>
<0026> <0066>
<0027> <0067>
<0028> <0068>
<0029> <0069>
<002A> <006A>
<002B> <006B>
<002C> <006C>
<002D> <006D>
<002E> <006E>
<002F> <006F>
<0030> <0070>
<0031> <0072>
<0032> <0073>
<0033> <0074>
<0034> <0075>
<0035> <0077>
<0036> <0079>
<0037> <007A>
<0038> <FB01>
<0039> <00FC>
endbfchar
the table did not provide glyph that mapping to uppercase Character. So how to show the character?