I have a PDF file whose text cannot be extracted with PDFBox or iText 7. The font uses the Identity-H encoding with an Adobe-Identity-UCS ToUnicode CMap. The ToUnicode stream is shown below.
```
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo > def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000><FFFF>
endcodespacerange
endcmap
CMapName currentdict /CMap defineresource pop
end
end
```
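This ToUnicode CMap is invalid: it declares a codespace range but contains no `bfchar` or `bfrange` sections, so no character code is mapped to Unicode. Is there any way to fix it?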
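To make the problem concrete, here is a minimal Python sketch (not PDFBox or iText code) that collects the code-to-Unicode mappings from the `bfchar` and `bfrange` sections of a ToUnicode CMap. It is deliberately simplified: array-form `bfrange` destinations and name destinations are skipped, and range destinations are incremented as a whole value. Run on the CMap above it finds nothing, while a small hypothetical CMap (`WORKING`) that does contain mappings yields a dictionary.

```python
import re


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Collect code -> text mappings from bfchar/bfrange sections (simplified)."""
    mapping: dict[int, str] = {}

    def hex_to_text(h: str) -> str:
        # ToUnicode destination strings are UTF-16BE encoded hex
        return bytes.fromhex(h).decode("utf-16-be")

    # Single mappings: <src> <dst>
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = hex_to_text(dst)

    # Range mappings: <lo> <hi> <dst>; array-form destinations are not matched
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        for lo, hi, dst in re.findall(
            r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block
        ):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = hex_to_text(f"{start + offset:0{len(dst)}X}")
    return mapping


# The CMap from the question: only a codespace range, no mappings
POSTED = """/CIDInit /ProcSet findresource begin 12 dict begin begincmap
/CIDSystemInfo > def /CMapName /Adobe-Identity-UCS def /CMapType 2 def
1 begincodespacerange <0000><FFFF> endcodespacerange
endcmap CMapName currentdict /CMap defineresource pop end end"""

# A hypothetical CMap that does map some codes, for comparison
WORKING = ("2 beginbfchar <0003> <0020> <0024> <0041> endbfchar "
           "1 beginbfrange <0044> <0046> <0061> endbfrange")

print(parse_tounicode(POSTED))   # {}
print(parse_tounicode(WORKING))  # {3: ' ', 36: 'A', 68: 'a', 69: 'b', 70: 'c'}
```

An empty result is consistent with the extractors returning no text: the CMap parses, but there is nothing in it to look up.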
I tried to find a complete Adobe-Identity-UCS CMap file to replace it with, but after a lot of Google searching I couldn't find one.
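As far as I know, a generic Adobe-Identity-UCS file with useful content does not exist: with Identity-H the 2-byte codes are CIDs, which for embedded TrueType-based fonts are usually glyph IDs specific to that font, so the mapping has to be built per font. One possible workaround is therefore to assemble a code-to-character table yourself (for example by rendering the glyphs and identifying them) and serialize it as a new ToUnicode CMap. Below is a minimal Python sketch of that serialization, assuming 2-byte codes; the `hypothetical` mapping is made up for illustration, and writing the result back as the font's `/ToUnicode` stream is left to whichever PDF library you use.

```python
def build_tounicode(mapping: dict[int, str]) -> str:
    """Serialize a code -> text mapping as a ToUnicode CMap with bfchar entries.

    Assumes 2-byte (Identity-H) source codes.
    """
    header = (
        "/CIDInit /ProcSet findresource begin\n"
        "12 dict begin\n"
        "begincmap\n"
        # Conventional CIDSystemInfo used for ToUnicode CMaps
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n"
        "/CMapName /Adobe-Identity-UCS def\n"
        "/CMapType 2 def\n"
        "1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n"
    )
    body = []
    items = sorted(mapping.items())
    # CMap convention: at most 100 entries per beginbfchar/endbfchar section
    for i in range(0, len(items), 100):
        chunk = items[i:i + 100]
        body.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination is the text encoded as UTF-16BE, written in hex
            dst = text.encode("utf-16-be").hex().upper()
            body.append(f"<{code:04X}> <{dst}>")
        body.append("endbfchar")
    footer = "endcmap\nCMapName currentdict /CMap defineresource pop\nend\nend\n"
    return header + "\n".join(body) + ("\n" if body else "") + footer


# Hypothetical mapping: glyph codes identified by hand
hypothetical = {0x0003: " ", 0x0024: "A", 0x0044: "a"}
print(build_tounicode(hypothetical))
```

Using only `bfchar` entries keeps the sketch simple; contiguous runs could be compressed into `bfrange` entries, but that is an optimization rather than a requirement.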
Any help? Thanks.
Edit: