I am currently working on a project in which I want to extract specific items from different PDF files. I want the date and specific numbers together with their corresponding labels, e.g. (id --> 346, rol number --> 668), not every number in the PDF file. I use Spark OCR to extract all the content from the PDF, and a pretrained NER model to detect entities. But the result I get is something like (date --> DATE, 346 --> CARDINAL, 668 --> CARDINAL). Any idea how I can approach this problem? I want the numbers with their corresponding names, like (id: 346, rol number: 668), not CARDINAL. Thanks,
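A pretrained NER model only assigns generic types such as DATE or CARDINAL, so it has no way of knowing that 346 is an "id". That pairing has to come from the text around the number, typically the label word just before it. One simple way to approach it is a rule-based pass over the OCR output. The sketch below is only an illustration under assumptions: it assumes the OCR text keeps each label directly before its value (separated by spaces, `:`, `#`, `=` or arrows like `-->`), that dates look like `12/03/2021`, and the label spellings, `FIELD_PATTERNS` and `extract_fields` are hypothetical names you would adapt to your own PDFs.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns: adjust the label spellings to match your PDFs.
# Each pattern captures only the value that follows a known label, so
# unlabelled numbers elsewhere in the document are ignored.
FIELD_PATTERNS = {
    # Assumed date layout: day/month/year with "/", "." or "-" separators.
    "date": re.compile(r"\b(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})\b"),
    # "id" as a whole word, then any mix of spaces, ":", "#", "=", ">" or "-".
    "id": re.compile(r"\bid\b[\s:#=>-]*(\d+)", re.IGNORECASE),
    # "rol number" (any spacing/case), same separator rule as above.
    "rol_number": re.compile(r"\brol\s+number\b[\s:#=>-]*(\d+)", re.IGNORECASE),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first value found for each labelled field, or None."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up OCR output for illustration only.
sample = """Invoice issued on 12/03/2021
Customer ID: 346
Page 2 of 5, total amount 1500
Rol Number --> 668"""

print(extract_fields(sample))
# {'date': '12/03/2021', 'id': '346', 'rol_number': '668'}
```

Here 1500 is skipped because no known label precedes it. If the layouts vary too much for regular expressions, the same idea (label plus value) can instead be taught to an NER model by fine-tuning it on custom entity types such as ID and ROL_NUMBER rather than relying on the generic CARDINAL label.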