I am using Tika and Tesseract to extract OCR text from PDF files that contain scanned images. I have managed to parse PDF documents containing images by using RecursiveParserWrapper instead of Parser, and that works fine. However, the client wants all the Tesseract OCR configuration to be done somewhere else, so that the existing code can be used as it is to extract OCR text from all supported formats.
The existing code uses a plain Parser to extract data. Can anybody explain why we use RecursiveParserWrapper instead of a normal Parser when extracting text from images, or from PDFs containing scanned images?