I have multiple PDFs and I want to extract the text from a specific region of each one's first page. Given that I have the coordinates of the bounding box around that text, how can I extract it from the command line?
I did some research and found that PDFMiner and PDFBox can do this, but PDFMiner is very poorly documented.
Can someone tell me how to do this with PDFMiner, or suggest another solution?
PS: I am working in a Linux terminal.
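
For reference, here is a minimal sketch of the kind of extraction described above. It assumes the pdfminer.six fork of PDFMiner (its `extract_pages` helper and `LTTextContainer` class) and a bounding box given as x0 y0 x1 y1 in PDF points, with the origin at the bottom-left corner of the page. The names `inside` and `text_in_region` are hypothetical, and only text boxes lying entirely inside the region are kept, which may or may not be the right rule for a given layout.

```python
import sys


def inside(bbox, region):
    """Return True if bbox (x0, y0, x1, y1) lies entirely within region."""
    x0, y0, x1, y1 = bbox
    rx0, ry0, rx1, ry1 = region
    return x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1


def text_in_region(pdf_path, region):
    """Collect the text of first-page text boxes that fall inside region."""
    # Imported lazily so the geometry helper works without pdfminer.six installed.
    # Assumption: pdfminer.six, not the original PDFMiner release.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    chunks = []
    # page_numbers is zero-based, so [0] restricts parsing to the first page.
    for page in extract_pages(pdf_path, page_numbers=[0]):
        for element in page:
            if isinstance(element, LTTextContainer) and inside(element.bbox, region):
                chunks.append(element.get_text())
    return "".join(chunks)


# Hypothetical invocation: python extract_region.py FILE X0 Y0 X1 Y1
if __name__ == "__main__" and len(sys.argv) == 6:
    path = sys.argv[1]
    region = tuple(float(v) for v in sys.argv[2:6])
    print(text_in_region(path, region), end="")
```

Saved under a hypothetical name such as `extract_region.py`, it could be run over several files with a shell loop like `for f in *.pdf; do python extract_region.py "$f" 50 700 300 750; done`, where the four numbers are placeholder coordinates.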