I am trying to use PDFBox in Eclipse to run multiple PDF files in a folder through a text reader that extracts certain terms and writes them to a text file, which I will then convert to an Excel sheet. The program already works correctly for a single PDF file:

public static void main(String args[]) throws IOException {

  //Loading an existing document
  File file = new File("ADE_acetylfuranoside_120319_pfister.pdf");
  PDDocument document = PDDocument.load(file);

  //Instantiate PDFTextStripper class
  PDFTextStripper pdfStripper = new PDFTextStripper();

  //Retrieving text from PDF document
  String text = pdfStripper.getText(document);

  //Release the document once its text has been read
  document.close();

  //..."Actual code that extracts text"...

  //Redirect System.out to output.txt and print the result there
  PrintStream o = new PrintStream(new File("output.txt"));
  PrintStream console = System.out;
  System.setOut(o);
  System.out.println(finalSheet);
}

My problem is that I want to run 500 PDFs in one folder through this program in Eclipse, rather than typing in the name of each one individually. I also want the output to look like:

Name1, Number1, ID1
Name2, Number2, ID2

but I think that, the way it is written now, it will just overwrite line one each time if I run multiple PDFs through it.

Thanks for the help!

1 Answer

For the first part, you could just use the File class with a FileFilter:

// directoryName could be as simple as "."
File folder = new File(directoryName);
File[] listOfFiles = folder.listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".pdf");
    }
});

This gives you an array of File objects, one for each PDF file in that folder/directory (listFiles returns null if the path is not a directory or cannot be read). Now you can loop through it with pretty much the code you have.
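To make the loop concrete, here is a minimal sketch that uses only the standard library. The class name `PdfFolderLister` and the method `listPdfs` are made up for illustration, and the loop body is a placeholder where the existing single-file PDFBox code would go.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of PDFBox.
class PdfFolderLister {

    /** Returns the PDF files directly inside the folder, sorted by path. */
    static List<File> listPdfs(File folder) {
        File[] found = folder.listFiles(new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return pathname.isFile()
                        && pathname.getName().toLowerCase().endsWith(".pdf");
            }
        });
        // listFiles returns null if folder is not a directory or cannot be read
        if (found == null) {
            return new ArrayList<>();
        }
        Arrays.sort(found);
        return new ArrayList<>(Arrays.asList(found));
    }

    public static void main(String[] args) {
        // Use the folder given on the command line, or the working directory
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : listPdfs(folder)) {
            // Placeholder: run the existing single-file extraction on `pdf` here
            System.out.println("Would process: " + pdf.getName());
        }
    }
}
```

Returning an empty list instead of null means the for-each loop simply does nothing when the folder is missing, which avoids a NullPointerException.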

On the output side, you'll likely want to correlate the output with the input. I'm a bit confused by your code and I'm guessing you'd just like an output file for each input file. So, perhaps, something like:

// index is the value you used to loop through the `listOfFiles` array
try( FileWriter fileWriter = new FileWriter(listOfFiles[index].getName() + ".output.txt" ) ) {
    fileWriter.write(finalSheet); // the String text you want in the file (finalSheet in your code)
}

This creates a file named (as taken from your example) "ADE_acetylfuranoside_120319_pfister.pdf.output.txt". Obviously this could change. In this case a new file is created for each input file.
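If you would rather have a single combined file with one line per PDF, as the sample output in the question suggests, one option is to open the writer once before the loop instead of once per file. The sketch below assumes that approach; `CombinedOutputWriter`, `writeRows` and `extractRow` are hypothetical names, and `extractRow` is only a stand-in for your PDFBox extraction code.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper class, not part of PDFBox.
class CombinedOutputWriter {

    /** Stand-in for the real extraction; returns one comma-separated row. */
    static String extractRow(File pdf) {
        return pdf.getName() + ", Number, ID";
    }

    /** Writes one row per PDF into a single output file. */
    static void writeRows(List<File> pdfs, File output) throws IOException {
        // The writer is opened once, so each println adds a new line
        // rather than replacing what the previous file produced.
        try (PrintWriter out = new PrintWriter(new FileWriter(output))) {
            for (File pdf : pdfs) {
                out.println(extractRow(pdf));
            }
        }
    }
}
```

Each call to `writeRows` starts the output file from scratch. To keep rows from earlier runs, `new FileWriter(output, true)` opens the file in append mode instead.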