I am extracting data from multiple PDFs using a set of search words.
Table_search <- list("Table 14", "Listing [0-9]", "Program")
Table_match_list <- sapply(Table_search, grep, x = tablelist, value = TRUE)
This code loops through the PDF files, searches for the keywords, and extracts each matching line. The number of matches differs between keywords, which causes the error below. This happens because some keywords are missing from specific pages. When the code comes across a missing keyword, it should return NA and then move on to the next page, look for the keywords there, and so on. If NA fills the blank cells, my final output will have an equal number of rows for each keyword I search for.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 102, 98, 99
I searched for three keywords and got 102, 98, and 99 matches respectively. Instead, I should have 102 rows for each keyword I search for, because I am looping through 102 PDF files.
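To illustrate where the differing lengths come from, here is a minimal sketch on made-up data. The vector `demo_lines` and its contents are hypothetical stand-ins, not my real `tablelist`. `grep(..., value = TRUE)` returns only the lines that match, so each keyword can yield a different number of elements, and converting that result to a data frame fails.

```r
# Hypothetical stand-in for the extracted PDF text (not the real tablelist).
demo_lines <- c("Table 14.1 Demographics", "Program: t_dm.sas", "Table 14.2 Vitals")

# Same pattern as the original call: one grep() per keyword.
demo_matches <- sapply(c("Table 14", "Program"), grep, x = demo_lines, value = TRUE)

# The keywords match a different number of lines (2 and 1 here),
# so sapply() returns a list instead of a matrix.
demo_lengths <- lengths(demo_matches)

# Building a data frame from unequal-length columns raises the
# "arguments imply differing number of rows" error.
demo_df <- try(as.data.frame(demo_matches), silent = TRUE)
```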
Please advise how we can achieve this.
Thank you, Bharath
@Ronak ------- Updated
This is what I get out of 102 PDF files. The 3 sublists correspond to the 3 keywords. The first keyword is in all PDFs, the second is in 98 PDFs, and the third is in 99 PDFs.
This is what I get from your code.
What I need is different: it doesn't have to print NULL for every line of a PDF, just one NULL per PDF if the keyword is missing.
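The shape I am after could be sketched roughly as follows. This assumes the extracted text is available as a list with one character vector of lines per PDF. The names `pdf_lines` and `first_match`, the sample lines, and the column names are all hypothetical, and taking only the first match per keyword is an assumption on my part.

```r
# Hypothetical input: one character vector of extracted lines per PDF.
pdf_lines <- list(
  c("Table 14.1 Demographics", "Listing 1 Adverse Events", "Program: t_dm.sas"),
  c("Table 14.2 Vital Signs", "Program: t_vs.sas"),
  c("Table 14.3 Labs", "Listing 3 Lab Values")
)

Table_search <- c("Table 14", "Listing [0-9]", "Program")

# Return the first line matching one pattern, or NA when nothing matches.
first_match <- function(pattern, lines) {
  hits <- grep(pattern, lines, value = TRUE)
  if (length(hits) > 0) hits[1] else NA_character_
}

# Search every keyword in every PDF: exactly one value per keyword per PDF.
rows <- lapply(pdf_lines, function(lines) {
  vapply(Table_search, first_match, character(1), lines = lines)
})

# Bind into a data frame with one row per PDF and one column per keyword.
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- c("table", "listing", "program")
```

With this layout every keyword column has as many rows as there are PDFs, and a missing keyword shows up as a single NA in that PDF's row.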
Use tryCatch in the function you use for finding the line of the keyword. - Mohan Govindasamy
Can you add dput(tablelist) to your post so that I can use the data to verify the answer? - Ronak Shah
Table_match_list <- sapply(Table_search, function(x) {tmp <- grep(x, tablelist, value = TRUE); if (length(tmp) > 0) toString(tmp) else NA}) What does this return? - Ronak Shah