Solr PDF search: "Go to page" function

Question

We're building a PDF search engine with Solr and Lucene that lets users search for text in PDFs. The database contains only PDFs.

On the search results page ("/browse"), we want to append #page=X to the PDF link, where X is the page the text was found on. (Adobe Acrobat automatically scrolls to the given page when it is specified in the URL fragment.)

For example, if I search for foobar and there's a PDF document where foobar appears on page 5, the link should be http://pdfserver/pdfs/pdf.pdf#page=5 (note the anchor at the end).

Is this possible?
How would we get this page number?

I don't think I understand what you're actually trying to achieve. Do you want to index PDF files and have each search return the page number of the matched text, or is it something else? — omu_negru
Exactly that. So if I search for "foobar" and there's a pdf document where "foobar" is on page 5, the link should be pdfserver/pdfs/pdf.pdf#page=5 — Simon Fredsted
Did you ever find a solution to this? Seems like a basic requirement when indexing a load of PDF files. — MrTelly
@MrTelly, I used the #search solution and URL-encoded the search term. — Simon Fredsted

Simon Fredsted · Accepted Answer · 2014-06-30T12:29:49

One easy-to-implement solution I found was to use the #search parameter that Adobe Reader supports when embedded in IE.

For example:

http://pdfserver/pdfs/pdf.pdf#search=foobar

Adobe Reader then jumps to the page where the term occurs.

One would need to URL-encode the search terms, of course.
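
As a rough illustration of the encoding step, here is a minimal Python sketch that builds such a link. The function name pdf_search_link is made up for this example, and the sketch only covers percent-encoding the term; how the viewer treats multi-word terms is not verified here.

```python
from urllib.parse import quote


def pdf_search_link(pdf_url: str, query: str) -> str:
    """Return pdf_url with a #search= fragment for the given query.

    Hypothetical helper for illustration; not part of Solr or Adobe Reader.
    """
    # safe='' percent-encodes every reserved character, so spaces,
    # '&' and '#' in the query cannot break the fragment.
    return f"{pdf_url}#search={quote(query, safe='')}"


print(pdf_search_link("http://pdfserver/pdfs/pdf.pdf", "foo bar"))
# prints http://pdfserver/pdfs/pdf.pdf#search=foo%20bar
```

The same helper could be called once per result when rendering the "/browse" page, passing the user's query and each document's URL.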
