4
votes

I want to extract pages from a PDF file which has custom page numbering, e.g. there are pages with the number C1, C2, C3, and after that, 1,2,3,4 etc. starts.

When I use

$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
   -dFirstPage=22 -dLastPage=36 \
   -sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf

FirstPage and LastPage are treated as page indices, counted from the first page of the file, which is not what I want.

How can I tell Ghostscript to use the "real" page numbers?

1
You can't. The 'real' page numbers are the ones Ghostscript is already using, the custom page 'numbers' are just labels. - KenS
That's unbelievable. Page numbers/labels have been around forever. - zonksoft
@RafaelReiter: They are just labels and can be anything ("foo", "äöüß", ...). - Martin Schröder
@MartinSchröder: I know, that's why they are very convenient! - zonksoft

1 Answer

3
votes

You can, given a lot of knowledge about the internals of Ghostscript's PDF interpreter, access the page numbers. It will require a lot of looking around in the Resource/Init/pdf*.ps files (mostly just pdf_main.ps) and an understanding of PostScript, but it is possible. Just not for the faint of heart.

To see an example PS program which digs around inside a PDF to glean information, have a look at toolbin/pdf_info.ps.

If someone comes up with a patch to allow FirstPage/LastPage to take names as labels, then we will consider it. Part of this patch should be a change that adds an option to pdf_info.ps to print the labels alongside the real page numbers.
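
Until such a patch exists, one possible workaround is to translate the labels into indices by hand before calling gs. The following is only a rough sketch: it assumes the labelled front matter (C1, C2, ...) is one contiguous block at the start of the file, followed by plain decimal numbering that starts at 1. The variable front_pages and the helper label_to_index are made up for this illustration, and the value 3 is a placeholder that you would replace with the count seen in a PDF viewer.

```shell
#!/bin/sh
# Hypothetical value: how many labelled pages (C1, C2, ...) come before
# the page labelled "1". Count them in a PDF viewer and adjust.
front_pages=3

# Translate a printed page number (label) into the 1-based page index
# that -dFirstPage / -dLastPage expect. Assumes decimal labels start at 1
# right after the front matter.
label_to_index() {
    echo $(( front_pages + $1 ))
}

first=$(label_to_index 22)
last=$(label_to_index 36)

# Print the command instead of running it, so the indices can be checked first.
echo gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
    -dFirstPage="$first" -dLastPage="$last" \
    -sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf
```

Once the printed indices look right, drop the leading echo to run the extraction. This does not read the labels from the PDF itself, so it will give wrong results for files whose numbering restarts or changes style more than once.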