0
votes

I have PDF files that are primarily large images of old newspaper pages. With gs 9.06 these render properly, but on my website (which uses Ghostscript 8.70) they show a noisy gray background. The following link gives an example (it downloads a single page produced by gs):

http://mvtm.ca/collections/php/serve_pdfpage.php?file=1940-04-11&page=01

The actual gs command is: gs -q -sDEVICE=pdfwrite -r200 -dNOPAUSE -dBATCH -dSAFER -dFirstPage=01 -dLastPage=01 -sOutputFile=- mypdffile.pdf
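
For anyone who wants to reproduce this outside the website, here is a minimal sketch of the same invocation driven from a script. It is written in Python purely for illustration (the site itself serves pages through PHP), and the function names `build_gs_command` and `extract_page` are hypothetical. The flags are exactly the ones in the command above.

```python
import subprocess


def build_gs_command(pdf_path, page, resolution=200):
    """Return the argument list for extracting one page with Ghostscript.

    Mirrors the command quoted in the question; nothing is added or removed.
    """
    return [
        "gs", "-q",
        "-sDEVICE=pdfwrite",
        f"-r{resolution}",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        "-sOutputFile=-",  # write the resulting PDF to stdout
        pdf_path,
    ]


def extract_page(pdf_path, page):
    """Run gs and return the single-page PDF as bytes.

    Assumes a `gs` executable is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_gs_command(pdf_path, page),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Running `extract_page` against the same file on both machines and comparing the two outputs would show whether the difference comes from gs itself or from something later in the serving chain.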

It appears that the image layer (the only thing that should be rendered) is not shown. The PDFs were also processed by an OCR program to add background text for searching.

This exact procedure operates correctly on my local machine (Mac OS X).

Does anyone know what is happening here?

2 Answers

0
votes

I'm not entirely sure what you are seeing as a problem. You say that using an up-to-date version of Ghostscript works 'properly' and using an old version doesn't?

This simply suggests to me that a bug has been fixed sometime in the last four years. It seems to me that you should upgrade the 8.70 installation on your server.
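
To confirm which version the server is really running, something like the following sketch could help. It assumes `gs` is on PATH and relies on `gs --version` printing a bare version string such as `9.06`; the helper names are hypothetical.

```python
import subprocess


def parse_version(text):
    """Turn a version string such as '9.06' into a comparable tuple (9, 6)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older_than(version, minimum):
    """True if `version` sorts before `minimum`, comparing numerically."""
    return parse_version(version) < parse_version(minimum)


def installed_gs_version():
    """Ask the gs executable on PATH for its version string."""
    result = subprocess.run(
        ["gs", "--version"],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout.strip()
```

For example, `is_older_than(installed_gs_version(), "9.06")` run on the web server would tell you whether it is behind the version that renders the file correctly.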

Note, though, that there isn't really much that can be said by looking at the broken output; I'd need to see the original file before it got broken to have a stab at guessing what the bug was.

In isolation, my 'guess' would be that the original file is using either a JBIG2 or JPX encoded image as the background, and that either our JBIG2 decoder had a bug (a few have been fixed) or, in the case of JPX, that the JasPer decoder had a bug. We stopped using JasPer because it was slow, memory hungry, bug-ridden and effectively unsupported, and moved to OpenJPEG instead.
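
One rough way to test that guess without any PDF library is to scan the original file for the standard PDF filter names. This is only a sketch: it counts raw byte matches, so `FlateDecode` hits may belong to non-image streams, and a match inside binary stream data is possible though unlikely. The helper names are hypothetical.

```python
import re
from collections import Counter

# Standard PDF stream filter names commonly used for images.
IMAGE_FILTERS = (
    b"JBIG2Decode",     # JBIG2, typical for bilevel scans
    b"JPXDecode",       # JPEG 2000
    b"DCTDecode",       # baseline JPEG
    b"CCITTFaxDecode",  # fax-style bilevel compression
    b"FlateDecode",     # zlib; also used for non-image streams
)
FILTER_RE = re.compile(rb"/(" + b"|".join(IMAGE_FILTERS) + rb")\b")


def count_filters(pdf_bytes):
    """Count occurrences of each known filter name in raw PDF bytes."""
    found = Counter(
        match.group(1).decode("ascii")
        for match in FILTER_RE.finditer(pdf_bytes)
    )
    return dict(found)


def count_filters_in_file(path):
    """Convenience wrapper that reads the whole file into memory."""
    with open(path, "rb") as handle:
        return count_filters(handle.read())
```

If the result for the original file includes `JBIG2Decode` or `JPXDecode`, that would be consistent with a decoder bug in the older release.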

However perhaps I'm missing your point.

0
votes

I'm not sure exactly where you are coming from or where you are going. You're extracting one page of one PDF and outputting another PDF, but the link is to a .png. It might be helpful to see the input and intermediate PDFs.

What I'm seeing is a monochrome bitmap. You could add -dProcessColorModel=/DeviceGray or -dProcessColorModel=/DeviceRGB, which would allow the output PDF to have tone rather than just black and white.

Another possibility would be to add a transformation curve, which would lighten the background while darkening the black, but there seems to be something else happening. The headlines in the output seem to have missing letters. Even with grayscale output and better black-and-white contrast, something else seems to be going on.
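
To make the idea of a transformation curve concrete, here is a minimal sketch of one possible shape: a linear ramp with clipping. It is plain Python illustrating the mapping, not a Ghostscript option, and the `black_point` and `white_point` thresholds are hypothetical values that would need tuning for a real scan.

```python
def contrast_curve(value, black_point=0.25, white_point=0.75):
    """Map a gray level in [0, 1] (0 = black, 1 = white) through a clipped ramp.

    Levels at or below black_point become pure black, levels at or above
    white_point become pure white, and levels in between are stretched
    linearly across the full range.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (value - black_point) / (white_point - black_point)
    return min(1.0, max(0.0, scaled))
```

With these defaults, a light gray background at 0.9 maps to 1.0 (white) and dark ink at 0.1 maps to 0.0 (black). A curve like this would not restore the missing letters in the headlines, which is why something else still seems to be wrong.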