13
votes

Do most browsers (IE, Firefox, Safari, Chrome, Opera) make multiple HTTP requests for a PDF file when displaying it in the browser? I am working on an issue integrating with WebTrends web analytics software, and the statistics around PDFs appear to be incorrect. Support told me that because WebTrends parses the web server's access logs to determine traffic, downloads, etc., it has a difficult time determining accurate PDF download counts because:
When a user clicks on a PDF and the PDF opens in the user's browser via the Acrobat Reader browser plug-in, each page is downloaded one at a time. It does this to conserve bandwidth: if a user only views the first 2 pages of a 50-page PDF, only the first 2 pages are downloaded.

This sounds fishy to me (how could an HTTP request serve only a portion of a binary file?). I've been searching Google, but haven't found anything that speaks to this.

Tomorrow I will try to find some software for IE that lets me sniff the HTTP traffic, to see if I can observe this phenomenon.

Any info/thoughts are appreciated though.

Not an answer as such, but HTTP does support downloading parts of a file via range requests (the Range request header, answered with a Content-Range response header). Perhaps the PDF plug-in uses them... shrugs - Will
I've found Fiddler very handy for this kind of HTTP traffic sniffing. - Nate C-K

4 Answers

13
votes

If your site returns an HTTP response header like this:

Accept-Ranges: bytes

the PDF reader will close the initial connection after reading just a few KB of the document. It then requests sections of the document as required with the Range request header, e.g.:

Range: bytes=242107-244329, 8060-76128
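
To make the byte-range mechanism concrete, here is a minimal Python sketch of how a server might interpret a Range value like the one above. The function name parse_byte_ranges is hypothetical, and it only covers the "bytes" unit from the HTTP range-request spec (explicit "N-M", open-ended "N-" and suffix "-N" ranges); real web servers such as Apache or IIS do this parsing for you.

```python
from typing import List, Optional, Tuple


def parse_byte_ranges(header: str, size: int) -> Optional[List[Tuple[int, int]]]:
    """Parse a Range value such as 'bytes=242107-244329, 8060-76128'.

    Returns inclusive (start, end) offsets clamped to a file of `size` bytes,
    or None if the header is malformed or uses a unit other than bytes.
    Ranges that start beyond the end of the file are dropped.
    """
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or not spec.strip():
        return None
    ranges: List[Tuple[int, int]] = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        # Each spec must be "first-last", "first-" or "-suffix", digits only.
        if (not sep or not (first or last)
                or (first and not first.isdecimal())
                or (last and not last.isdecimal())):
            return None
        if not first:
            # Suffix range "-N": the final N bytes of the file.
            start, end = max(size - int(last), 0), size - 1
        else:
            start = int(first)
            # Open-ended range "N-": from N to the end of the file.
            end = int(last) if last else size - 1
            if end < start:
                return None  # e.g. "bytes=500-100" is invalid
            end = min(end, size - 1)
        # A range starting past the end of the file cannot be satisfied.
        if start < size:
            ranges.append((start, end))
    return ranges
```

Each returned (start, end) pair is inclusive, so the server would send data[start:end + 1] in a 206 Partial Content response; when several ranges are requested, the parts go back together in a multipart/byteranges body. An empty list means nothing was satisfiable, which a real server answers with 416 Range Not Satisfiable.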

An example of a URL that does this: http://www.ovationguitars.com/img/OVmanual.pdf

If you don't return the Accept-Ranges header, the PDF document will be downloaded in a single request (e.g. http://manuals.info.apple.com/en/iphone_user_guide.pdf).
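
The difference between those two URLs comes down to what the server sends back. The hypothetical build_pdf_response function below is a simplified Python sketch, not how any particular server is implemented: with accept_ranges off it always returns the whole file with 200 OK, and with it on it advertises Accept-Ranges: bytes and honours a single "bytes=start-end" or "bytes=start-" range with 206 Partial Content. Multi-range and suffix requests are simply ignored and answered with the full file, which HTTP allows a server to do.

```python
from typing import Dict, Optional, Tuple


def build_pdf_response(data: bytes, range_header: Optional[str],
                       accept_ranges: bool) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of the PDF held in `data`.

    Simplification: only one 'bytes=start-end' or 'bytes=start-' range is
    honoured; anything else falls back to a full 200 response.
    """
    size = len(data)
    headers = {"Content-Type": "application/pdf"}
    if accept_ranges:
        # Advertising byte serving is what lets the viewer switch to Range requests.
        headers["Accept-Ranges"] = "bytes"
        if range_header and "," not in range_header:
            unit, _, spec = range_header.partition("=")
            first, sep, last = spec.strip().partition("-")
            if (unit.strip().lower() == "bytes" and sep and first.isdecimal()
                    and (last == "" or last.isdecimal())):
                start = int(first)
                end = min(int(last) if last else size - 1, size - 1)
                if start <= end:
                    headers["Content-Range"] = f"bytes {start}-{end}/{size}"
                    headers["Content-Length"] = str(end - start + 1)
                    return 206, headers, data[start:end + 1]
    # Byte serving is off, or the Range header was ignored: send the whole file.
    headers["Content-Length"] = str(size)
    return 200, headers, data
```

Because every 206 response is logged as its own request, a log-based analytics tool would likely count one PDF view as several hits, which fits the WebTrends symptoms described in the question.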

You can see the behavior of the PDF reader in IE using HttpWatch.

** Disclaimer: This answer was posted by Simtec Limited, the makers of HttpWatch **

2
votes

For me, as of June 2016, Firefox and IE11 make only one request.

Chrome makes two requests if there is no Content-Disposition header: it issues two GETs, seems to cancel the second, and shows the PDF in the browser. The server does not know that the second request was cancelled and sends the PDF again.

When the server sends the following header, Chrome makes only one request and launches or saves the file:

Content-Disposition: attachment

(You can also suggest the file name to use when the user saves the file:)

Content-Disposition: attachment; filename=test.pdf
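
If you want the single-request behavior described above, the header only has to be added to the PDF response. Below is a hedged Python sketch; pdf_download_headers is a hypothetical helper, not part of any framework. It quotes the file name (the bare form filename=test.pdf shown above is also valid, but quoting keeps names with spaces or semicolons intact) and refuses non-ASCII names, which would need the separate filename* parameter. Keep in mind that "attachment" also means the browser will not display the PDF inline.

```python
from typing import Dict, Optional


def pdf_download_headers(size: int, filename: Optional[str] = None) -> Dict[str, str]:
    """Build response headers that ask the browser to download a PDF."""
    disposition = "attachment"
    if filename is not None:
        if not filename.isascii():
            raise ValueError("non-ASCII names need the filename* parameter")
        # Quote the name so spaces or semicolons survive; escape \ and ".
        escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
        disposition += f'; filename="{escaped}"'
    return {
        "Content-Type": "application/pdf",
        "Content-Length": str(size),
        "Content-Disposition": disposition,
    }
```
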

0
votes

My thoughts are that you are spot on: your plug-in cannot (and should not) split PDFs into multiple requests.

I have a web application that serves PDF files in a single request and displays them in a plug-in. It displays the entire PDF without requesting any more data.

Also, if you are looking for an HTTP sniffer, you could try Fiddler. I have found it useful when debugging websites.

0
votes

In my tests, double requests for a PDF occur in Chrome if I have the REST Console 4.0.2 extension enabled. Disabling this extension makes Chrome work as expected (only one request).

Edit: Having the Instapaper extension enabled also makes Chrome send double requests for PDFs.