2
votes

I need to get the page count from Word documents. I've tested many libraries and scripts (Apache POI, Perl scripts, some Linux applications, and more), and the only working solution was to install Microsoft Office under Wine and access it through OLE from Perl. I managed to do that, but it seems I can't use it on a server because of licensing problems.

The problem with Apache POI and the other solutions that read Word document info is that some documents are incomplete. The pageCount property in the document summary is sometimes missing (this is often the case with ODT documents saved as DOC, and with older documents).

Is there any way to actually count the pages (not just read the value from the summary) without installing Microsoft Office on the server?

2 Answers

2
votes

I was going to say wvSummary, but I think this uses the metadata you're referring to. I'm not sure there is a way to get the page count without actually laying out the document. So you might have to resort to using APIs to drive a real Office-compatible application like OpenOffice or AbiWord.
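
As a rough illustration of driving an Office-compatible application, here is a minimal Python sketch. It assumes LibreOffice (the `soffice` binary) is installed and on the PATH, and that the third-party `pypdf` package is available; neither is mentioned in the question. The helper names `build_convert_command` and `count_pages` are made up for this example. It converts the document to PDF headlessly, which forces a real layout pass, and then counts the PDF pages. The result reflects LibreOffice's layout, so it may differ from what Word would report.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    # Headless LibreOffice invocation that renders doc_path to PDF in out_dir.
    # "soffice" being on the PATH is an assumption of this sketch.
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def count_pages(doc_path, soffice="soffice"):
    # pypdf is a third-party package (assumption); it is imported here so the
    # module still loads on machines where it is not installed.
    from pypdf import PdfReader

    doc_path = Path(doc_path)
    with tempfile.TemporaryDirectory() as out_dir:
        # check=True raises CalledProcessError if the conversion fails.
        subprocess.run(
            build_convert_command(doc_path, out_dir, soffice),
            check=True,
            stdout=subprocess.DEVNULL,
        )
        # LibreOffice names the output after the input file's stem.
        pdf_path = Path(out_dir) / (doc_path.stem + ".pdf")
        return len(PdfReader(str(pdf_path)).pages)
```

Calling `count_pages("report.doc")` would then return the number of pages LibreOffice laid out, independent of whatever the document summary says.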

1
votes

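If you trust the document summary, then instead of using wvSummary you can just open the file and run a regex search for "nofpages(\d+)" (this matches the page-count control word that RTF-format files store in their info group). The first capture group (Groups[1] in .NET) will contain the number of pages.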

Since Word always saves the summary when it saves, I think this is pretty safe if you know the document was last saved with Word, which in my experience is 99% of the time.
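
The regex search described above could look roughly like this in Python. This is only a sketch: the function names are hypothetical, the leading backslash in the pattern is my assumption based on RTF control-word syntax (`\nofpagesN`), and it only helps when the file actually contains that control word as plain text.

```python
import re

# Matches the RTF control word \nofpages followed by digits.
# The file is read as bytes, so the pattern is a bytes pattern too.
NOFPAGES = re.compile(rb"\\nofpages(\d+)")


def summary_page_count(data):
    # Return the stored page count as an int, or None if the control
    # word is missing (for example, an incomplete summary).
    match = NOFPAGES.search(data)
    return int(match.group(1)) if match else None


def summary_page_count_from_file(path):
    # Convenience wrapper: read the whole file and search it.
    with open(path, "rb") as fh:
        return summary_page_count(fh.read())
```

A `None` result is the case the question complains about, where the summary value is simply absent and the document would have to be laid out to get a count.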