
I'm about to build a translation site (in PHP) where people can order a translator for their documents. Visitors upload a file through the site, and it is then assigned to a translator (a member of the site). The problem is how to build the part of the application that calculates a price from the document.

The most common way to price a translation is per word, so I need to know how many words are in the document a customer has uploaded. I assumed it must be possible to count the words in a file such as a Word document. However, I couldn't find any way to get an exact word count for an MS Word 2003 document (.doc). I've found a way to count words in .docx, but not in .doc, and there will be other formats too, such as PDF and RTF.
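Once the text has been extracted, the per-word price is simple arithmetic. The sketch below is a minimal illustration, assuming the text is already available as a UTF-8 string and that the site charges a flat rate per word; `count_words`, `quote_price_cents` and the 8-cent rate are hypothetical. Prices are kept in integer cents so no floating-point rounding enters the total.

```php
<?php

/**
 * Count words as runs of non-whitespace characters, roughly the rule `wc -w` applies.
 * The /u modifier treats the input as UTF-8; invalid UTF-8 makes preg_split fail, counted as 0.
 */
function count_words(string $text): int
{
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? 0 : count($tokens);
}

/**
 * Hypothetical flat per-word quote, in integer cents.
 */
function quote_price_cents(int $wordCount, int $centsPerWord): int
{
    if ($wordCount < 0 || $centsPerWord < 0) {
        throw new InvalidArgumentException('Word count and rate must be non-negative.');
    }
    return $wordCount * $centsPerWord;
}

// Usage: text already extracted from an upload, priced at a hypothetical 8 cents per word.
$text  = "The quick brown fox\njumps over the lazy dog.";
$words = count_words($text);
$cents = quote_price_cents($words, 8);
printf("%d words, quote: %d.%02d\n", $words, intdiv($cents, 100), $cents % 100);
```

Here the sample text has 9 words, giving a quote of 72 cents.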

I've seen another method that only looks at the file size, but I don't think it would give the same result across different document formats. Or would it? The simplest approach I can think of is to ask visitors to copy and paste their text into a textarea, but I don't think that is the best way.
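File size is a poor stand-in for a word count because the number of bytes depends on the encoding and the markup, not just on the words. A small illustration with made-up strings, each holding the same two words (real .doc, .docx and PDF files add their own format-specific overhead, and .docx is additionally ZIP-compressed):

```php
<?php

// The same two words, "Hello world", stored three different ways.
$versions = [
    'plain text' => "Hello world",
    'HTML'       => "<html><body><p><b>Hello</b> world</p></body></html>",
    'UTF-16LE'   => "H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0", // ASCII in UTF-16LE: each character followed by a NUL byte
];

foreach ($versions as $label => $bytes) {
    // strlen() counts bytes, which is what a file-size rule would be priced on.
    printf("%-10s %2d bytes\n", $label, strlen($bytes));
}
```

The loop reports 11, 51 and 22 bytes respectively, so a per-byte rule would quote three different prices for the same two words of work.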

Could someone give me some advice on how to solve this?

If you want to do this well, you are going to need routines that open each document according to its file extension. There are APIs available in PHP that should be able to extract the text from the documents in each case, but if you're hoping for a single "get the text from any type of document" function, there is nothing like that. - gview
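The per-extension approach from the comment could be organised as a dispatcher like the one below. This is a sketch under stated assumptions: `extract_text` is a hypothetical name, only .txt and .docx are handled, the .docx branch needs PHP's zip extension, and it reads only the main body in word/document.xml (headers, footers and footnotes are stored in other parts of the archive). .doc, .rtf and .pdf would each need their own branch.

```php
<?php

/**
 * Hypothetical dispatcher: choose a text extractor based on the file extension.
 * Only .txt and .docx are handled; other formats need their own branches.
 */
function extract_text(string $path): string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    switch ($extension) {
        case 'txt':
            $text = file_get_contents($path);
            if ($text === false) {
                throw new RuntimeException("Cannot read $path");
            }
            return $text;

        case 'docx':
            // A .docx file is a ZIP archive; the main body text is in word/document.xml.
            // Needs PHP's zip extension for ZipArchive.
            $zip = new ZipArchive();
            if ($zip->open($path) !== true) {
                throw new RuntimeException("Cannot open $path as a ZIP archive");
            }
            $xml = $zip->getFromName('word/document.xml');
            $zip->close();
            if ($xml === false) {
                throw new RuntimeException("No word/document.xml in $path");
            }
            // Replace paragraph ends, tabs and breaks with spaces so adjacent words
            // don't run together, then strip the remaining tags and decode entities like &amp;.
            $xml = preg_replace('#</w:p>|<w:(?:tab|br|cr)\b[^>]*/>#', ' ', $xml) ?? $xml;
            return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');

        default:
            throw new InvalidArgumentException("No extractor for .$extension files yet");
    }
}

// Usage: extract a plain-text upload and count its whitespace-separated words.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'extract_text_demo.txt';
file_put_contents($path, "Please translate these five words.");
$count = count(preg_split('/\s+/u', extract_text($path), -1, PREG_SPLIT_NO_EMPTY));
unlink($path);
echo $count, "\n";
```

Keeping one branch per format means a new format (say, .rtf) can be added later without touching the pricing code, which only ever sees plain text.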

1 Answer


If you're running your site on a *nix server, you might want to try the following:

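// Redirect the file into wc so only the number is printed; escapeshellarg() quotes the path safely
$word_count = (int) shell_exec('wc -w < ' . escapeshellarg($filename));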

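And yes, I've been led to believe that it works with .doc and .docx documents, but that only holds if their text is extracted first: wc -w counts whitespace-separated runs of characters in the raw bytes, and a .docx file is a ZIP archive while a .doc file is a binary format, so running it directly on them counts binary noise rather than words. PDFs are a whole other story; I'll have to research that one.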
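Building on the wc -w idea, one way to handle binary formats on a *nix server is to convert them to plain text with an external tool first and pipe the result into wc. A minimal sketch, assuming the command-line tools antiword (for .doc) and pdftotext from poppler-utils (for .pdf) are installed; `word_count_command` and `count_words_with_tools` are hypothetical helper names:

```php
<?php

/**
 * Hypothetical helper: build a shell pipeline that converts a document to plain
 * text with an external tool and pipes the result into `wc -w`.
 * Assumes antiword (.doc) and pdftotext from poppler-utils (.pdf) are installed.
 */
function word_count_command(string $path): string
{
    $arg = escapeshellarg($path); // quote the path so spaces or quotes can't break the command
    switch (strtolower(pathinfo($path, PATHINFO_EXTENSION))) {
        case 'doc':
            return "antiword $arg | wc -w";
        case 'pdf':
            // "-" tells pdftotext to write the extracted text to standard output.
            return "pdftotext $arg - | wc -w";
        case 'txt':
            return "wc -w < $arg";
        default:
            throw new InvalidArgumentException("Unsupported file type: $path");
    }
}

function count_words_with_tools(string $path): int
{
    $output = shell_exec(word_count_command($path));
    if (!is_string($output)) { // null or false when the command could not run or printed nothing
        throw new RuntimeException("Word count failed for $path");
    }
    return (int) trim($output);
}

echo word_count_command('/uploads/My Report.pdf'), "\n";
```

Because wc runs at the end of the pipeline, a failed conversion shows up as a count of 0 rather than an error, so a real implementation would want to check for that (or run the converter separately and inspect its exit status).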