
We have a "website" composed entirely of static content in English. The HTML files for the site total around 300 MB (i.e., excluding images; just the HTML text files).

The content is usually used offline, on a simple web server appliance, in remote-area schools that have no Internet access.

If I serve the same content from a web server on the Internet and give the URL to the Google Translate site, the translation is excellent, and we can navigate the website via its links just as in the original.

So we know that Google Translate does a good job of translating the content, and does so entirely automatically via the online Google Translate service.

My question is how best to translate the HTML files in bulk into several other languages, using the Google Translate service or an equivalent.

The translation obviously has to recognise the HTML and translate only the actual English-language content, which the online Google Translate service does perfectly.
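To make the bulk part of the question concrete, here is a minimal Python sketch of the file-handling side. It assumes a hypothetical `translate_html(html, target_lang)` function that wraps whichever service ends up being used and returns the same markup with only the text translated; here it is a stub that returns its input unchanged so the sketch runs offline. The site tree is mirrored into one output folder per language, and non-HTML files are copied as-is so relative links keep working.

```python
import shutil
from pathlib import Path


def translate_html(html: str, target_lang: str) -> str:
    """Hypothetical placeholder for the chosen translation service.

    It should return the same markup with only the visible English text
    translated. This stub returns the input unchanged so the sketch runs offline.
    """
    return html


def translate_site(src_root: Path, out_root: Path, target_lang: str) -> int:
    """Mirror src_root into out_root/<target_lang>, translating .html/.htm files.

    Other files (images, CSS, scripts) are copied unchanged so relative links
    still resolve. Keep out_root outside src_root, otherwise the output would be
    picked up as input. Returns the number of HTML files processed.
    """
    dest_root = out_root / target_lang
    count = 0
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dest = dest_root / src.relative_to(src_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if src.suffix.lower() in {".html", ".htm"}:
            text = src.read_text(encoding="utf-8")
            dest.write_text(translate_html(text, target_lang), encoding="utf-8")
            count += 1
        else:
            shutil.copy2(src, dest)
    return count
```

Running it once per target language (for example `translate_site(Path("site"), Path("translated"), "km")` for Khmer) gives a separate, self-contained copy of the site for each language, which suits the offline appliance.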

This seems like it would be a fairly common requirement, but I can't find a simple answer as to how to go about it.

I would greatly appreciate any suggestions.

Thanks in advance.

Google Translate ignores HTML markup, so you can just send your files to their API. The API only costs $20 per million characters, so you are probably looking at $500-750 for the whole site. You could probably cut that in half by writing a SAX-style HTML parser that sends only the text data to the API. – Nick Bailey
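As a rough illustration of the SAX-style idea in the comment above, Python's standard `html.parser.HTMLParser` is event-driven: it calls a handler for each start tag, end tag and run of text as it streams through the file. The minimal sketch below only collects the visible text nodes (skipping `<script>` and `<style>` contents), which shows how much of a page would actually need translating. It is an assumption-laden sketch, not a full pipeline: it ignores translatable attributes such as `alt` and `title`, and it does not put translated text back into the markup.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Event-driven (SAX-style) parser that keeps only visible text nodes."""

    SKIP = {"script", "style"}  # element contents that are code, not prose

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Whitespace-only runs between tags carry nothing to translate.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def translatable_text(html: str) -> list[str]:
    """Return the text chunks of one page, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Summing `len(chunk)` over every page gives an estimate of the billable characters with the markup excluded, which is what the "cut that in half" suggestion relies on.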

1 Answer


Thanks for the suggestions.

Submitting the files one by one to the Google API would work nicely if it returns them with the markup unaltered and only the content translated.

But at $20 per million characters, 300 MB of data comes to around $6,000 per target language, I think (assuming roughly one byte per character). We need at least three languages, so close to $20k in total. That is perhaps a little over the top for a community-based, entirely volunteer-staffed project, even with some savings from clever coding.
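The arithmetic above can be checked with a few lines of Python. This is only a sketch under stated assumptions: 1 MB is taken as 10^6 bytes, the mostly-ASCII English HTML is assumed to be about one byte per character, and the $20 per million characters rate is the figure quoted in the comment, not a verified current price. The "half" case is the commenter's rough guess for sending text only, not a measured figure.

```python
def translation_cost(num_chars: int, usd_per_million_chars: float, num_languages: int) -> float:
    """Estimated USD cost of machine-translating num_chars into num_languages."""
    return num_chars / 1_000_000 * usd_per_million_chars * num_languages


# Assumption: ~1 byte per character, so 300 MB of HTML is ~300 million characters.
full_site_chars = 300 * 1_000_000

per_language = translation_cost(full_site_chars, 20.0, 1)      # 6000.0
three_languages = translation_cost(full_site_chars, 20.0, 3)   # 18000.0

# Commenter's rough guess: sending only the text (no markup) halves the volume.
text_only_three = translation_cost(full_site_chars // 2, 20.0, 3)  # 9000.0
```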

Localize.js offers "unlimited free machine translations" for translating content in-house, so that looks worth following up.

One of the languages we need is Khmer, so it will be interesting to see whether it is included in their list of 100 languages (which I haven't found on their site yet).