I am using the Google Translate API to translate an Excel column from Japanese to English. The column contains not only Japanese characters but also some numeric symbols such as ① and ⑥.

The Japanese characters translate without problems, but the symbols get converted into gibberish.
Example:
Japanese: #⑥その他
English: # â‘¥ Other

The same text works fine in the Google Translate web interface.

How to prevent translating symbols in Google Translate API?

More likely than not, symbols occupy a specific range of Unicode numbers which you might remove from the original text before giving it for translation. - Variatus
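
The comment's suggestion could be sketched in Python roughly as follows. This is only an illustration under assumptions: it assumes the symbols in question are circled numbers such as ① and ⑥, which sit in the Unicode "Enclosed Alphanumerics" block (U+2460 to U+24FF), and the helper names are hypothetical. Note that stripping the symbols also drops them from the translated output, so this may not be what you want if the numbering matters.

```python
import re

# Assumption: the problematic symbols are circled numbers/letters, which live
# in the Unicode "Enclosed Alphanumerics" block (U+2460 to U+24FF).
ENCLOSED_ALNUM = re.compile(r"[\u2460-\u24FF]")


def find_enclosed_symbols(text):
    """Return the enclosed symbols found in text, in order of appearance."""
    return ENCLOSED_ALNUM.findall(text)


def strip_enclosed_symbols(text):
    """Return text with every Enclosed Alphanumerics character removed."""
    return ENCLOSED_ALNUM.sub("", text)


if __name__ == "__main__":
    sample = "#⑥その他"
    print(find_enclosed_symbols(sample))
    print(strip_enclosed_symbols(sample))
```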

1 Answer

The issue comes from mixing numeric symbols with Japanese text: when no source language is given, it is harder for the Translation API to detect which language the input is in.

I don't know which method you are using to call the Translation API, but in any case, specifying the source language solves the issue.

For example, with a REST call from the Command Line Interface:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data '{
  "q": "#⑥その他",
  "source": "ja",
  "target": "en"
}' "https://translation.googleapis.com/language/translate/v2"

This returns "# ⑥ Other" as the result of the translation.
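
If you are calling the API from code rather than from the command line, the same request could look roughly like this in Python, using only the standard library. This is a sketch, not a definitive client: the function names are made up for illustration, and it assumes you already have an OAuth access token (for example the one printed by the gcloud command above). The body is encoded explicitly as UTF-8 to match the charset declared in the header, and the source language is set to "ja" as in the curl call.

```python
import json
import urllib.request

ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(text, source, target, access_token):
    """Build a POST request mirroring the curl call (hypothetical helper)."""
    # ensure_ascii=False keeps characters such as ⑥ as-is in the JSON text;
    # the body is then encoded as UTF-8 to match the declared charset.
    body = json.dumps(
        {"q": text, "source": source, "target": target},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def translate(text, source, target, access_token):
    """Send the request and return the first translated string."""
    request = build_translate_request(text, source, target, access_token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The v2 response nests results under data.translations.
    return payload["data"]["translations"][0]["translatedText"]
```

Calling translate("#⑥その他", "ja", "en", token) with a valid token sends the same request as the curl command.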