I'm stumped by an HTTP 400 error when using the Google Docs API to write certain Unicode characters to a Google Doc:
HttpError: https://docs.googleapis.com/v1/documents/xxxxxxxxx:batchUpdate?alt=json returned "Invalid requests[0].insertText: The insertion index cannot be within a grapheme cluster.">
I've managed to boil this down to its smallest example. The routine below attempts to insert (in order):
- The string of Thai characters 'ถ้ามีลูกแล้วไม่ส่งเรียนพิเศษอะไรเลย จะเป็นอะไรไหมครับ'
- A newline, above/before that string
Code:
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

SCOPES = [
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/drive.appdata",
    "https://www.googleapis.com/auth/drive.metadata",
]

credentials = Credentials.from_service_account_file("data/gdocscredentials.json", scopes=SCOPES)
svc = build("docs", "v1", credentials=credentials).documents()

requests = [
    {
        "insertText": {
            "location": {
                # The zero-based index, in UTF-16 code units
                "index": 1
            },
            "text": "ถ้ามีลูกแล้วไม่ส่งเรียนพิเศษอะไรเลย จะเป็นอะไรไหมครับ"
        }
    },
    {"insertText": {"location": {"index": 1}, "text": "\n"}}
]

svc.batchUpdate(
    documentId="xxxxxxxxx",
    body={"requests": requests}
).execute()
So the error suggests that an insertion index falls inside a grapheme cluster, presumably one within the Thai string. The docs say this about grapheme clusters:
The API may implicitly adjust the location to prevent insertions within Unicode grapheme clusters. When this happens, the text is inserted immediately after the grapheme cluster.
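To see where clusters might sit in this string, here is a rough sketch that lists the UTF-16 offsets falling directly before a combining mark. It only approximates Unicode grapheme segmentation using general categories from the standard `unicodedata` module; it is an assumption on my part that this resembles what the API checks, and `offsets_inside_clusters` is a hypothetical helper name, not part of any Google library.

```python
import unicodedata
from typing import List


def offsets_inside_clusters(text: str) -> List[int]:
    """Return UTF-16 offsets (relative to the start of text) that sit
    directly before a combining mark, i.e. between a base character and
    a mark attached to it. This is an approximation, not full UAX #29."""
    offsets = []
    pos = 0  # running offset in UTF-16 code units
    for ch in text:
        if pos > 0 and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            offsets.append(pos)
        # Characters outside the BMP take two UTF-16 code units.
        pos += 2 if ord(ch) > 0xFFFF else 1
    return offsets


thai = "ถ้ามีลูกแล้วไม่ส่งเรียนพิเศษอะไรเลย จะเป็นอะไรไหมครับ"
print(offsets_inside_clusters(thai))
```

These offsets are relative to the string itself. Since the text is inserted at document index 1, the corresponding document index would be the offset plus 1.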
How do I properly correct this error?
Google mentions that indexes are measured in UTF-16 code units, but that doesn't seem like it should matter here, since this snippet uses the 'work backwards' approach (inserting the later content first, so earlier indexes stay valid) that the same documentation page recommends.
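As a sanity check on the UTF-16 point, here is a minimal sketch that compares the string's length in UTF-16 code units with its length in code points. `utf16_len` is a hypothetical helper of my own; it relies only on Python's built-in `str.encode`.

```python
def utf16_len(text: str) -> int:
    """Number of UTF-16 code units needed to encode text."""
    # utf-16-le emits no byte-order mark, and each code unit is 2 bytes.
    return len(text.encode("utf-16-le")) // 2


thai = "ถ้ามีลูกแล้วไม่ส่งเรียนพิเศษอะไรเลย จะเป็นอะไรไหมครับ"

# Thai letters and marks all live in the Basic Multilingual Plane,
# so the two lengths should match for this string.
print(utf16_len(thai), len(thai))
```

If the two numbers match, the UTF-16 unit of measurement by itself can't be shifting the indexes for this particular string.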