0
votes

Background: I am running some scripts that have to cleanup few thousand records. While running it from spanner console, I am usually getting errors like - 'Transaction was aborted. It was wounded by a higher priority transaction...' OR simply 'Transaction is aborted'

I can understand it may be due to some conflict with real time requests (discussion for another day). So I am trying GCP suggested partitioned-dml for such operation.

Problem: I have noticed that partitioned-dml is very slow and it times out very soon when I am running it from Cloudshell.

What I tried: I tried even with few records but it takes a lot of time and with records in thousands it times out.

Sample script- gcloud spanner databases execute-sql --instance=<> --enable-partitioned-dml --sql='Delete from ABC where X="M" and Y="N"'

Error message returned on cloud shell-

"ERROR: (gcloud.spanner.databases.execute-sql) ('The read operation timed out',) This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach."

2

2 Answers

2
votes

This could be related to the query itself. Make sure you are including the primary index in the WHERE clause of it, you can see an example here. Also, it is not recommended to use Cloud Shell for such intensive operations. It would be better to use a Compute Engine VM to connect to it and run the operation from it.

2
votes

For Partitioned DML it is okay not to specify primary keys, since Spanner will automatically partition the transaction for you. However, it does have to scan the whole table, so can easily time out when run from the shell.

I wouldn't necessarily say you need to run it from a Compute Engine VM, since it doesn't send data back to the client; but you probably do want to run the request from code instead of from the shell, so you can control the timeout. Sample code is at https://cloud.google.com/spanner/docs/dml-partitioned