0
votes

I am currently testing Google Cloud Spanner as an alternative to MySQL for one project since I expect the amount of rows to grow to hundreds of millions over time. The DB needs to respond very fast and return query results within seconds so I thought I would give Spanner a try.

I tried to bulk load sample data to my Spanner DB, however I keep getting this error:

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> Google.Cloud.Spanner.Data.SpannerException: The operation was aborted. ---> Grpc.Core.RpcException: Status(StatusCode=Aborted, Detail="Aborted due to transient fault")

I am using a slighly modified code from here: https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/master/spanner/api/Program.cs

Here is the code modification in InsertSampleData:

public static object InsertSampleData(string projectId,
        string instanceId, string databaseId)
    {
        // I get about 100k rows here
        List<Data> data = get_data();

        // how many runs I need to make if I split the data by 100 rows
        int rows = 100;
        double cnt = (double)data.Count / rows;
        cnt = Math.Ceiling(cnt);

        // process the data part by part
        for (int i = 0; i < cnt; i++)
        {
            // returns part of the data based on offset and amount
            List<Data> data_part = get_part(data, i, rows);

            var response = InsertTradesAsync(
            projectId, instanceId, databaseId, data_part);
            s_logger.Info("Waiting for operation to complete...");
            response.Wait();
            s_logger.Info($"Operation status: {response.Status}");
        }

        return ExitCode.Success;
    }

InsertTradesAsync is the same as in the repo (except for the parameter list of course).

When I run the code I always get the following error:

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> Google.Cloud.Spanner.Data.SpannerException: The operation was aborted. ---> Grpc.Core.RpcException: Status(StatusCode=Aborted, Detail="Aborted due to transient fault")
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
   at Google.Api.Gax.Grpc.ApiCallRetryExtensions.<>c__DisplayClass0_0`2.<<WithRetry>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.V1.Internal.ExecuteHelper.<WithSessionChecking>d__0`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.V1.TransactionPool.<RunFinalMethodAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.Data.SpannerTransaction.<<CommitAsync>b__29_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.Data.ExecuteHelper.<WithErrorTranslationAndProfiling>d__2`1.MoveNext()
   --- End of inner exception stack trace ---
   at Google.Cloud.Spanner.Data.ExecuteHelper.<WithErrorTranslationAndProfiling>d__2`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.Data.EphemeralTransaction.<>c__DisplayClass2_0.<<ExecuteMutationsAsync>b__1>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.Data.ExecuteHelper.<WithErrorTranslationAndProfiling>d__2`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Google.Cloud.Spanner.Data.SpannerCommand.<ExecuteMutationsAsync>d__49.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at GoogleCloudSamples.Spanner.Program.<InsertTradesAsync>d__25.MoveNext() in c:\Users\user\Documents\Dev\spanner\api\Program.cs:line 1298
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at GoogleCloudSamples.Spanner.Program.InsertSampleData(String projectId, String instanceId, String databaseId) in c:\Users\user\Documents\Dev\spanner\api\Program.cs:line 1585
   at GoogleCloudSamples.Spanner.Program.<>c__DisplayClass45_0.<Main>b__2(InsertSampleDataOptions opts) in c:\Users\user\Documents\Dev\spanner\api\Program.cs:line 1932
   at CommandLine.ParserResultExtensions.MapResult[T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 parsedFunc3, Func`2 parsedFunc4, Func`2 parsedFunc5, Func`2 parsedFunc6, Func`2 parsedFunc7, Func`2 parsedFunc8, Func`2 parsedFunc9, Func`2 parsedFunc10, Func`2 parsedFunc11, Func`2 parsedFunc12, Func`2 parsedFunc13, Func`2 parsedFunc14, Func`2 parsedFunc15, Func`2 parsedFunc16, Func`2 notParsedFunc)
   at GoogleCloudSamples.Spanner.Program.Main(String[] args) in c:\Users\user\Documents\Dev\spanner\api\Program.cs:line 1910

I thought this might be related to limits and quotas (https://cloud.google.com/spanner/quotas), however the exception is thrown after different amount of rows were inserted into my table (it seems to be random, sometimes it happens after 68 runs with 100 rows each, then 28x100, 52x100 etc.). The table has 30 columns and PK composed from 2 columns (and no indexes) and I am processing the data in parts by 100 so I don't think I am hitting the limit.

If I set cmd.CommandTimeout to a very high number I get to much higher inserted rows (about 400x100) - I assume the client library reuses the connections? However I couldn't find any information regarding that or poooling for the C# library. And even though more rows get inserted, the error still occurs.

Any help would be very much appreciated.

Thanks!

1

1 Answers

0
votes

Google developer here. I have a few suggestions that might help you.

The first is that you should use a single transaction and add as many writes as possible to it. It looks like you do 100 at a time? But you can do more. You can see limits of how many writes can be batched here (I believe it's 20,000):

https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/master/spanner/api/Program.cs#L1242

The second suggestion addresses your question directly. Because of how spanner works, you need to use retries around your transaction as shown in the example here:

https://github.com/GoogleCloudPlatform/dotnet-docs-samples/blob/master/spanner/api/Program.cs#L1259

(Download the transient fault application nuget here): https://www.nuget.org/packages/EnterpriseLibrary.TransientFaultHandling/

You need this because spanner may occasionally experience a deadlock which forces you to rerun your transaction in its entirety. We provide an extension method "IsTransientSpannerFault" on Exception to make it easier to construct your retry policy as in:

    internal class CustomTransientErrorDetectionStrategy
        : ITransientErrorDetectionStrategy
    {
        public bool IsTransient(Exception ex) =>
            ex.IsTransientSpannerFault();
    }

Hope this helps!

EDIT: I just noticed that you are not waiting for the result of the async call to InsertTradesAsync. You may want to at least do a Task.WaitAll at the very end. Note that each batch of 100 writes is being run in the background possibly in parallel with previous batches. This is most likely what's increasing the odds of having a deadlock that forces a retry.

If you are doing this on purpose to increase performance, the Spanner ADO.NET library will open multiple connections in a pool behind the scenes equal to SpannerConnection.SpannerOptions.MaximumGrpcChannels. So you will see gains up to a point. You can increase this value to tune your performance.

regards