1
votes

Using Spring Boot with Spanner in the Google Cloud Env. we are now struggling with performance issues. To demonstrate that I set up a small demo case baselining our different approaches how to retrieve data from spanner.

The first approach

uses "native" drivers from Google to instantiate a dbClient and retrieves data like so.

@Repository
public class SpannerNativeDAO implements CustomerDAO {

  private final DatabaseClient dbClient;
  private final String SQL = "select * from customer where customer_id = ";

  public SpannerNativeDAO(
      @Value("${spring.cloud.gcp.spanner.instanceId}") String instanceId,
      @Value("${spring.cloud.gcp.spanner.database}") String dbId,
      @Value("${spring.cloud.gcp.spanner.project-id}") String projectId,
      @Value("${google.application.credentials}") String pathToCredentials)
      throws IOException {
    try (FileInputStream google_application_credentials = new FileInputStream(pathToCredentials)) {
      final SpannerOptions spannerOptions =
          SpannerOptions.newBuilder().setProjectId(projectId)
              .setCredentials(ServiceAccountCredentials.fromStream(google_application_credentials)).build();
      final Spanner spanner = spannerOptions.getService();
      final DatabaseId databaseId1 = DatabaseId.of(projectId, instanceId, dbId);
      dbClient = spanner.getDatabaseClient(databaseId1);
      // give it a first shot to speed up consecutive calls
      dbClient.singleUse().executeQuery(Statement.of("select 1 from customer"));
    }
  }

  private Customer readCustomerFromSpanner(Long customerId) {
    try {
      Statement statement = Statement.of(SQL + customerId);
      ResultSet resultSet = dbClient.singleUse().executeQuery(statement);
      while (resultSet.next()) {
        return Customer.builder()
            .customerId(resultSet.getLong("customer_id"))
            .customerStatus(CustomerStatus.valueOf(resultSet.getString("status")))
            .updateTimestamp(Timestamp.from(Instant.now())).build();
      }
    } catch (Exception ex) {
      //log
    }
    return null;
  }


....

}

The second approach

uses the Spring Boot Data Starter (https://github.com/spring-cloud/spring-cloud-gcp/tree/master/spring-cloud-gcp-starters/spring-cloud-gcp-starter-data-spanner)

and simply goes like this

@Repository
public interface SpannerCustomerRepository extends SpannerRepository<Customer, Long> {

  @Query("SELECT customer.customer_id, customer.status, customer.status_info, customer.update_timestamp "
      + "FROM customer customer WHERE customer.customer_id = @arg1")
  List<Customer> findByCustomerId(@Param("arg1") Long customerId);
}

Now if i take the first approach, establishing a initial gRPC connection to Spanner takes > 5 seconds and all consecutive calls are around 1 sec. The second approach takes only approx. 400ms for each call after the initial call. To test differences I wired up both solutions in one Spring Boot Project and compared it to a in memory solution (~100ms). All given timings refer to local tests on dev machines but go back to investigating performance problems within the cloud environment.

I testet several different SpannerOptions (SessionOptions) with no results and ran a profiler on the project. I seems like 96% of response time comes from establishing a gRPC channel to spanner, whereas the database itself processes and responds within 5ms.

We really don't understand the behaviour. We only work with very little test-data and a couple of small tables.

  • The DatabaseClient is supposed to manage the ConnectionPool and is itself wired into a Singleton-Scoped Repository-Bean. So Sessions should be reused, rigt?
  • Why does the first approach take much longer than the second one. The Spring FW itself simply uses the DatabaseClient as member within the SpannerOperations / SpannerTemplate.
  • How can we generally reduce latency. More than 200ms for plain response on each db call seems four times more than we would have expected. (I am aware that local timing benchmarks need to be treated with care)
3
How is the gRPC connection latency measured?rkansola
I am running a bash script > throwing curl statements in a loop and measure the average response timings, from receiving the call until sending response.HannesB

3 Answers

3
votes

Tracing give us good visibility into the client, hopefully it can help you with diagnosing the latencies.

Running TracingSample, I get stackdriver trace from stackdriver. There are different backends you can use, or print it out as logs.

The sample above also exports http://localhost:8080/rpcz and http://localhost:8080/tracez you can poke around to check latencies and traces.

A tutorial on setting it up: Cloud Spanner, instrumented by OpenCensus and exported to Stackdriver

1
votes

The problem here is not related to Spring or DAO's, but that you are not closing the ResultSet that is returned by the query. This causes the Spanner library to think that the session that is used to execute your query is still in use, and causes the library to create a new session every time you execute a query. This session creation, handling and pooling is all taken care of for you by the client library, but it does require you to close resources when they are no longer being used.

I tested this with very simple example, and I can reproduce the exact same behavior as what you are seeing by not closing the ResultSet.

Consider the following example:

/**
 * This method will execute the query quickly, as the ResultSet
 * is closed automatically by the try-with-resources block.
 */
private Long executeQueryFast() {
  Statement statement = Statement.of("SELECT * FROM T WHERE ID=1");
  try (ResultSet resultSet = dbClient.singleUse().executeQuery(statement)) {
    while (resultSet.next()) {
      return resultSet.getLong("ID");
    }
  } catch (Exception ex) {
    // log
  }
  return null;
}

/**
 * This method will execute the query slowly, as the ResultSet is
 * not closed and the Spanner library thinks that the session is
 * still in use. Executing this method repeatedly will cause
 * the library to create a new session for each method call.
 * Closing the ResultSet will cause the session that was used
 * to be returned to the session pool, and the sessions will be
 * re-used.
 */
private Long executeQuerySlow() {
  Statement statement = Statement.of("SELECT * FROM T WHERE ID=1");
  try {
    ResultSet resultSet = dbClient.singleUse().executeQuery(statement);
    while (resultSet.next()) {
      return resultSet.getLong("ID");
    }
  } catch (Exception ex) {
    // log
  }
  return null;
}

You should always place ResultSets (and all other AutoCloseables) in a try-with-resources block whenever possible.

Note that if you consume a ResultSet that is returned by Spanner completely, i.e. you call ResultSet#next() until it returns false, the ResultSet is also implicitly closed and the session is returned to the pool. I would however recommend not to rely solely on that, but to always wrap a ResultSet in a try-with-resources.

0
votes

Can you confirm that the performance doesn't change if the SQL strings are made the same between the two methods? (* vs spelling them out individually).

Also, since you're expecting a single customer in the first method, I'm inferring that the customer ID is a key column? If so, you can use the read-by-key methods from SpannerRepository, and that might be faster than a SQL query.