Using Spring Boot with Spanner in the Google Cloud Env. we are now struggling with performance issues. To demonstrate that I set up a small demo case baselining our different approaches how to retrieve data from spanner.
The first approach
uses "native" drivers from Google to instantiate a dbClient and retrieves data like so.
@Repository
public class SpannerNativeDAO implements CustomerDAO {
private final DatabaseClient dbClient;
private final String SQL = "select * from customer where customer_id = ";
public SpannerNativeDAO(
@Value("${spring.cloud.gcp.spanner.instanceId}") String instanceId,
@Value("${spring.cloud.gcp.spanner.database}") String dbId,
@Value("${spring.cloud.gcp.spanner.project-id}") String projectId,
@Value("${google.application.credentials}") String pathToCredentials)
throws IOException {
try (FileInputStream google_application_credentials = new FileInputStream(pathToCredentials)) {
final SpannerOptions spannerOptions =
SpannerOptions.newBuilder().setProjectId(projectId)
.setCredentials(ServiceAccountCredentials.fromStream(google_application_credentials)).build();
final Spanner spanner = spannerOptions.getService();
final DatabaseId databaseId1 = DatabaseId.of(projectId, instanceId, dbId);
dbClient = spanner.getDatabaseClient(databaseId1);
// give it a first shot to speed up consecutive calls
dbClient.singleUse().executeQuery(Statement.of("select 1 from customer"));
}
}
private Customer readCustomerFromSpanner(Long customerId) {
try {
Statement statement = Statement.of(SQL + customerId);
ResultSet resultSet = dbClient.singleUse().executeQuery(statement);
while (resultSet.next()) {
return Customer.builder()
.customerId(resultSet.getLong("customer_id"))
.customerStatus(CustomerStatus.valueOf(resultSet.getString("status")))
.updateTimestamp(Timestamp.from(Instant.now())).build();
}
} catch (Exception ex) {
//log
}
return null;
}
....
}
The second approach
uses the Spring Boot Data Starter (https://github.com/spring-cloud/spring-cloud-gcp/tree/master/spring-cloud-gcp-starters/spring-cloud-gcp-starter-data-spanner)
and simply goes like this
@Repository
public interface SpannerCustomerRepository extends SpannerRepository<Customer, Long> {
@Query("SELECT customer.customer_id, customer.status, customer.status_info, customer.update_timestamp "
+ "FROM customer customer WHERE customer.customer_id = @arg1")
List<Customer> findByCustomerId(@Param("arg1") Long customerId);
}
Now if i take the first approach, establishing a initial gRPC connection to Spanner takes > 5 seconds and all consecutive calls are around 1 sec. The second approach takes only approx. 400ms for each call after the initial call. To test differences I wired up both solutions in one Spring Boot Project and compared it to a in memory solution (~100ms). All given timings refer to local tests on dev machines but go back to investigating performance problems within the cloud environment.
I testet several different SpannerOptions (SessionOptions) with no results and ran a profiler on the project. I seems like 96% of response time comes from establishing a gRPC channel to spanner, whereas the database itself processes and responds within 5ms.
We really don't understand the behaviour. We only work with very little test-data and a couple of small tables.
- The DatabaseClient is supposed to manage the ConnectionPool and is itself wired into a Singleton-Scoped Repository-Bean. So Sessions should be reused, rigt?
- Why does the first approach take much longer than the second one. The Spring FW itself simply uses the DatabaseClient as member within the SpannerOperations / SpannerTemplate.
- How can we generally reduce latency. More than 200ms for plain response on each db call seems four times more than we would have expected. (I am aware that local timing benchmarks need to be treated with care)