TL;DR
I have a Spring Boot application that makes use of parquet files stored on the file system. To access them we are using Apache Drill.
Since I have multiple users that might access them, I've set up a connection pool in Spring.
When I'm using the connection pool, Drill somehow executes a "limit 0" query before executing my actual query, and this affect performances. The same "limit 0" query is NOT executed when I run my queries through a simple Statement obtained from direct Connection.
This seems to be related to the fact that Spring JdbcTemplate makes use of PreparedStatements instead of simple Statements.
Is there a way to get rid of those "limit 0" queries?
-- Details --
The connection pool in the Spring configuration class looks like this:
@Bean
@ConfigurationProperties(prefix = "datasource.parquet")
@Qualifier("parquetDataSource")
public DataSource parquetDataSource() {
return DataSourceBuilder.create().build();
}
And the corresponding properties in the development profile YML file are:
datasource:
parquet:
url: jdbc:drill:drillbit=localhost:31010
jdbcUrl: jdbc:drill:drillbit=localhost:31010
jndiName: jdbc/app_parquet
driverClassName: org.apache.drill.jdbc.Driver
maximumPoolSize: 5
initialSize: 1
maxIdle: 10
maxActive: 20
validation-query: SELECT 1 FROM sys.version
test-on-borrow: true
When I execute a query using the JdbcTemplate created with the mentioned Drill DataSource, 3 different queries might be executed:
- the validation query
SELECT 1 FROM sys.version; - a "limit 0" query that looks like
SELECT * FROM (<my actual query>) LIMIT 0; - my actual query.
Here's the execution code (parquetJdbcTemplate is an instance of a class that extends org.springframework.jdbc.core.JdbcTemplate):
parquetJdbcTemplate.query(sqlQuery, namedParameters,
resultSet -> {
MyResultSet result = new MyResultSet();
while (resultSet.next()) {
// populate the "result" object
}
return result;
});
Here's a screenshot from the Profile page of my Drill monitor:

The bottom query is the "limit 0" one, then in the middle you have the validation query and on top (even if the query is not shown) the actual query that returns the data I want.
As you can see, the "limit 0" query takes more than 1/3 of the entire execution time to run. The validation query is fine, since the execution time is negligible and it's needed to check the connection.
The fact is, when I execute the same query using a Connection through the Drill driver (thus, with no pool), I only see my actual query in the UI monitor:
public void executeQuery(String myQuery) {
Class.forName("org.apache.drill.jdbc.Driver");
Driver.load();
Connection connection = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
Statement st = connection.createStatement();
ResultSet resultSet = st.executeQuery(myQuery);
while (resultSet.next()) {
// do stuff
}
}
As you can see, the total execution time improves by a lot (~14 seconds instead of ~26), just because the "limit 0" query is not executed.
As far as I know, those "limit 0" queries are executed to validate and get information about the underlying schema of the parquet files. Is there a way to disable them while using the connection pool? I ideally would like to still use PreparedStatements over simple Statements, but I could switch to simple Statements if needed, because I have full control over those queries (so, no SQL injection should be possible unless someone hacks the deployed artifacts).