I experienced an outage with my application the other day and I need to understand how to avoid this in the future.
We have a Java-based web application running on Tomcat 7. The application connects to several different data sources, including an Oracle database.
Here are the details: the Oracle database server went down and had to be rebooted. My basic understanding is that this would have severed the application's connections to the database, and indeed users reported errors in the application.
The Oracle data source is set up in Tomcat's server.xml as a global JNDI resource (under GlobalNamingResources):
```xml
<Resource name="datasource"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          ....
          initialSize="4"
          minIdle="2"
          maxIdle="8"
          maxActive="8"
          maxAge="28800000"
          maxWait="30000"
          testOnBorrow="false"
          testOnReturn="false"
          testWhileIdle="false"
          validationQuery="SELECT 1 FROM dual"
          validationQueryTimeout="10"
          validationInterval="600000"
          timeBetweenEvictionRunsMillis="60000"
          minEvictableIdleTimeMillis="900000"
          removeAbandoned="true"
          removeAbandonedTimeout="60"
          logAbandoned="true"
          jmxEnabled="true" />
```
So here is what I understand regarding connection validation.
- Connections are not validated while idle (testWhileIdle = false), when borrowed (testOnBorrow = false), or when returned (testOnReturn = false).
- The PoolSweeper is enabled because timeBetweenEvictionRunsMillis > 0, removeAbandoned is true, and removeAbandonedTimeout > 0.
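To make the two points above concrete, here is a small model of which events would run the validation query under this configuration. It is a hypothetical sketch, not tomcat-jdbc's actual code: the class, enum, and method names are made up for illustration, and it assumes (as I read the tomcat-jdbc documentation) that the sweeper only validates idle connections when testWhileIdle is also on. Its abandoned-connection and idle-eviction work does not run the query. The sweeper condition is the one stated above; other settings (for example minEvictableIdleTimeMillis > 0) can also enable the sweeper.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of when validationQuery runs; not tomcat-jdbc's actual code. */
final class ValidationTriggers {

    /** Points at which the pool could run validationQuery against a connection. */
    enum Event { BORROW, RETURN, IDLE_SWEEP }

    /** The sweeper condition as stated above (other settings can also enable it). */
    static boolean sweeperEnabled(long timeBetweenEvictionRunsMillis,
                                  boolean removeAbandoned, int removeAbandonedTimeout) {
        return timeBetweenEvictionRunsMillis > 0 && removeAbandoned && removeAbandonedTimeout > 0;
    }

    /** Events at which validationQuery would be executed, assuming a query is configured. */
    static Set<Event> validatingEvents(boolean testOnBorrow, boolean testOnReturn,
                                       boolean testWhileIdle, long timeBetweenEvictionRunsMillis) {
        Set<Event> events = EnumSet.noneOf(Event.class);
        if (testOnBorrow) events.add(Event.BORROW);
        if (testOnReturn) events.add(Event.RETURN);
        // A running sweeper alone does not validate; testWhileIdle must also be on.
        if (testWhileIdle && timeBetweenEvictionRunsMillis > 0) events.add(Event.IDLE_SWEEP);
        return events;
    }

    public static void main(String[] args) {
        // Values taken from the <Resource> element in server.xml.
        System.out.println("sweeper enabled: " + sweeperEnabled(60_000, true, 60));
        System.out.println("validating events: " + validatingEvents(false, false, false, 60_000));
    }
}
```

With the values from server.xml, the sweeper is running but the set of validating events is empty, which is the situation the question describes.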
What confuses me is that a validationQuery is configured and validationInterval is greater than 0. Since all of the tests are disabled, does the pool sweeper use the validation query to check the connections anyway, or is the validation query irrelevant?
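As documented for tomcat-jdbc, validationInterval does not trigger validation by itself. It only limits how often a validation that one of the test* flags has already requested actually runs the query: a connection validated less than validationInterval milliseconds ago is not validated again. The following is a hypothetical sketch of that throttle; the class and method names are illustrative only and are not part of tomcat-jdbc.

```java
/** Hypothetical sketch of how validationInterval throttles validation (not tomcat-jdbc's code). */
final class ValidationIntervalSketch {

    /**
     * Returns true if the validationQuery should actually run now.
     *
     * @param requested           whether a test* flag asked for validation at this point
     * @param lastValidatedMillis when this connection last passed validation
     * @param nowMillis           current time
     * @param validationInterval  validationInterval from the pool config, in ms
     */
    static boolean shouldRunQuery(boolean requested, long lastValidatedMillis,
                                  long nowMillis, long validationInterval) {
        if (!requested) {
            return false; // no test* flag enabled: validationInterval has nothing to throttle
        }
        // Skip the query if the connection was validated within the interval.
        return nowMillis - lastValidatedMillis >= validationInterval;
    }

    public static void main(String[] args) {
        long interval = 600_000; // validationInterval="600000" (10 minutes)
        // All test* flags false: the query never runs, however much time has passed.
        System.out.println(shouldRunQuery(false, 0, 3_600_000, interval));
        // testOnBorrow="true": a connection validated 5 minutes ago is not re-checked...
        System.out.println(shouldRunQuery(true, 0, 300_000, interval));
        // ...but one validated 11 minutes ago is.
        System.out.println(shouldRunQuery(true, 0, 660_000, interval));
    }
}
```

One consequence worth noting: even with testOnBorrow enabled, a 10-minute validationInterval means a connection that passed validation shortly before the database restart could still be handed out once without being re-checked.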
So when the database server went down, I believe the connection pool did not try to re-establish connections, because no validation tests are enabled. As I see it, had testOnBorrow been enabled, valid connections would have been established once the database server came back up, and the web application (meaning Tomcat) would not have needed a restart.
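The reasoning above can be checked with a toy in-memory simulation. Everything here is hypothetical: FakeDatabase, FakeConnection and SimPool are illustrative stand-ins, not tomcat-jdbc classes, and the simulation ignores validationInterval, maxAge and removeAbandoned. A "reboot" invalidates every connection opened before it. Without testOnBorrow the pool hands out a dead idle connection. With testOnBorrow it discards dead ones and opens a fresh connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical in-memory simulation; all class names are illustrative only. */
final class TestOnBorrowSimulation {

    /** Stands in for the Oracle server; each reboot bumps the epoch and kills old sessions. */
    static final class FakeDatabase {
        int epoch = 0;
        void reboot() { epoch++; }
    }

    /** A "connection" is alive only if it was opened in the database's current epoch. */
    static final class FakeConnection {
        final int openedInEpoch;
        FakeConnection(int epoch) { this.openedInEpoch = epoch; }
        boolean isAlive(FakeDatabase db) { return openedInEpoch == db.epoch; }
    }

    static final class SimPool {
        private final Deque<FakeConnection> idle = new ArrayDeque<>();
        private final FakeDatabase db;
        private final boolean testOnBorrow;

        SimPool(FakeDatabase db, int initialSize, boolean testOnBorrow) {
            this.db = db;
            this.testOnBorrow = testOnBorrow;
            for (int i = 0; i < initialSize; i++) idle.push(new FakeConnection(db.epoch));
        }

        FakeConnection borrow() {
            while (!idle.isEmpty()) {
                FakeConnection c = idle.pop();
                // Without testOnBorrow the pool hands out whatever is idle, dead or not.
                if (!testOnBorrow || c.isAlive(db)) return c;
                // With testOnBorrow a failed check drops the dead connection and keeps looking.
            }
            return new FakeConnection(db.epoch); // nothing usable left: open a fresh connection
        }

        void giveBack(FakeConnection c) { idle.push(c); }
    }

    /** Reboots the database, then reports whether the next borrowed connection works. */
    static boolean borrowWorksAfterReboot(boolean testOnBorrow) {
        FakeDatabase db = new FakeDatabase();
        SimPool pool = new SimPool(db, 4, testOnBorrow); // initialSize="4"
        db.reboot();
        FakeConnection c = pool.borrow();
        boolean ok = c.isAlive(db);
        pool.giveBack(c);
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("testOnBorrow=false: " + borrowWorksAfterReboot(false));
        System.out.println("testOnBorrow=true:  " + borrowWorksAfterReboot(true));
    }
}
```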
Do I have a correct understanding of how connection validation works?