27
votes

I'm trying to setup a multi-tenant web application, with (ideally) possibility for both Database-separated and Schema-separated approach at the same time. Although I'm going to start with Schema separation. We're currently using:

  • Spring 4.0.0
  • Hibernate 4.2.8
  • Hibernate-c3p0 4.2.8 (which uses c3p0-0.9.2.1)
  • and PostgreSQL 9.3 (which I doubt it really matters for the overall architecture)

Mostly I followed this thread (because of the solution for @Transactional). But I'm kinda lost in implementing MultiTenantContextConnectionProvider. There is also this similar question asked here on SO, but there are some aspects that I can't figure out:

1) What happens to Connection Pooling? I mean, is it managed by Spring or Hibernate? I guess with ConnectionProviderBuilder - or as suggested - any of its implementation, Hibernate is the guy who manages it.
2) Is it a good approach that Spring does not manage Connection Pooling? or Is it even possible that Spring does manage it?
3) Is this the right path for future implementing of both Database and Schema separation?

Any comments or descriptions are totally appreciated.

application-context.xml

<beans>
    ...
    <bean id="dataSource" class="org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy">
        <property name="targetDataSource" ref="c3p0DataSource" />
    </bean>

    <bean id="c3p0DataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
        <property name="driverClass" value="org.postgresql.Driver" />
        ... other C3P0 related config
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean">
        <property name="packagesToScan" value="com.webapp.domain.model" />

        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.PostgreSQLDialect</prop>
                <prop key="hibernate.default_schema">public</prop>

                <prop key="hibernate.multiTenancy">SCHEMA</prop>
                <prop key="hibernate.tenant_identifier_resolver">com.webapp.persistence.utility.CurrentTenantContextIdentifierResolver</prop>
                <prop key="hibernate.multi_tenant_connection_provider">com.webapp.persistence.utility.MultiTenantContextConnectionProvider</prop>
            </props>
        </property>
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.hibernate4.HibernateTransactionManager">
        <property name="autodetectDataSource" value="false" />
        <property name="sessionFactory" ref="sessionFactory" />
    </bean>

   ...
</beans>

CurrentTenantContextIdentifierResolver.java

public class CurrentTenantContextIdentifierResolver implements CurrentTenantIdentifierResolver {
    @Override
    public String resolveCurrentTenantIdentifier() {
        return CurrentTenantIdentifier;  // e.g.: public, tid130, tid456, ...
    }

    @Override
    public boolean validateExistingCurrentSessions() {
        return true;
    }
}

MultiTenantContextConnectionProvider.java

public class MultiTenantContextConnectionProvider extends AbstractMultiTenantConnectionProvider {
    // Do I need this and its configuratrion?
    //private C3P0ConnectionProvider connectionProvider = null;

    @Override
    public ConnectionProvider getAnyConnectionProvider() {
        // the main question is here.
    }

    @Override
    public ConnectionProvider selectConnectionProvider(String tenantIdentifier) {
        // and of course here.
    }
}



Edit

Regarding the answer of @ben75:

This is a new implementation of MultiTenantContextConnectionProvider. It no longer extends AbstractMultiTenantConnectionProvider. It rather implements MultiTenantConnectionProvider, to be able to return [Connection][4] instead of [ConnectionProvider][5]

public class MultiTenantContextConnectionProvider implements MultiTenantConnectionProvider, ServiceRegistryAwareService {
    private DataSource lazyDatasource;;

    @Override
    public void injectServices(ServiceRegistryImplementor serviceRegistry) {
        Map lSettings = serviceRegistry.getService(ConfigurationService.class).getSettings();

        lazyDatasource = (DataSource) lSettings.get( Environment.DATASOURCE );
    }

    @Override
    public Connection getAnyConnection() throws SQLException {
        return lazyDatasource.getConnection();
    }

    @Override
    public Connection getConnection(String tenantIdentifier) throws SQLException {
        final Connection connection = getAnyConnection();

        try {
            connection.createStatement().execute("SET SCHEMA '" + tenantIdentifier + "'");
        }
        catch (SQLException e) {
            throw new HibernateException("Could not alter JDBC connection to specified schema [" + tenantIdentifier + "]", e);
        }

        return connection;
    }
}
2
It would be useful to know why you want to do this and/or what are your requirements/concerns (ie compliance, performance, selling point) . With the "cloud" these days many people just solve multi-tenancy by booting up separate images and thus have true separation which is often required for compliance reason and thus you do not need multitenancy at the DataSource level. The other option is just to have one massive schema that is multi-tenant aware and partition/shard your application based on performance reasons.Adam Gent
@AdamGent The main reason I'm not going to use separate instances for each customer is that we're targeting for 10K-50K+ customer. To best of my knowledge having that much "separate instances" would cost way more than split them via load-balanced, multi-tenant cluster of single-instance application. For the second part of your comment, for the sake of flexibility, we're going to provide alternatives approach for data layer. If a customer wants a separate database, and they are willing to pay extra charge/month for it, so be it.Khosrow

2 Answers

40
votes

You can choose between 3 different strategies that will impact connection polling. In any case you have to provide an implementation of MultiTenantConnectionProvider. The strategy you choose will of course impact your implementation.

General remark about MultiTenantConnectionProvider.getAnyConnection()

getAnyConnection() is required by hibernate to collect metadata and setup the SessionFactory. Usually in a multi-tenant architecture you have a special/master database (or schema) not used by any tenant. It's a kind of template database (or schema). It's ok if this method returns a connection to this database (or schema).

Strategy 1 : each tenant have it's own database. (and so it's own connection pool)

In this case, each tenant have it's own connection pool managed by C3PO and you can provide an implementation of MultiTenantConnectionProvider based on AbstractMultiTenantConnectionProvider

Every tenant have it's own C3P0ConnectionProvider, so all you have to do in selectConnectionProvider(tenantIdentifier) is to return the correct one. You can keep a Map to cache them and you can lazy-initialize a C3POConnectionProvider with something like :

private ConnectionProvider lazyInit(String tenantIdentifier){
    C3P0ConnectionProvider connectionProvider = new C3P0ConnectionProvider();
    connectionProvider.configure(getC3POProperties(tenantIdentifier));
    return connectionProvider;
}

private Map getC3POProperties(String tenantIdentifier){
    // here you have to get the default hibernate and c3po config properties 
    // from a file or from Spring application context (there are good chances
    // that those default  properties point to the special/master database) 
    // and alter them so that the datasource point to the tenant database
    // i.e. : change the property hibernate.connection.url 
    // (and any other tenant specific property in your architecture like :
    //     hibernate.connection.username=tenantIdentifier
    //     hibernate.connection.password=...
    //     ...) 
}

Strategy 2 : each tenant have it's own schema and it's own connection pool in a single database

This case is very similar to the first strategy regarding ConnectionProvider implementation since you can also use AbstractMultiTenantConnectionProvider as base class to implement your MultiTenantConnectionProvider

The implementation is very similar to the suggested implementation for Strategy 1 except that you must alter the schema instead of the database in the c3po configuration

Strategy 3 : each tenant have it's own schema in a single database but use a shared connection pool

This case is slightly different since every tenant will use the same connection provider (and so the connection pool will be shared). In the case : the connection provider must set the schema to use prior to any usage of the connection. i.e. You must implement MultiTenantConnectionProvider.getConnection(String tenantIdentifier) (i.e. the default implementation provided by AbstractMultiTenantConnectionProvider won't work).

With postgresql you can do it with :

 SET search_path to <schema_name_for_tenant>;

or using the alias

 SET schema <schema_name_for_tenant>;

So here is what your getConnection(tenant_identifier); will look like:

@Override
public Connection getConnection(String tenantIdentifier) throws SQLException {
    final Connection connection = getAnyConnection();
    try {
        connection.createStatement().execute( "SET search_path TO " + tenanantIdentifier );
    }
    catch ( SQLException e ) {
        throw new HibernateException(
                "Could not alter JDBC connection to specified schema [" +
                        tenantIdentifier + "]",
                e
        );
    }
    return connection;
}

Useful reference is here (official doc)

Other useful link C3POConnectionProvider.java


You can combine strategy 1 and strategy 2 in your implementation. You just need a way to find the correct connection properties/connection url for the current tenant.


EDIT

I think that the choice between strategy 2 or 3 depends on the traffic and the number of tenants on your app. With separate connection pools : the amount of connections available for one tenant will be much lower and so: if for some legitime reason one tenant need suddenly many connections the performance seen by this particular tenant will drastically decrease (while the other tenant won't be impacted).

On the other hand, with strategy 3, if for some legitime reason one tenant need suddenly many connections: the performance seen by every tenant will decrease.

In general , I think that strategy 2 is more flexible and safe : every tenant cannot consume more than a given amount of connection (and this amount can be configured per tenant if you need it)

0
votes

IMHO, the connection pool management will be default handled by the Sql Server itself, however some programming languages like C# do offer some ways to control the pools. Refer here

The choice of (1) schema or (2) separate database for a tenant depends upon the volume of the data that you can anticipate for the tenant. However, the following consideration can be worth looking into

  1. create a shared schema model for the trial customers and the low volume customers, this can be identified by the number of the features that you provide to a tenant during the process of onboarding a customer

  2. when you create or onboard a enterprise level customer that may have a large transactional data, it is ideal to go for a separate database.

  3. The schema model may have a different implementation for SQL Server and a different one for the MySQL Server, which you should consider.

  4. also when choosing for the option, do consider the fact that a customer [tenant] may be willing to scale out after a considerable amount of time and system usage. If there is no appropriate scale out option supported in your app, you will have to be bothered.

Share your comments on the above points, to take this discussion further