Cassandra: customer data per keyspace

Question

Problem: one of our new customers wants their data to be stored in their own country (legal regulations). However, our existing customers' data is spread across a few datacenters in different countries.

Question: how can we keep the new customer's data in its own country without changing the existing Cassandra architecture much?

Potential Solution #1: use a separate keyspace for this customer. The schemas will be the same across keyspaces, which adds complexity for data migration and so on. DataStax support confirmed that it is possible to configure a keyspace per region. However, Spring Data Cassandra, which we use, doesn't allow choosing the keyspace dynamically. The only way is to use CqlTemplate and either run use keyspace blabla every time before the call, or prefix the table with the keyspace as in select * from blabla.mytable, but that sounds like a hack to me.

Potential Solution #2: use a separate environment for the new client, but management refuses to do that.

Any other ways to achieve this goal?

I don't see how you can achieve it without creating a new keyspace if all customers share the same keyspace and its data is spread across data centers in multiple countries. The keyspace is where you specify that the data should go only to specific data centers (the ones in the customer's country). — Edu
@Edu, yes, we are thinking along the same lines (potential solution #1), but with Spring Data Cassandra it is not possible to switch the keyspace dynamically (at least a few hours of research didn't turn anything up). — walv
@walv: why do you say that adding "keyspace before the table select * from blabla.mytable" sounds like a hack? It's a normal way to reference a table and is widely used. It's just like a fully qualified name of the table. — Horia
@walv I don't know the specifics of your use case, but it seems like you will always need to know the customer's region for each request and define the keyspace with a "by request lifestyle", using it in the Cassandra queries made on behalf of the customer in that request. — Edu
@Edu, yes, each HTTP request will contain the companyId of the user, so we can easily map which company belongs to which cluster. — walv

Oresztesz · Accepted Answer · 2017-11-06T14:15:04

Update 3

The example and explanation below are the same as in the GitHub repository.

Update 2

The example in GitHub is now working. The most future-proof solution seemed to be using repository extensions. I will update the example below soon.

Update

Note that the solution I originally posted had some flaws that I discovered during JMeter tests. The DataStax Java driver reference advises against setting the keyspace through the Session object. You have to set the keyspace explicitly in every query.

I've updated the GitHub repository and also changed the solution's description.

Be very careful though: if the session is shared by multiple threads, switching the keyspace at runtime could easily cause unexpected query failures.

Generally, the recommended approach is to use a single session with no keyspace, and prefix all your queries.
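
To illustrate what "prefix all your queries" can look like when CQL is built as plain strings, here is a minimal sketch. QualifiedTable is a hypothetical helper, not part of the driver or of Spring Data Cassandra, and it assumes keyspace and table names are plain unquoted identifiers. It rejects anything else so that a value taken from a request cannot change the shape of the query.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: builds keyspace-qualified table references for CQL strings. */
final class QualifiedTable {

    // Assumption: only unquoted identifiers are used (a letter, then letters, digits or underscores).
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    private QualifiedTable() {
    }

    /** Returns "keyspace.table", rejecting anything that is not a plain identifier. */
    static String of(String keyspace, String table) {
        return check(keyspace) + "." + check(table);
    }

    private static String check(String identifier) {
        if (identifier == null || !IDENTIFIER.matcher(identifier).matches()) {
            throw new IllegalArgumentException("Not a plain CQL identifier: " + identifier);
        }
        return identifier;
    }

    public static void main(String[] args) {
        // Keyspace and table names here are made up for the example.
        String cql = "SELECT * FROM " + QualifiedTable.of("acme", "users") + " WHERE username = ?";
        System.out.println(cql);
    }
}
```

With a session that has no default keyspace, every statement built this way names its keyspace explicitly, so nothing depends on shared session state.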

Solution Description

I would set up a separate keyspace for this specific customer and add support for changing the keyspace in the application. We previously used this approach with an RDBMS and JPA in production, so I would say it can work with Cassandra as well. The solution was similar to the one below.

I will briefly describe how to set up Spring Data Cassandra to select the target keyspace on each request.
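
As a rough sketch of the keyspace side, the per-customer keyspace can be created with NetworkTopologyStrategy and replicas listed only for the data centers in the customer's country. The helper below just assembles the CQL statement. KeyspaceDdl, the keyspace name acme and the data center name dc_germany are assumptions made for the example; real data center names must match what your cluster's snitch reports.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: assembles a CREATE KEYSPACE statement restricted to given data centers. */
final class KeyspaceDdl {

    private KeyspaceDdl() {
    }

    /**
     * Builds the statement; only the data centers present in the map receive replicas.
     * Names are assumed to be trusted configuration values, not user input.
     */
    static String createKeyspace(String keyspace, Map<String, Integer> replicasPerDatacenter) {
        StringJoiner options = new StringJoiner(", ", "{", "}");
        options.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> entry : replicasPerDatacenter.entrySet()) {
            options.add("'" + entry.getKey() + "': " + entry.getValue());
        }
        return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = " + options + ";";
    }

    public static void main(String[] args) {
        Map<String, Integer> replicas = new LinkedHashMap<>();
        replicas.put("dc_germany", 3); // made-up data center name
        System.out.println(createKeyspace("acme", replicas));
    }
}
```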

Step 1: Preparing your services

First, I would define how the tenant ID is set on each request. For a REST API, a good approach is to use a specific HTTP header that carries it:

Tenant-Id: ACME

Similarly, with any other remote protocol you can forward the tenant ID on every message. If you're using AMQP or JMS, for example, you can forward it inside a message header or property.

Step 2: Getting tenant ID in application

Next, store the incoming header value on each request inside your controllers. You can use a ThreadLocal, or you can use a request-scoped bean:

@Component
@Scope(scopeName = "request", proxyMode= ScopedProxyMode.TARGET_CLASS)
public class TenantId {

    private String tenantId;

    public void set(String id) {
        this.tenantId = id;
    }

    public String get() {
        return tenantId;
    }
}

@RestController
public class UserController {

    @Autowired
    private UserRepository userRepo;
    @Autowired
    private TenantId tenantId;

    @RequestMapping(value = "/userByName")
    public ResponseEntity<String> getUserByUsername(
            @RequestHeader("Tenant-ID") String tenantId,
            @RequestParam String username) {
        // Setting the tenant ID
        this.tenantId.set(tenantId);
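        // Finding user
        User user = userRepo.findOne(username);
        if (user == null) {
            // Avoid a NullPointerException when no such user exists for this tenant
            return new ResponseEntity<>(HttpStatus.NOT_FOUND);
        }
        return new ResponseEntity<>(user.getUsername(), HttpStatus.OK);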
    }
}
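
For the ThreadLocal alternative mentioned in Step 2, a minimal sketch could look like the following. TenantContext is a hypothetical class name, not something from Spring or from the GitHub example. It relies on the request being handled on a single thread from the controller down to the repository.

```java
/** Hypothetical ThreadLocal-based holder for the tenant ID of the current request. */
final class TenantContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {
    }

    static void set(String tenantId) {
        CURRENT.set(tenantId);
    }

    static String get() {
        return CURRENT.get();
    }

    /** Must be called when the request ends, since servlet containers reuse pooled threads. */
    static void clear() {
        CURRENT.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        TenantContext.set("ACME");
        try {
            // Another thread does not see the value set on the main thread.
            Thread other = new Thread(() -> System.out.println("other thread: " + TenantContext.get()));
            other.start();
            other.join();
            System.out.println("main thread: " + TenantContext.get());
        } finally {
            TenantContext.clear();
        }
    }
}
```

Compared with the request-scoped bean, this variant does not need a servlet request context in the data-access layer, but forgetting clear() can leak one tenant's ID into the next request served by the same thread, so the set and clear calls are best paired in a try/finally block or a filter.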

Step 3: Setting tenant ID in data-access layer

Finally, extend the repository implementation and set the keyspace according to the tenant ID:

public class KeyspaceAwareCassandraRepository<T, ID extends Serializable>
        extends SimpleCassandraRepository<T, ID>  {

    private final CassandraEntityInformation<T, ID> metadata;
    private final CassandraOperations operations;

    @Autowired
    private TenantId tenantId;

    public KeyspaceAwareCassandraRepository(
            CassandraEntityInformation<T, ID> metadata,
            CassandraOperations operations) {
        super(metadata, operations);
        this.metadata = metadata;
        this.operations = operations;
    }

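    // Repository instances are created by Spring Data's repository factory,
    // so the @Autowired tenantId field is populated manually before each query.
    private void injectDependencies() {
        SpringBeanAutowiringSupport
                .processInjectionBasedOnServletContext(this,
                getServletContext());
    }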

    private ServletContext getServletContext() {
        return ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes())
                .getRequest().getServletContext();
    }

    @Override
    public T findOne(ID id) {
        injectDependencies();
        CqlIdentifier primaryKey = operations.getConverter()
                .getMappingContext()
                .getPersistentEntity(metadata.getJavaType())
                .getIdProperty().getColumnName();

        Select select = QueryBuilder.select().all()
                .from(tenantId.get(),
                        metadata.getTableName().toCql())
                .where(QueryBuilder.eq(primaryKey.toString(), id))
                .limit(1);

        return operations.selectOne(select, metadata.getJavaType());
    }

    // All other overrides should be similar
}

@SpringBootApplication
@EnableCassandraRepositories(repositoryBaseClass = KeyspaceAwareCassandraRepository.class)
public class DemoApplication {
...
}
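
The repository above passes the tenant ID straight through as the keyspace name. Since that value arrives in an HTTP header, one option is to resolve it through a fixed mapping first and reject unknown tenants. The sketch below is an assumption about how that could be done, not part of the GitHub example; TenantKeyspaceResolver and the names in it are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver: maps tenant IDs from requests to known keyspace names. */
final class TenantKeyspaceResolver {

    private final Map<String, String> keyspaceByTenant;

    TenantKeyspaceResolver(Map<String, String> keyspaceByTenant) {
        // Defensive copy so later changes to the caller's map have no effect here.
        this.keyspaceByTenant = Collections.unmodifiableMap(new HashMap<>(keyspaceByTenant));
    }

    /** Returns the keyspace for a tenant, or fails instead of falling back to a shared keyspace. */
    String resolve(String tenantId) {
        String keyspace = tenantId == null ? null : keyspaceByTenant.get(tenantId);
        if (keyspace == null) {
            throw new IllegalArgumentException("Unknown tenant: " + tenantId);
        }
        return keyspace;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("ACME", "acme_de"); // made-up tenant and keyspace names
        TenantKeyspaceResolver resolver = new TenantKeyspaceResolver(mapping);
        System.out.println(resolver.resolve("ACME"));
    }
}
```

In the repository, the call from(tenantId.get(), ...) would then become from(resolver.resolve(tenantId.get()), ...). Failing on unknown tenants, rather than defaulting to a shared keyspace, keeps a mistyped header from writing regulated data into the wrong region.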

Let me know if there are any issues with the code above.

Sample code in GitHub

https://github.com/gitaroktato/spring-boot-cassandra-multitenant-example