I'm new to Neptune.
What is the best way to support multi-tenancy in the Neptune database?
The requirements:
1. Support thousands of tenants in the database (one cluster)
2. Keep queries from getting too complicated with tenant filtering
3. Good performance (ideally using data partitioning for faster query times)
4. Secure: make it hard to make mistakes that cause cross-tenant access.
3 Answers
In a non-production environment, the Gremlin partition strategy proved sufficient for me. The vertices and edges co-exist in the same cluster and carry a property that differentiates them; in my case I used an _env property.
Then, in my Java code, every traversal I request from my factory applies the partition strategy:
    private GraphTraversalSource buildReadOnlyTraversal() {
        log.debug("building read-only traversal");
        return AnonymousTraversalSource.traversal()
                .withRemote(DriverRemoteConnection.using(getReadOnlyCluster()))
                .withStrategies(buildPartitionStrategy(), buildReadOnlyStrategy());
    }

    private PartitionStrategy buildPartitionStrategy() {
        var env = this.properties.getEnvironmentPartition();
        log.info("building partition strategy for environment={}", env);
        return PartitionStrategy.build()
                .partitionKey(ENVIRONMENT_PARTITION_KEY)
                .writePartition(env)
                .readPartitions(env)
                .create();
    }
Traversals built this way are automatically scoped to your partition. The big gotcha, however, is that you need to remember to add the partition filter manually when querying from the console (or from anything else that doesn't use the partition strategy mechanism), e.g.
g.V().hasLabel('user').has('_env', 'dev')
I think this meets the first two of your criteria; I can't really comment on performance. As for point 4, it hasn't been a problem from application code; errors are more likely when manually tinkering with the graph.
I came up with a workaround for this issue: I overrode some of the existing functions of GraphTraversal like this.
const setEnvironment = (g: GraphTraversalSource<GraphTraversal>, ENV: string) => {
    // Patch the mid-traversal steps so they also carry the _env filter/property.
    const bindGraphTraversal = (t: GraphTraversal): GraphTraversal => {
        const V = t.V.bind(t);
        const addV = t.addV.bind(t);
        const addE = t.addE.bind(t);
        t.V = (...args: any[]) => V(...args).has('_env', ENV);
        t.addV = (...args: any[]) => addV(...args).property('_env', ENV);
        t.addE = (...args: any[]) => addE(...args).property('_env', ENV);
        return t;
    };
    // Keep references to the original start steps before overriding them.
    const addV = g.addV.bind(g);
    const addE = g.addE.bind(g);
    const V = g.V.bind(g);
    const E = g.E.bind(g);
    // Writes are tagged with _env; reads are filtered on _env.
    g.addV = (...args: any[]) => bindGraphTraversal(addV(...args).property('_env', ENV));
    g.addE = (...args: any[]) => bindGraphTraversal(addE(...args).property('_env', ENV));
    g.V = (...args: any[]) => bindGraphTraversal(V(...args).has('_env', ENV));
    g.E = (...args: any[]) => bindGraphTraversal(E(...args).has('_env', ENV));
    return g;
};
And then I re-assigned the graph traversal object like this:
g = setEnvironment(g, 'my-environment');
This automatically adds the _env property when creating a vertex or edge, and filters on _env in queries.
Though this is not a proper solution, it lets us create multiple graph environments this way.
From the AWS Neptune documentation on migrating a multi-tenant Blazegraph database to Neptune:
"Multi-tenancy – Blazegraph supports multi-tenancy within a single database. In Neptune, multitenancy is supported either by storing data in named graphs and using the USING NAMED clauses for SPARQL queries, or by creating a separate database cluster for each tenant."
https://docs.amazonaws.cn/en_us/neptune/latest/userguide/neptune-ug.pdf
This might solve points 1, 2 and 3. Point 4 would be the weak one, since by default (when no named graph is specified in the query) the query runs against the union of all the graphs. It is important to note that this solution works only for RDF data, not for property graph storage.
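One way to reduce the risk in point 4 is to never hand-write the graph IRI and instead build every query through a small helper that always wraps the pattern in a GRAPH clause for the tenant's named graph. The following is only a minimal sketch of that idea: the base IRI, the tenant ID format and the function names (tenantGraphIri, selectForTenant) are assumptions made for illustration, not part of Neptune or its documentation, and the pattern argument is assumed to be trusted, developer-written SPARQL.

```typescript
// Hypothetical base IRI for per-tenant named graphs (an assumption, not a Neptune convention).
const TENANT_GRAPH_BASE = "http://example.com/tenant/";

// Accept only simple IDs so a tenant ID can never break out of the <...> IRI.
const TENANT_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

function tenantGraphIri(tenantId: string): string {
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return `${TENANT_GRAPH_BASE}${tenantId}`;
}

// Wraps a graph pattern in GRAPH <tenant graph> { ... } so it is matched
// only against that tenant's named graph rather than the default graph.
// `pattern` is assumed to be trusted SPARQL written by the developer.
function selectForTenant(tenantId: string, variables: string[], pattern: string): string {
  const graph = tenantGraphIri(tenantId);
  const projection = variables.map((v) => `?${v}`).join(" ");
  return `SELECT ${projection} WHERE { GRAPH <${graph}> { ${pattern} } }`;
}

const query = selectForTenant("acme", ["user"], "?user a <http://example.com/User> .");
console.log(query);
```

Application code would then send only strings produced by selectForTenant to the SPARQL endpoint. As with the Gremlin approaches above, queries written by hand in a console bypass the helper and still need the GRAPH clause added manually.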