I am creating a Node.js application that works with ArangoDB as data store. Basically, what I have as data structure is two tables, one for managing so-called instances
, another one for entities
. What I do is the following:
- There is a document in the
instances
collection for everyinstance
I have. - Whenever I add an entity to the
entities
collection, I also want to keep track of the entities that belong to a specific instance. - So, every
instance
document has an array field forentities
, and I push the ID of the entity into that array.
The following code shows the general outline:
// Connect to ArangoDB.
db = new Database(...);
db.useBasicAuth(user, password);
// Use the database.
await db.createDatabase(database);
db.useDatabase(database);
// Create the instance collection.
instanceCollection = db.collection(`instances-${uuid()}`);
await instanceCollection.create();
// Create the entities collection.
entityCollection = db.collection(`entities-${uuid()}`);
await entityCollection.create();
// Setup an instance.
instance = {
id: uuid(),
entities: []
};
// Create the instance in the database.
await db.query(aql`
INSERT ${instance} INTO ${instanceCollection}
`);
// Add lots of entities.
for (let i = 0; i < scale; i++) {
// Setup an entity.
const entity = {
id: uuid()
};
// Update the instance.
instance.entities.push(entity);
// Insert the entity in the database.
await db.query(aql`
INSERT ${entity} INTO ${entityCollection}
`);
// Update the instance in the database.
await db.query(aql`
FOR i IN ${instanceCollection}
FILTER i.id == ${instance.id}
UPDATE i WITH ${instance} IN ${instanceCollection} OPTIONS { mergeObjects: false }
`);
}
The problem now is that this becomes extremely slow the more entities I add. It basically has exponential growth, although I would have expected linear growth:
Running benchmark 'add and update'
100 Entities: 348.80ms [+0.00%]
1000 Entities: 3113.55ms [-10.74%]
10000 Entities: 90180.18ms [+158.54%]
Adding an index has an effect, but does not change anything on the overall problem:
Running benchmark 'add and update with index'
100 Entities: 194.30ms [+0.00%]
1000 Entities: 2090.15ms [+7.57%]
10000 Entities: 89673.52ms [+361.52%]
The problem can be tracked down to the UPDATE
statement. If you leave it out and only use the database's INSERT
statement, things scale linearly. So, something seems to be wrong with the update itself. However, I don't understand where the problem is.
This is what I would like to understand: Why does the UPDATE
statement get dramatically slower over time? Am I using it wrong? Is this a known problem in ArangoDB? …?
What I am not interested in is discussing this approach: Please take is as given. Let's focus on the performance of the UPDATE
statement. Any ideas?
UPDATE
As asked for in the comments, here some information on the system setup:
- ArangoDB 3.4.6, 3.6.2.1, and 3.7.0-alpha.2 (all running in Docker, on macOS and Linux)
- Single-server setup
- ArangoJS 6.14.0 (we also had this with earlier versions, although I can't tell the exact version)
use the database's ADD statement
? There is no ADD operation or function to my best knowledge. Do you mean something like this?LET scale = 10000 RETURN (FOR i IN 1..scale RETURN { id: i })
– CodeManXINSERT
instead ofADD
. I fixed it. – Golo Roden