0
votes

When I load a collection into memory for the first time, all of it is in memory (I can see it in the task manager), but over time I can see that the arangod process holds only part of the original size. Besides, when I execute a query that retrieves data from that collection, disk usage grows for a short period of time and the amount of used RAM grows as well.

I'd like to avoid this. How can I do it? I see that collections have the property isVolatile:

isVolatile: If true then the collection data will be kept in memory only and ArangoDB will not write or sync the data to disk.

It is almost what I want, but:

Unloading the collection will cause the collection data to be discarded. Stopping or re-starting the server will also cause full loss of data in the collection

Can I somehow keep the whole collection in memory but without losing data after unloading?

1 Answer

1
votes

The only way to guarantee that your collections are in RAM is to use the MMFiles engine. With RocksDB there is no such guarantee. Two full collection scans should also cause RocksDB collections to be loaded into RAM, but once you run out of memory, some of that data is unloaded again.

A drop in the reported memory figures does not indicate that collection data has been unloaded. Here's the Wikipedia article on memory-mapped files (MMFs): https://en.wikipedia.org/wiki/Memory-mapped_file. So as long as your collection is loaded, which happens immediately when you access its data or explicitly call the load method, it resides in RAM.
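To make the memory-mapped file idea concrete, here is a minimal sketch using Python's standard `mmap` module. It is only an illustration of the general mechanism, not ArangoDB's actual implementation; the function name `mmap_roundtrip` is made up for this example. The point is that the data is written like ordinary memory, while the operating system decides when the pages go to disk and which pages stay resident.

```python
import mmap
import os
import tempfile


def mmap_roundtrip(payload: bytes) -> bytes:
    """Write a non-empty payload through a memory mapping, then read it back from the file."""
    fd, path = tempfile.mkstemp()
    try:
        # A mapping needs a file of non-zero size, so size the file first.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as mapped:
            mapped[:] = payload  # looks like a memory write, no explicit file I/O
            mapped.flush()       # ask the OS to write the dirty pages back to the file
        os.close(fd)
        fd = None
        # Ordinary file read: the bytes written via the mapping are in the file.
        with open(path, "rb") as f:
            return f.read()
    finally:
        if fd is not None:
            os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(mmap_roundtrip(b"hello arangod"))
```

Because the pages are backed by a file, the OS is free to drop clean pages from the process's resident set and fault them in again on the next access, which is one reason a task manager figure can shrink without the collection having been unloaded.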

Regarding your question about data loss: there are two strategies for syncing data to disk, which you choose between by setting wait-for-sync to true or false. This parameter can be set at startup, in which case it affects all databases and all collections, or on a per-collection basis when you initially create the collection. As the name says, it determines the point at which a write is considered committed and reported as such to the client. For higher performance at the cost of safety, set the value to false. Under this regime you may lose a couple of seconds of data if power to the machine or the disks suddenly fails.
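As a rough sketch of the per-collection variant: to my understanding, ArangoDB's HTTP API creates a collection via `POST /_api/collection` and accepts a `waitForSync` attribute in the JSON body. The snippet below only builds such a request and does not send it; the helper name `create_collection_request` and the collection name `events` are hypothetical, so check the endpoint and attribute against the documentation for your server version.

```python
import json


def create_collection_request(name: str, wait_for_sync: bool) -> tuple:
    """Build (method, path, body) for creating a collection with a given sync policy.

    Assumption: endpoint and attribute name follow ArangoDB's HTTP API.
    Nothing is sent over the network here.
    """
    body = json.dumps({"name": name, "waitForSync": wait_for_sync})
    return ("POST", "/_api/collection", body)


if __name__ == "__main__":
    # Safer setting: a write is acknowledged only after it has been synced to disk.
    print(create_collection_request("events", True))
    # Faster setting: writes are acknowledged before the sync happens.
    print(create_collection_request("events", False))
```

With `waitForSync` set to false the server can acknowledge writes earlier, which is where the "couple of seconds of data" at risk comes from.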

TL;DR: use MMFiles and your loaded collections live in RAM as long as you have memory left. Beyond that point you end up in swap space, with horrendous consequences for performance.