
I have 2 collections (edge collections), and both contain similar types of data. Each has around 400k documents, but one of the collections occupies more than double the disk space of the other. I am wondering why. I do a lot of updates/replaces. Could it be because ArangoDB keeps all the revisions? 90% of the space is occupied by datafiles. If it is because of the revisions, how can I disable persisting them?

By looking at 'figures' I see that the size of the dead documents is huge. How can I disable saving dead documents? - Deepak Agarwal
doCompact is set to false, by the way. - Deepak Agarwal
ArangoDB keeps revisions for MVCC, but the reason could also be the doubling growth of datafiles. For example, if 512 MB of allocated space is exceeded, the size is doubled to 1024 MB, even if only 513 MB is actually needed. One of your collections might currently be at 511 MB and the other slightly over 512 MB, causing the latter to be twice the size. - CodeManX
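The arithmetic in this example can be sketched as follows, assuming a policy that starts at 512 MB and doubles the allocation until the required size fits. The function name and the 512 MB starting point are illustrative assumptions, not ArangoDB parameters.

```python
def allocated_size_mb(needed_mb: int, initial_mb: int = 512) -> int:
    """Allocation under a hypothetical doubling policy.

    Starts at `initial_mb` and doubles until `needed_mb` fits. This models
    the growth described in the comment; it is not taken from ArangoDB.
    """
    if needed_mb <= 0:
        raise ValueError("needed_mb must be positive")
    size = initial_mb
    while size < needed_mb:
        size *= 2  # grow geometrically, never by the exact amount needed
    return size


for needed in (511, 513):
    print(f"{needed} MB needed -> {allocated_size_mb(needed)} MB allocated")
```

Two collections only 2 MB apart in actual usage can therefore end up a factor of two apart in allocated space under such a policy.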
I have 469964 alive documents using 148 MB of disk and 7805222 dead documents using 2.4 GB. How can I disable the revisions? doCompact was set to false because we ran into a performance issue with it: disk I/O was constantly heavy. - Deepak Agarwal
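As a quick sanity check of these figures (treating 1 GB as 1024 MB, an assumption), dead revisions account for roughly 94% of the datafile space, or about 16 to 17 dead revisions per alive document. That is consistent with heavy update/replace traffic combined with disabled compaction. The names below are illustrative only.

```python
# Figures quoted in the comment; 1 GB = 1024 MB is an assumption.
ALIVE_DOCS = 469_964
ALIVE_MB = 148.0
DEAD_DOCS = 7_805_222
DEAD_MB = 2.4 * 1024


def dead_share(alive_mb: float, dead_mb: float) -> float:
    """Fraction of the datafile space occupied by dead revisions."""
    return dead_mb / (alive_mb + dead_mb)


share = dead_share(ALIVE_MB, DEAD_MB)
revisions_per_doc = DEAD_DOCS / ALIVE_DOCS
print(f"dead revisions use {share:.1%} of the datafile space")
print(f"about {revisions_per_doc:.1f} dead revisions per alive document")
```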
Setting doCompact to false will actually cause the old revisions of the documents to be kept forever in the datafiles. Setting it to true will start the cleanup process in that collection. The cleanup is performed by a background thread and runs interleaved with the other operations in that collection. It may rewrite all the datafiles of the collection if most of the data belongs to dead documents. The end result should be much lower disk usage if most of the documents in the datafiles are dead revisions. - stj
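What compaction does to the datafiles can be illustrated with a toy model, assuming a datafile is simply a list of document revisions and only the newest revision per key is still alive. `Revision` and `compact` are hypothetical names; ArangoDB's actual compactor works on its own on-disk format and handles more cases than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One stored version of a document (hypothetical toy record)."""
    key: str   # document key
    rev: int   # increases with every update/replace
    size: int  # bytes this revision occupies on disk


def compact(datafile: list[Revision]) -> list[Revision]:
    """Keep only the newest revision of each key.

    Toy model only: a real compactor also has to deal with removals,
    ongoing transactions and index references, all ignored here.
    """
    latest: dict[str, Revision] = {}
    for revision in datafile:
        current = latest.get(revision.key)
        if current is None or revision.rev > current.rev:
            latest[revision.key] = revision
    return list(latest.values())


# Key "a" was written three times, leaving two dead revisions behind.
datafile = [
    Revision("a", 1, 100),
    Revision("b", 1, 100),
    Revision("a", 2, 100),
    Revision("a", 3, 100),
]
before = sum(r.size for r in datafile)
after = sum(r.size for r in compact(datafile))
print(f"{before} bytes before compaction, {after} bytes after")
```

In this model, doCompact = false corresponds to never calling `compact()`, so every dead revision stays on disk.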

1 Answer


Usually the compactor thread is intended to clean up unused WAL files. ArangoDB had already done this, so the files occupying the space were no longer shown by ls.

The situation arose because the compactor thread had been disabled to preserve system performance, so many files were released at once.

However, for some reason arangod didn't close the file handles, so the file system did not release the space occupied by the deleted files.
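The underlying behaviour is standard POSIX file semantics rather than anything ArangoDB-specific: deleting a file only removes its directory entry, and the kernel frees the blocks once the last open handle to it is closed. A small sketch (the function name is hypothetical) that shows this on Linux or macOS:

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> tuple[bool, int, bytes]:
    """Delete a file while a handle to it is still open (POSIX only).

    Returns whether the name still exists, the inode's link count, and
    the data read back through the still-open handle.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the name disappears, so `ls` no longer lists it
        name_exists = os.path.exists(path)
        link_count = os.fstat(fd).st_nlink  # 0: no directory entry points here
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))  # ...yet the data is still allocated
        return name_exists, link_count, data
    finally:
        os.close(fd)  # only after the last close can the space be freed


exists, links, data = read_after_unlink(b"x" * 4096)
print(exists, links, len(data))
```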

This could be resolved by restarting the ArangoDB daemon: on shutdown, the held file handles were closed and the space was released.
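Before restarting, one way to confirm on Linux that a process is still holding deleted files is to inspect its `/proc/<pid>/fd` entries; the kernel appends ` (deleted)` to the link target of a file that has been unlinked (`lsof +L1` gives a similar listing). The sketch below is Linux-only, the function name is hypothetical, and reading another process's fd directory normally requires running as the same user or as root.

```python
import os


def deleted_open_files(pid: str = "self") -> list[str]:
    """List files that process `pid` still holds open although deleted.

    Linux-only: relies on the /proc/<pid>/fd symlinks, whose target ends
    in " (deleted)" once the file has been unlinked.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed between listdir() and readlink()
        if target.endswith(" (deleted)"):
            found.append(target)
    return found
```

Usage would be something like `deleted_open_files("12345")` with the PID of the running arangod process; a long list of datafile or WAL paths suggests the space will only come back once those handles are closed.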

In the meantime, the issue of WAL files not being closed has been fixed; the fix is available as of ArangoDB 2.8.6.