How can I quickly read data from a huge collection in ArangoDB using the Java driver?

Question

I am evaluating ArangoDB (version 3.2.4) as a replacement for MongoDB. We have a huge collection containing 2,700,000 documents, and next year it will grow to nearly 4,000,000 documents.

When I read data from that collection using the Java driver (version 4.2), it takes a long time for the cursor to fetch the data. The time depends on the size of the fetched result: if I fetch all documents, it takes about 10 minutes for the cursor to fetch the data.

AQL:

for doc in myHugeCollection
    RETURN { "name": doc.name }

Java code:

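    import java.util.HashMap;
    import com.arangodb.ArangoCursor;
    import com.arangodb.model.AqlQueryOptions;

    AqlQueryOptions aqlQueryOptions = new AqlQueryOptions();
    aqlQueryOptions.batchSize(500);   // documents the server sends per batch
    aqlQueryOptions.count(false);     // do not compute the total result count
    aqlQueryOptions.cache(true);      // use the AQL query results cache if enabled

    ArangoCursor<MyHugeCollection> arangoCursor = arangoDatabase.query(
            aqlQuery,                 // the AQL string shown above
            new HashMap<>(),          // no bind parameters
            aqlQueryOptions,
            MyHugeCollection.class);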

It takes about 10 minutes before I can access the data via the cursor. Because I set the batch size to 500, I expected a quick response, since fetching only the first 500 results is extremely fast.

Modified AQL that fetches only the first 500 documents:

for doc in myHugeCollection
    limit 500
    RETURN { "name": doc.name }

This query will take about 20 ms.
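For a sense of scale, here is a back-of-the-envelope count of how many batches a full scan needs at batchSize(500). It is only arithmetic (ceiling division), not a measurement; the class and method names are made up for illustration.

```java
// Arithmetic sketch: number of batches (round trips) needed to deliver a
// full result set at a given batch size. Names are hypothetical.
public class BatchCount {

    // Ceiling division: a partially filled last batch still needs its own round trip.
    static long batchesNeeded(long totalDocuments, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (totalDocuments + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(batchesNeeded(2_700_000L, 500)); // 5400 (current size)
        System.out.println(batchesNeeded(4_000_000L, 500)); // 8000 (expected next year)
    }
}
```

So a full scan is thousands of batches, while the first 500 documents arrive in a single one.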

So my question is: what am I doing wrong? How can I access data in a huge collection without waiting minutes for the cursor?

mpv89 · Accepted Answer · 2017-10-17T06:48:26

It depends on how you access your cursor.

When you convert it to a List, every document of the result is fetched before the call returns:

List<MyHugeCollection> asList = arangoCursor.asListRemaining();
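To illustrate the difference, here is a toy model of a batched cursor. SimulatedCursor is a hypothetical in-memory class, not the driver's ArangoCursor; it only mimics the behaviour described in this answer: a new batch is requested only when the current one is used up, so draining everything into a list costs every round trip, while reading the first document costs one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a server-side cursor (NOT the ArangoDB driver).
public class SimulatedCursor implements Iterator<Integer> {
    private final int total;      // documents in the full result set
    private final int batchSize;  // documents per round trip
    private int served = 0;       // documents already delivered by the "server"
    private final List<Integer> buffer = new ArrayList<>();
    private int bufferPos = 0;
    int batchesFetched = 0;       // round trips performed so far

    public SimulatedCursor(int total, int batchSize) {
        this.total = total;
        this.batchSize = batchSize;
        fetchBatch(); // the initial query response carries the first batch
    }

    private void fetchBatch() {
        buffer.clear();
        bufferPos = 0;
        int end = Math.min(served + batchSize, total);
        for (int i = served; i < end; i++) {
            buffer.add(i);
        }
        served = end;
        batchesFetched++;
    }

    @Override
    public boolean hasNext() {
        if (bufferPos < buffer.size()) {
            return true;
        }
        if (served < total) {
            fetchBatch(); // request the next batch only when it is needed
            return bufferPos < buffer.size();
        }
        return false;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(bufferPos++);
    }

    // Analogue of asListRemaining(): drains every remaining batch before returning.
    public List<Integer> asListRemaining() {
        List<Integer> all = new ArrayList<>();
        while (hasNext()) {
            all.add(next());
        }
        return all;
    }

    public static void main(String[] args) {
        SimulatedCursor lazy = new SimulatedCursor(2_000, 500);
        lazy.next();
        System.out.println(lazy.batchesFetched); // 1

        SimulatedCursor eager = new SimulatedCursor(2_000, 500);
        List<Integer> all = eager.asListRemaining();
        System.out.println(all.size() + " " + eager.batchesFetched); // 2000 4
    }
}
```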

When you iterate over it with next() or forEachRemaining() (requires Java 8), you can process the first 500 documents before the next batch is fetched from the database:

while (arangoCursor.hasNext()) {
  MyHugeCollection doc = arangoCursor.next();
  // TODO: process doc
}
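If the documents need to be handled in groups (for example, written to another system in bulk), the iteration loop can be wrapped in a small helper that keeps only one chunk in memory at a time. A minimal sketch, assuming the cursor can be passed as a java.util.Iterator (the hasNext()/next()/forEachRemaining() calls in this answer suggest it implements that interface); processInChunks is a hypothetical name, not part of the driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: consumes any Iterator in fixed-size chunks.
public class ChunkedProcessing {

    // Returns the number of chunks handed to the sink.
    static <T> int processInChunks(Iterator<T> it, int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (it.hasNext()) {
            chunk.add(it.next());
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize); // fresh list; the sink may keep the old one
            }
        }
        if (!chunk.isEmpty()) { // flush the final, partially filled chunk
            sink.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c", "d", "e");
        List<Integer> sizes = new ArrayList<>();
        int n = processInChunks(names.iterator(), 2, chunk -> sizes.add(chunk.size()));
        System.out.println(n + " " + sizes); // 3 [2, 2, 1]
    }
}
```

With a real cursor you would call something like processInChunks(arangoCursor, 500, chunk -> ...), where the lambda body is your own bulk-processing code.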

or

arangoCursor.forEachRemaining(doc -> {
  // TODO
});
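forEachRemaining() always runs to the end of the result. If you only need part of it, one option is to wrap the iterator in a lazy Java 8 Stream with the standard StreamSupport/Spliterators adapter, so that operations like limit() pull only as many elements as they need. A minimal sketch; CountingIterator is a hypothetical stand-in for the cursor that records how many elements were requested.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyStream {

    // Wraps an Iterator in a sequential Stream; elements are pulled on demand.
    static <T> Stream<T> lazyStream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    // Hypothetical source standing in for a cursor: counts elements handed out.
    static class CountingIterator implements Iterator<Integer> {
        private final int size;
        int pulled = 0;

        CountingIterator(int size) {
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return pulled < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return pulled++;
        }
    }

    public static void main(String[] args) {
        CountingIterator source = new CountingIterator(1_000_000);
        List<Integer> firstFive = lazyStream(source).limit(5).collect(Collectors.toList());
        System.out.println(firstFive);     // [0, 1, 2, 3, 4]
        System.out.println(source.pulled); // a handful, not 1,000,000
    }
}
```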
