1 vote

The problem

I iterate over an entire vertex collection, e.g. journals, and use each document to create author edges from a person to the given journal.

I use python-arango and the code is something like:

for journal in journals.all():
    create_author_edge(journal)

I have a relatively small dataset, and the journals collection has only ca. 1300 documents. However, this is more than 1000, which is the batch size in the web interface, though I don't know whether that is relevant.
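Roughly, the surrounding setup looks like this (database name, credentials, and the person vertex are placeholders, and create_author_edge is simplified):

from arango import ArangoClient

# Placeholder connection details.
client = ArangoClient()
db = client.db("mydb", username="root", password="passwd")

journals = db.collection("journals")    # vertex collection
author_edges = db.collection("author")  # edge collection

def create_author_edge(journal):
    # Simplified: link a placeholder person vertex to the given journal.
    author_edges.insert({
        "_from": "persons/me",
        "_to": journal["_id"],
    })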

The problem is that it raises a CursorNextError, and returns HTTP 404 and ERR 1600 from the database, which is the ERROR_CURSOR_NOT_FOUND error:

Will be raised when a cursor is requested via its id but a cursor with that id cannot be found.

Insights into the cause

From ArangoDB Cursor Timeout, and from this issue, I suspect that it's because the cursor's TTL has expired in the database; in the Python stack trace, something like this can be seen:

# Part of the stacktrace in the error:
(...)
if not cursor.has_more():
    raise StopIteration
cursor.fetch()  <---- error raised here
(...)

If I iterate over the entire collection quickly, i.e. if I do print(len(journals.all())), it outputs "1361" with no errors.
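One workaround this suggests is to drain the cursor up front and only then do the slow edge creation, so the server-side cursor isn't kept open during the per-document work. A sketch:

# Fetch everything first (fine for ~1300 documents), then work on the
# in-memory list so the cursor's TTL no longer matters.
all_journals = list(journals.all())
for journal in all_journals:
    create_author_edge(journal)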

When I replace journals.all() with AQL and increase the TTL parameter, it works without errors:

for journal in db.aql.execute("FOR j IN journals RETURN j", ttl=3600):
    create_author_edge(journal)

However, without the ttl parameter, the AQL approach gives the same error as using journals.all().
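Presumably, raising batch_size above the collection size would also avoid the follow-up cursor fetch entirely, since everything would arrive in the first batch (2000 is just an arbitrary value above 1361):

# Everything fits into the first batch, so no second fetch from the cursor is needed.
for journal in db.aql.execute("FOR j IN journals RETURN j", batch_size=2000):
    create_author_edge(journal)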

More information

A final piece of information: the error is raised when I run this on my personal laptop. On my work computer, the same code was used to create the graph and populate it with the same data, and no errors were raised there. Because I'm on holiday I don't have access to my work computer to compare versions, but both systems were installed during the summer, so there's a good chance the versions are the same.

The question

I don't know if this is an issue with python-arango or with ArangoDB. Because there is no problem when the TTL is increased, I suspect it points to an issue with ArangoDB rather than the Python driver, but I can't be sure.

(I've added a feature request to add a ttl parameter to the .all() method here.)

Any insights into why this is happening?


I don't have the rep to create the tag "python-arango", so it would be great if someone would create it and tag my question.


2 Answers

1 vote

Inside the server, all() is executed via the simple-query API. As discussed in the referenced GitHub issue, simple queries don't support the TTL parameter and won't get it.

The preferred solution here is to use an AQL query on the client, so that you can specify the TTL parameter.

In general, you should refrain from pulling all documents from the database at once, since this may introduce other scaling issues. Use proper AQL with FILTER statements backed by indices (use explain() to verify) to fetch only the documents you require.
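For example, with python-arango you could check the plan roughly like this, assuming the same db handle as in the question (the FILTER attribute is only an illustration):

# Inspect the execution plan; an IndexNode in the plan means an index is used,
# a plain EnumerateCollectionNode means a full collection scan.
plan = db.aql.explain("FOR j IN journals FILTER j.year > 2000 RETURN j")
print(plan)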

If you need to iterate over all documents in the database, use paging. This is usually best implemented by combining a range FILTER with a LIMIT clause:

FOR x IN docs
  FILTER x.offsetteableAttribute > @lastDocumentWithThisID
  SORT x.offsetteableAttribute
  LIMIT 200
  RETURN x
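Translated to python-arango, the paging loop could look roughly like this; the keyset attribute (_key here), database name, and credentials are assumptions:

from arango import ArangoClient

client = ArangoClient()
db = client.db("mydb", username="root", password="passwd")

last_key = ""   # keyset cursor: the highest _key seen so far
batch = 200     # page size

while True:
    cursor = db.aql.execute(
        """
        FOR j IN journals
          FILTER j._key > @last_key
          SORT j._key
          LIMIT @batch
          RETURN j
        """,
        bind_vars={"last_key": last_key, "batch": batch},
    )
    docs = list(cursor)
    if not docs:
        break
    for journal in docs:
        create_author_edge(journal)   # the asker's edge-creation helper
    last_key = docs[-1]["_key"]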
0 votes

So here is how I did it, using pyArango. You can pass the TTL via the **moreArgs parameter, which makes it easy to do.

Looking at the source, you can see the docstring tells you what to do:

def AQLQuery(self, query, batchSize = 100, rawResults = False, bindVars = None, options = None, count = False, fullCount = False,
             json_encoder = None, **moreArgs):
    """Set rawResults = True if you want the query to return dictionnaries instead of Document objects.
    You can use **moreArgs to pass more arguments supported by the api, such as ttl=60 (time to live)"""
from pyArango.connection import *

conn = Connection(username=usr, password=pwd, arangoURL=url)  # set this how you need
db = conn['dbName']  # set this to the name of your database
aql = "FOR journal IN journals RETURN journal"
results = db.AQLQuery(aql, ttl=300, rawResults=True)
for journal in results:
    create_author_edge(journal)


That's all you need to do!