Hello ArangoDB community,
I have imported two collections from sqlite to ArangoDB with arangoimport (via a CSV).
Next, I try to run a simple AQL to cross reference these collections (with an end goal to connect them via edges).
Collection1 has 1,682,642 documents
Collection2 has 3,290 documents
The following AQL takes a whopping 30 seconds to complete:
FOR c1 IN Collection1
FOR c2 IN Collection2
FILTER c2._key == TO_STRING(c1.someField) return {"C2": c2._id, "C1": c1._id}
If I switch the conversion like so, it takes forever (I abandoned after 5 minutes):
FOR c1 IN Collection1
FOR c2 IN Collection2
FILTER TO_NUMBER(c2._key) == c1.someField return {"C2": c2._id, "C1": c1._id}
Adding an index on "someField" didn't help.
The same JOIN query in Sqlite (from which the data was imported) takes less than 1 second to complete
A few thoughts and questions:
1) How can I know the data types of the fields in a document?
2) _key is a string. I think "someField" is a number (because without the TO_STRING, no results returned).
3) Is adding TO_STRING on "someField" effectively makes the index on the field unusable?
4) Is there a way to make _key a number (preferably an integer). I think number comparison is faster, is it not?
5) Alternatively, can I tell arangoimport to force "someField" to be a string?
6) Is there anything else I can do to make the AQL run faster?
Any input appreciated,
Elad