Some background information: I have a "question and answer" site with lots of posts and users. The data are stored in an MSSQL database. I am interesting in how a graph database will help producing interesting results to users, like suggest similar users or suggest relevant posts.
I managed to import all posts and users (only their ids) and the "post to post" and "post to user" relation from a test database into ArangoDB database and ended up with 4 collections:
- posts: Document collection, 9833 entries
- users: Document collection, 1264 entries
- parentToChildPosts: Edge collection storing relation from a parent post to a child post, 4978 entries
- postToUsers: Edge collection storing relation from a post to its owner user, 9833 entries.
Then I built a graph "postGraph" with the 2 edges and created an AQL query trying to implement a simple "similar users" algorithm based on the "user posts" and "post post" relations. Here is the query searching top 10 similar users for user with _id "users/1":
FOR e in GRAPH_NEIGHBORS('postGraph', 'users/1',
{ edgeCollectionRestriction: ['parentToChildPosts', 'postToUsers'],
vertexCollectionRestriction: 'users',
minDepth: 3,
maxDepth: 3 })
FILTER e.vertex._key != '1'
COLLECT userid = e.vertex._key INTO g
SORT LENGTH(g) DESC
LIMIT 10
RETURN userid
Visually, I started search from a user 'users/1', and find paths to other users following:
user => post => parent/child post => user.
So the depth of traversals is 3. Then I exclude the user himself (FILTER), group by user keys and count the number of results and grab the top 10.
I tested the query and it worked for most users but failed for some active users with the error:
[1909] too many iterations
The query runs quite fast (within one second) and gives the error. One of the failing active users has 727 posts (not so many in my opinion).
I also tried 2 other ways to implement the algorithm:
- Write 3 GRAPH_NEIGHBORS queries mimicking the 3 steps of one traversal: much slower. It takes around 3 seconds the first time running it.
- Use Join (without using any graph functions): quite fast.
From the documentation, ArangoDB suggests using graph functions for this kind of situation. So I am not in favor of using Join (gives me a feeling I can also do it in SQL, so why bother using graph database).
Any suggestion how to make the GRAPH_NEIGHBORS work quickly without any errors? Or any other suggestion how to build graph, like how large the graph can grow?