
Prerequisites

I've created a database with two generated collections: users and notes. Each contains ~1M documents.

Here are the structures:

user (the name field has a skiplist index):

{
    "name": "Some user name"
}

note (the authors field contains the _keys of documents in the users collection):

{
    "title": "Some title",
    "authors": [
        "12345", "12346", "12347", ...
    ]
}

Problem

I need to join the users collection on the authors field and then filter by user name, but it takes too long: ~3.5s on my local machine. The 'Specific name' value occurs only once.

let specificUsers = (
    for user in users
        filter user.name == 'Specific name'
        return user
)

for note in notes
    
    let authors = (
        for user in specificUsers
            filter user._key in (note.authors != null ? note.authors : [])
            return user
    )
   
    filter count(authors) > 0

    // filter 'Specific name' in (authors[*].name) // this way takes even longer

    limit 10

    return merge(note, {
        authors: authors
    })

If I omit the count filter or filter only on the note's own attributes, it loads fast, of course. But I need to filter on the joined collection, just as in a relational database.

Question

Am I doing something wrong, or is ArangoDB not expected to perform well in this case?

Please let me know if I need to provide more details.

Off the top of my head, I would try adding an array index on the authors attribute of the notes collection. Search for "indexing array values" in the linked docs (arangodb.com/docs/3.4/…) to see how to create an index on the array values instead of the array itself. Alternatively, you could create an edge collection between the users and the notes and use a simple graph traversal to get the information you need. A plus of this option is that you don't need to filter on count > 0, since notes without authors are filtered out naturally. – camba1
@camba1, you're right. Thank you. Adding an index on authors[*] does help. Another thing that hurts performance is this: (note.authors != null ? note.authors : []), even with the index in place. – Sergey Solo

1 Answer


So, I missed two things:

  • I hadn't added an index on authors[*].
  • I was using (note.authors != null ? note.authors : []), which slows the query down even with the index in place. (I guess it's better to ensure the authors attribute is always an array.)
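
For illustration, here is a rough sketch of how the query might look once both points are addressed. It is not necessarily the exact query I ended up with. It assumes an array index on authors[*] already exists on the notes collection and that authors is always an array. It also starts from the matching user and looks up notes through the index, rather than scanning all notes.

```aql
// Assumption: an array index on authors[*] exists on notes,
// and authors is always an array (so no null check is needed).
for user in users
    filter user.name == 'Specific name'
    for note in notes
        // IN combined with [*] is the form an array index can serve
        filter user._key in note.authors[*]
        limit 10
        return merge(note, {
            authors: [user]
        })
```

Since 'Specific name' occurs only once here, each note is returned at most once, with the matching user under authors. If several users shared the name, a note co-authored by two of them would appear twice, so the result would need deduplicating. It's worth checking the query's explain output to confirm the array index is actually picked up.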