What is the simplest way to quickly create edges in ArangoDB programmatically?

I would like to create relationships between documents based on a common attribute. I'd like to be able to select an attribute, and for every document in collection A, create an edge to every document in collection B that has the same value in an equivalent attribute.

For example, if I've imported email messages into one collection and people into another, I would like to generate edges between the emails and the people. An email's schema might look like this:

{
  "_key":
  "subject":
  "body":
  "from":
  "to":
}

And a person's schema might look like this:

{
  "_key":
  "name":
  "email":
}

Let's say that the values in the from and to fields in the email messages correspond to email addresses that we may find in the people collection.

I'd like to be able to take the collections, attributes, and edge parameters as input, then, for every document in the people collection, create an edge to every document in the email collection that has the same email address in the from attribute as the current document's email attribute.

So far, I think that Foxx may be the best tool for this, but I am a bit overwhelmed by the documentation.

Eventually, I'd like to build full CRUD support for edges defined by shared attributes between documents, including an "upsert" equivalent: updating an edge if it already exists and creating it if it doesn't.

I know that doing this with individual API calls with the standard HTTP API would be far too slow, since I would need to query Arango for every document in a collection and return very large numbers of results.

Is there already a Foxx service that does this? If not, where should I start to create one?

A single AQL query should suffice:

FOR p IN people
    FOR e IN emails
        FILTER p.email == e.from
        INSERT {_from: p._id, _to: e._id} INTO sent

The email addresses in the people vertex collection are matched against the from attribute of the documents in the emails vertex collection. For every match, a new edge is inserted into the edge collection sent, linking the person to the email. The sent collection has to exist before the query runs.
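Since the question asks to take the collections and attributes as input, the same query could be made generic with bind parameters. This is a sketch, and the parameter names (@@fromCollection, @@toCollection, @@edgeCollection, @fromAttribute, @toAttribute) are hypothetical choices for illustration. Collection names are passed as collection bind parameters with the double @@ prefix, and attribute names are passed as ordinary bind parameters and used with bracket access.

```aql
// Generic variant (hypothetical bind parameter names):
// link every document in @@fromCollection to every document in @@toCollection
// whose @toAttribute value equals the first document's @fromAttribute value.
FOR a IN @@fromCollection
    FOR b IN @@toCollection
        FILTER a[@fromAttribute] == b[@toAttribute]
        INSERT { _from: a._id, _to: b._id } INTO @@edgeCollection
```

For the example in the question, the bind variables would map "@fromCollection" to people, "@toCollection" to emails, "@edgeCollection" to sent, "fromAttribute" to email and "toAttribute" to from.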

If both vertex collections contain only a small number of documents, it is fine to run this query without indexes (e.g. 1,000 persons and 3,000 emails took about 2 seconds in my test). For larger datasets, create a hash index on the email attribute in people and a hash index on the from attribute in emails. In my test, this reduced the execution time to about 30 ms.
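The question also asks for an "upsert" equivalent. One possible sketch, not part of the original answer, is to replace INSERT with AQL's UPSERT operation, which searches sent for an edge with the same _from/_to pair and only inserts a new edge when none is found. It assumes one edge per person/email pair is wanted. The empty UPDATE object leaves existing edges unchanged; attributes placed there would be written to an existing edge instead.

```aql
// Upsert variant (assumption: one edge per person/email pair).
// Inserts the edge if no edge with this _from/_to exists yet,
// otherwise applies the (empty) update, leaving the edge as is.
FOR p IN people
    FOR e IN emails
        FILTER p.email == e.from
        UPSERT { _from: p._id, _to: e._id }
        INSERT { _from: p._id, _to: e._id }
        UPDATE {}
        IN sent
```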