0
votes

I have two CSV files that are an export of a relational DB.
CSV1 has unique IDs,
CSV2 doesn't but has a column linking to the CSV1 objects.
I import CSV1 mapping the unique IDs to _key.
I'd like to import CSV2 to another collection and link it via an edge to the objects in the first collection.
What's the easiest way to do it?

P.S.
(I know that in Neo4j such a thing is trivial using the import tool, and I was wondering whether such functionality exists in ArangoDB or whether I'll have to write some AQL to do it.)

Sincerely, Elad

1 Answer

1
votes

While there is no wizard to import data, importing data into ArangoDB is also trivial, assuming you are comfortable with the command line (which, since you are on this site, I bet you are):

  1. Use the ArangoDB import tool (arangoimport, called arangoimp in older releases) to import your CSV files into two collections
  2. Create your edge collection
  3. Use a simple AQL query to insert data into the edge collection

Here is sample syntax for importing a CSV file with arangoimp:

arangoimp --file <path/filename> --collection <collectionName> --create-collection true --type csv --server.database <databaseName> --server.username <username>

And here are some common options:

Translating column names:

arangoimport --file "data.csv" --type csv --translate "from=_from" --translate "to=_to"

To ignore missing values (instead of throwing warnings and not loading the data), use the flag:

--ignore-missing

Ignore a column in the import file:

arangoimport --file "data.csv" --type csv --remove-attribute "attributeName"

Additionally, if you already have the edge collection in a CSV file, you can import that directly:

arangoimp --file <path/filename> --collection <collectionName> --create-collection true --type csv --create-collection-type edge --server.database <databaseName>
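
If CSV2 has no key column of its own, one way to get such an edge file is to generate it before importing. The following is only a sketch in Python, not something the import tool does for you. It assumes a hypothetical linking column named colA and the collection names csv1_collection and csv2_collection used in the AQL statement in this answer; the function name build_edges and the row1, row2, ... keys are made up for illustration. It gives every CSV2 row a _key and writes a matching _from/_to row for each row that has a link value.

```python
import csv
import io


def build_edges(csv2_text, link_column,
                from_collection="csv1_collection",
                to_collection="csv2_collection"):
    """Return (csv2_with_keys, edges), both as CSV text.

    Collection names and the "rowN" key scheme are assumptions for
    this sketch; adjust them to your own data.
    """
    reader = csv.DictReader(io.StringIO(csv2_text))
    docs_out = io.StringIO()
    edges_out = io.StringIO()

    # CSV2 gets an extra _key column so the edges can point at its rows.
    doc_writer = csv.DictWriter(
        docs_out,
        fieldnames=["_key"] + list(reader.fieldnames),
        extrasaction="ignore",
        lineterminator="\n",
    )
    edge_writer = csv.DictWriter(
        edges_out, fieldnames=["_from", "_to"], lineterminator="\n"
    )
    doc_writer.writeheader()
    edge_writer.writeheader()

    for n, row in enumerate(reader, start=1):
        key = f"row{n}"
        doc_writer.writerow({"_key": key, **row})

        # Skip rows that do not link to anything in CSV1.
        link = (row.get(link_column) or "").strip()
        if link:
            edge_writer.writerow({
                # _from/_to must be full document IDs: "collection/key".
                "_from": f"{from_collection}/{link}",
                "_to": f"{to_collection}/{key}",
            })

    return docs_out.getvalue(), edges_out.getvalue()
```

Written to files, the first output can be imported as an ordinary collection and the second with --create-collection-type edge, using the commands shown above.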

Finally, note that steps 2 and 3 in the list above can be done in the Arango GUI if you are more comfortable there. The statement for step 3 could be something like:

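// Pair each CSV2 record with the CSV1 record whose _key matches its link column.
// TO_STRING guards against the importer having stored colA as a number.
let newEdges = ( for csv1rec in csv1_collection
                  for csv2rec in csv2_collection
                  filter csv1rec._key == TO_STRING(csv2rec.colA)
                return {fromId : csv1rec._id , toId : csv2rec._id} )
// _from and _to must be full document IDs ("collection/key"), hence _id.
for rec in newEdges
insert {_from: rec.fromId, _to: rec.toId} in edgeCollection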

Note that I am writing the AQL above for step 3 from memory, so it may need a little tweaking.