I am new to ArangoDB. I'm trying to import some of my data from Neo4j into ArangoDB: millions of nodes and edges storing playlist data for various people. I have the CSV files exported from Neo4j, and I ran a script that rewrites the node CSV files to have a _key attribute and the edge CSV files to have _from and _to attributes. When I tried this on a very small dataset, everything worked perfectly: I could see the graph in the UI and run queries against it. Bingo!
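In case the conversion matters, my script does roughly the following (the column names and collection names here are simplified placeholders, not my real ones):

    import csv

    # Nodes: Neo4j's id column becomes ArangoDB's _key.
    with open('nodes_neo4j.csv', newline='') as src, \
         open('nodes_arango.csv', 'w', newline='') as dst:
        reader = csv.DictReader(src)
        fields = ['_key'] + [f for f in reader.fieldnames if f != 'id']
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            row['_key'] = row.pop('id')
            writer.writerow(row)

    # Edges: _from/_to must be full document handles, i.e. collection/_key.
    with open('edges_neo4j.csv', newline='') as src, \
         open('edges_arango.csv', 'w', newline='') as dst:
        reader = csv.DictReader(src)
        fields = ['_from', '_to'] + [f for f in reader.fieldnames
                                     if f not in ('start_id', 'end_id')]
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            row['_from'] = 'people/' + row.pop('start_id')
            row['_to'] = 'songs/' + row.pop('end_id')
            writer.writerow(row)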
Now I am trying to import millions of rows (each arangoimp batch imports a CSV with about 100,000 rows). Each batch covers 5 collections, with a separate CSV file for each. After about 7-8 such batches, the system suddenly becomes very slow and unresponsive, and throws the following errors:
    ERROR error message: failed with error: corrupted collection

This one comes up seemingly at random for any batch, even though the format of the data is exactly the same as in the previous batches.

    ERROR Could not connect to endpoint 'tcp://127.0.0.1:8529', database: '_system', username: 'root'
    FATAL got error from server: HTTP 401 (Unauthorized)
When it doesn't throw these errors, it just keeps processing for hours with barely any progress.
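For completeness, each batch is imported with commands along these lines, one per collection (file and collection names are placeholders):

    arangoimp --file songs_batch7.csv --type csv --collection songs \
              --server.endpoint tcp://127.0.0.1:8529 --server.username root \
              --create-collection true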
I'm guessing all of this has to do with the sheer number of imports. One post I found suggested that I may be running out of file descriptors, but I'm not sure how to check for that or handle it.
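I assume checking would look something like this on Linux, though I haven't confirmed these are the right numbers to look at:

    ulimit -n                                      # descriptor limit for the current shell
    sudo ls /proc/$(pgrep -x arangod)/fd | wc -l   # descriptors the arangod process holds open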
Another thing I've noticed is that the biggest of the 5 collections is the one that gets these errors most often (although the others do too). Are file descriptors tied to a specific collection, even across separate import runs?
Could someone please point me in the right direction? I'm not sure how to begin debugging this.
Thank you in advance.