
I want to import two CSV files into an OrientDB database. The first contains the vertices, with 1 million records. The second contains the edges, with 59 million records.

I have two JSON configuration files for the import:

Vertex:

{
  "source": { "file": { "path": "../csvs/metodo01/pesquisador.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "Pesquisador" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/dbCemMilM01", 
       "dbType": "graph",
       "batchCommit": 1000,
       "classes": [
         {"name": "Pesquisador", "extends": "V"}
       ],
       "indexes": [
         {"class":"Pesquisador", "fields":["psq_id:integer"], "type":"UNIQUE" }
       ]
    }
  }
}

Edge:

{
    "config": {
        "log": "info",
        "parallel": false
    },
    "source": {
        "file": {
            "path": "../csvs/metodo01/a10.csv"
        }
    },
    "extractor": {
        "row": {
        }
    },
    "transformers": [{
        "csv": {
            "separator": ",",
            "columnsOnFirstLine": true,
            "columns": ["psq_id_from:integer",
            "pub_id_to:integer",
            "ordem:integer"]
        }
    },
    {
        "command": {
            "command": "create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to   (select from Publicacao  where pub_id = ${input.pub_id_to}) set  ordem = ${input.ordem} ",
            "output": "edge"
        }
    }],
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbUmMilhaoM01", 
            "dbType": "graph",
            "standardElementConstraints": false,
            "batchCommit": 1000,
            "classes": [{
                "name": "PUBLICOU",
                "extends": "E"
            }]
        }
    }
}

During the import, OrientDB suggests using an index to speed up the process.

How do I do that?

The command in question is: create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem}

Have you seen the official docs on indexing: orientdb.com/docs/last/Indexes.html? – Oleksandr Gubchenko

3 Answers


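To speed up the create edge process, you need indexes on both properties: Pesquisador.psq_id (which you already have) and Publicacao.pub_id. Without them, each of the two subqueries in the command has to scan its whole class for every one of the 59 million rows.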
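A minimal sketch of the missing index, declared in the loader section of the edge configuration from the question. It assumes that the Publicacao vertices have already been imported into the same database and that pub_id holds unique integer values. If pub_id can repeat, use "NOTUNIQUE" instead of "UNIQUE". The Pesquisador.psq_id index is not repeated here because the vertex configuration already creates it.

```json
"loader": {
    "orientdb": {
        "dbURL": "remote:localhost/dbUmMilhaoM01",
        "dbType": "graph",
        "standardElementConstraints": false,
        "batchCommit": 1000,
        "classes": [
            {"name": "Publicacao", "extends": "V"},
            {"name": "PUBLICOU", "extends": "E"}
        ],
        "indexes": [
            {"class": "Publicacao", "fields": ["pub_id:integer"], "type": "UNIQUE"}
        ]
    }
}
```

The loader should then create the index before the create edge commands run, so the "where pub_id = ..." lookup can use it instead of a full scan.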

Ivan


You can declare indexes directly in the ETL configuration. Here is an example taken from the DBPedia importer:

"orientdb": {
  "dbURL": "plocal:/temp/databases/dbpedia",
  "dbUser": "importer",
  "dbPassword": "IMP",
  "dbAutoCreate": true,
  "tx": false,
  "batchCommit": 1000,
  "wal" : false,
  "dbType": "graph",
  "classes": [
    {"name":"Person", "extends": "V" },
    {"name":"Customer", "extends": "Person", "clusters":8 }
  ],
  "indexes": [
    {"class":"V", "fields":["URI:string"], "type":"UNIQUE" },
    {"class":"Person", "fields":["town:string"], "type":"NOTUNIQUE",
        "metadata": { "ignoreNullValues": false }
    }
  ]
}

For more information, see: http://orientdb.com/docs/2.2/Loader.html


To speed up the load process, my suggestion is to work in plocal mode and then move the created database to a standalone OrientDB server.
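A minimal sketch of what that could look like in the loader section, assuming a hypothetical local path (/path/to/databases/dbUmMilhaoM01) and reusing the "tx" and "wal" settings from the DBPedia example above. Both the vertex and the edge configuration would need to point at the same plocal path. As far as I know, plocal needs exclusive access to the database files, so the OrientDB server should not have that database open while the ETL runs.

```json
"loader": {
    "orientdb": {
        "dbURL": "plocal:/path/to/databases/dbUmMilhaoM01",
        "dbType": "graph",
        "tx": false,
        "wal": false,
        "batchCommit": 1000,
        "classes": [
            {"name": "PUBLICOU", "extends": "E"}
        ]
    }
}
```

Once the import finishes, the database directory can be copied into the server's databases folder and opened through the usual remote:localhost URL.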