I have a very large e-commerce order dataset (including product details). I have just started exploring Neo4j and want to load this data into a graph database to compute product relationships and patterns using graph algorithms. These are the fields in my CSV file:
CUSTOMER_UNIQUE_ID (Customer Code)
ORDER_ID (Order Code)
ORDER_DATE (Order date)
CLIENT_TYPE (Ordered via Mobile / App / Desktop)
PARENT_SKU (Product ID)
LEV1 (Category Level 1)
LEV2 (Category Level 2)
LEV3 (Category Level 3)
To load the data I am using the following Cypher code:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///E:/Data/2015/Nov/MBA/order_item_MBA.csv" AS line
MERGE (product:Product {parent_sku: line.PARENT_SKU})
  ON CREATE SET product.lev1 = line.LEV1,
                product.lev2 = line.LEV2,
                product.lev3 = line.LEV3
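Could the problem be that I have no index on parent_sku? My understanding from the docs is that, without a schema index or constraint, MERGE has to scan all :Product nodes for every CSV row. If so, I assume I would need to create something like the following before running the load (syntax per the Neo4j version I am on; untested):

```cypher
// Assumption: no index/constraint exists yet on :Product(parent_sku).
// A uniqueness constraint also creates the backing index that MERGE uses.
CREATE CONSTRAINT ON (p:Product) ASSERT p.parent_sku IS UNIQUE;
```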
It's taking 13 minutes just to run the above script over 50K records (a 5 MB file). Am I going wrong somewhere? I am planning to load around 30M records in total, roughly 20M+ nodes and 100M+ edges. Ultimately I want to build a product-customer graph, creating edges based on the products each customer bought.
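For the product-customer part, this is roughly the follow-up load I had in mind (a sketch only; the :Customer/:Order labels, property names, and relationship types are my own choices, not anything final):

```cypher
// Sketch of the planned customer/order load (untested).
// Links customers to their orders and orders to the products they contain.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///E:/Data/2015/Nov/MBA/order_item_MBA.csv" AS line
MERGE (c:Customer {customer_id: line.CUSTOMER_UNIQUE_ID})
MERGE (o:Order {order_id: line.ORDER_ID})
  ON CREATE SET o.order_date  = line.ORDER_DATE,
                o.client_type = line.CLIENT_TYPE
MERGE (p:Product {parent_sku: line.PARENT_SKU})
MERGE (c)-[:PLACED]->(o)
MERGE (o)-[:CONTAINS]->(p)
```

I assume the same indexing concern applies here too, i.e. :Customer(customer_id) and :Order(order_id) would also need constraints before this second pass.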