Updated question:
I'm new in neo4j and im trying to create a movie rating database. I have a file with 100.000 registers (2,3 MB) that represents when a user rated a movie; the file look like this (this dataset is from MovieLens):
I'm using py2neo and I create de datebase with this code:
data = pd.read_csv('ratings_small.csv')
def create_bipartite_graph(data):
#Indexes creation to perform the queries
graph.run('''
CREATE INDEX user_index IF NOT EXISTS FOR (u:User) ON (u.UserId)
''')
graph.run('''
CREATE INDEX movie_index IF NOT EXISTS FOR (m:Movie) ON (m.MovieId)
''')
actual_user_node = None
for index, row in data.iterrows():
userID = int(row['userId'])
movieID = int(row['movieId'])
rating = row['rating']
date = datetime.fromtimestamp(row['timestamp']).strftime("%m/%d/%Y, %H:%M:%S")
#Creation nodes and relationships
graph.run('''
MERGE(u:User{UserId: $uID})
MERGE(m:Movie{MovieId: $mID})
CREATE (u)-[:RATED_MOVIE{rating: $r, date: $d}]->(m)
''', parameters = {'uID': userID, 'mID': movieID, 'r': rating, 'd': date})
The problem is with that small dataset, takes more than 2 hours in create the graph. Any advice for the time of the databse creation decrease considerably?
Using LOAD CSV
:
This is the query that I execute in neo4j browser
EXPLAIN LOAD CSV With HEADERS FROM 'file:///ratings_small.csv' AS line FIELDTERMINATOR ','
MERGE(m:Movie{MovieId: toInteger(line.movieId)})
MERGE(u:User{UserId: toInteger(line.userId)})
CREATE (u)-[:RATED_MOVIE{rating:toFloat(line.rating)}]->(m)
And this is the profile plan: profile plan