I am trying to read a CSV file that contains node ids and the relations between them. The first two columns represent nodes and the third column represents the relation between them. So far I have been able to create the database in Neo4j, but I am not sure what Cypher query would fetch the desired data into a pandas DataFrame.

I will use a subset of a large dataset here to illustrate my problem. The original dataset contains thousands of nodes and relations.

My CSV file (columns Node1_id, Node2_id, relation_id) looks like this:

0   1   1
4   2   1
44  3   1
0   4   1
0   5   1
4   10173   3
4   10191   2
4   10192   2
6   10193   2
8   10194   2
3   10195   2
6   10196   2

Here is how I create the nodes and define the relationships between them by loading the ids from the CSV file. (I believe this graph is correct, but let me know if you notice any problem.) I assign one property, "id", to each node and relationship, using the ids from the CSV file.

LOAD CSV WITH HEADERS FROM  'file:///edges.csv' AS row FIELDTERMINATOR ","
WITH row
WHERE row.relation_id = '1'
MERGE (paper:Paper{id:(row.Node1_id)})
MERGE (author:Author{id:(row.Node2_id)})
CREATE (paper)-[au:AUTHORED{id: '1'}]->(author);

So far I have tried something like this:

    from pandas import DataFrame

    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper,author LIMIT 3; '''
    result = session.run(query)
    df = DataFrame(result)

    # Iterate over the DataFrame row by row
    for row in df.itertuples(index=False):
        print(row)

It returns this:

0   1
0   (id)    (id)
1   (id)    (id)
2   (id)    (id)

Desired results:

I want to query the graph database and get the results into a pandas DataFrame in the same format as the CSV above (node ids and relation ids), and then iterate over the results row by row.

0   1   1
4   2   1
44  3   1
0   4   1
0   5   1
4   10173   3
4   10191   2
4   10192   2
6   10193   2
8   10194   2
3   10195   2
6   10196   2

I would also like to know the return type of a Cypher query. In this case the final object is a pandas.core.frame.DataFrame, but how can I access the individual properties of nodes and relationships returned by the query? This is the main problem.

Please feel free to explain in detail, I would really appreciate the help.

Using neo4j Version: 4.2.1

1 Answer

I am using py2neo, so if you are using a different library, you can either switch to py2neo or tell me which Neo4j library you are using and I will edit my answer.
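In case you are on the official neo4j Python driver instead (the session.run call in the question hints at that, but this is only my assumption), here is a minimal sketch of the same idea. It assumes that the driver's result object can be iterated to yield records and that each record has a data() method returning a plain dict. The helper name fetch_records is made up for illustration.

```python
def fetch_records(session, query):
    """Run a Cypher query and return one plain dict per result row.

    `session` is assumed to be a neo4j driver session (or anything with a
    compatible run() method). The records are converted to dicts before the
    function returns, so the result can be used after the session is closed.
    """
    result = session.run(query)
    return [record.data() for record in result]
```

The returned list of dicts can be passed straight to DataFrame(...), in the same way as the py2neo result in #1.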

#1: Desired Result

I want to query the graph database and get the results into a pandas DataFrame in the same format as the CSV above (node ids and relation ids), and then iterate over the results row by row.

    from py2neo import Graph
    from pandas import DataFrame

    session = Graph("bolt://localhost:7687", auth=("neo4j", "****"))
    # Return the id properties directly instead of whole nodes.
    # To fetch every relationship, drop the {id: '1'} filter and the LIMIT clause.
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper.id, author.id, au.id LIMIT 3; '''
    # .data() returns the result as a list of dictionaries
    result = session.run(query).data()
    # Convert the result into a pandas DataFrame
    df = DataFrame(result)
    df.head()

Result:

0   1   1
4   2   1
44  3   1
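To get the CSV column names back and walk the rows one by one, something like the sketch below should work. The sample list is only a stand-in for what session.run(query).data() returns for the query above, and the helper names records_to_rows and rows_to_frame are made up for illustration. Note that the ids come back as strings, because LOAD CSV reads every field as a string.

```python
# Column names taken from the CSV description in the question.
COLUMNS = ["Node1_id", "Node2_id", "relation_id"]


def records_to_rows(records, keys=("paper.id", "author.id", "au.id")):
    """Turn the list of dicts from .data() into tuples in CSV column order."""
    return [tuple(record[key] for key in keys) for record in records]


def rows_to_frame(rows):
    """Build a DataFrame with the CSV column names (pandas is imported lazily)."""
    from pandas import DataFrame
    return DataFrame(rows, columns=COLUMNS)


# Stand-in for session.run(query).data(); the values mirror the result shown above.
sample = [
    {"paper.id": "0", "author.id": "1", "au.id": "1"},
    {"paper.id": "4", "author.id": "2", "au.id": "1"},
    {"paper.id": "44", "author.id": "3", "au.id": "1"},
]

rows = records_to_rows(sample)
for node1, node2, relation in rows:
    print(node1, node2, relation)
```

With pandas installed, rows_to_frame(rows).itertuples(index=False) gives the same row-by-row iteration on the DataFrame, with fields named Node1_id, Node2_id and relation_id.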

#2: Another question

How can I access the individual properties of nodes and relationships returned by the Cypher query?

Answer: the nodes behave like dictionaries of their properties, so use the get function.

    # Note that we are returning the nodes and not the ids
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper, author, au LIMIT 3; '''
    result = session.run(query).data()
    print("What is data type of result? ", type(result))
    print("What is the data type of each item? ", type(result[0]))
    print("What are the keys of the dictionary? ", result[0].keys())
    print("What is the class of the node? ", type(result[0].get('paper')))
    print("How to access the first node? ", result[0].get('paper'))
    print("How to access values inside the node? ", result[0].get('paper',{}).get('id'))

Result:
What is data type of result?  <class 'list'>
What is the data type of each item?  <class 'dict'>
What are the keys of the dictionary?  dict_keys(['paper', 'author', 'au'])
What is the class of the node?  <class 'py2neo.data.Node'>
How to access the first node?  (_888:paper {id: '1'})
How to access values inside the node?  '1'
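If you return whole nodes and relationships but still want a flat table, one option is to flatten each row yourself. This is only a sketch: it assumes every value in a row can be passed to dict() to get its properties, which the get call above suggests for py2neo nodes and relationships. The helper name flatten_record is made up, and plain dicts stand in for the py2neo objects.

```python
def flatten_record(record):
    """Merge the properties of every entity in one result row into a flat dict.

    Keys are prefixed with the RETURN variable name, e.g. 'paper.id'.
    """
    flat = {}
    for name, entity in record.items():
        for prop, value in dict(entity).items():
            flat[f"{name}.{prop}"] = value
    return flat


# Plain dicts stand in for py2neo Node/Relationship objects here.
sample = {"paper": {"id": "0"}, "author": {"id": "1"}, "au": {"id": "1"}}
print(flatten_record(sample))
```

Applying this to every item of result, for example [flatten_record(r) for r in result], gives a list of flat dicts that DataFrame accepts in the same way as in #1.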