I am using Neo4j with py2neo to analyze Twitter data. I'm new to all of this, so the question might be pretty basic, but I could not find the answer in any of the documentation. I have two CSV files, one with 100 followers and the other with about 22,000 tweets. For each tweet I have information such as whether it is a reply to another tweet and which other users are mentioned in it.
I want to add followers and tweets as nodes, then use the reply_to and mentions_user fields of the tweets to add relationships between tweets (reply_to) and between tweets and users (mentions).
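To make the intended model concrete, the relationships could be expressed as parameterised Cypher statements built straight from the tweet CSV rows. This is only a minimal sketch under assumptions: the `Tweet` and `User` labels, the `id` and `screen_name` properties, the `REPLY_TO` and `MENTIONS` relationship types, and the `;` separator inside mentions_user are all hypothetical, and the `{param}` placeholder style is the older Cypher form (newer Neo4j versions use `$param`). It only builds the statements and does not talk to the database.

```python
import csv
import io

# Hypothetical labels, properties and relationship types; adjust to the real schema.
REPLY_QUERY = (
    "MATCH (t:Tweet {id: {tweet_id}}), (r:Tweet {id: {reply_to}}) "
    "MERGE (t)-[:REPLY_TO]->(r)"
)
MENTION_QUERY = (
    "MATCH (t:Tweet {id: {tweet_id}}), (u:User {screen_name: {name}}) "
    "MERGE (t)-[:MENTIONS]->(u)"
)


def relationship_statements(rows):
    """Yield (cypher, params) pairs for every reply or mention in the rows."""
    for row in rows:
        if row.get("reply_to"):
            yield REPLY_QUERY, {"tweet_id": row["id"], "reply_to": row["reply_to"]}
        # Assumption: several mentioned users are separated by ';' in one field.
        for name in (row.get("mentions_user") or "").split(";"):
            name = name.strip()
            if name:
                yield MENTION_QUERY, {"tweet_id": row["id"], "name": name}


# Tiny in-memory stand-in for the tweet CSV file.
sample = io.StringIO("id,reply_to,mentions_user\n1,,alice;bob\n2,1,\n")
statements = list(relationship_statements(csv.DictReader(sample)))
```

Each pair could then be handed to whatever Cypher execution call is available, which would avoid reading the Tweet nodes back out of the database at all.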
Adding the nodes works well with batch. However, when I iterate through all tweets using py2neo to add the relationships, I get an OutOfMemoryError: Java heap space.
I'm trying to iterate through the tweets like this:
for tweet in graph.find("Tweet"):
    pass  # create the reply_to / mentions relationships for this tweet here
My questions are: a) Is there another way in py2neo to iterate through (a lot of) nodes? b) A little broader: I read in the py2neo documentation that it is better to use Cypher transactions than batch. Should I do that, and could that also help with a)?
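For question a), one possibility would be to page through the nodes with Cypher SKIP and LIMIT instead of fetching them all in one go. A minimal sketch, assuming a hypothetical `run_query(query, params)` callable that wraps whichever Cypher call the installed py2neo version offers and returns the records of one page; the query text, the page size, and the `{param}` placeholder style (newer Neo4j versions use `$param`) are assumptions too. The demo uses an in-memory fake instead of a real database.

```python
# Hypothetical paging query; ORDER BY keeps the pages stable between calls.
PAGE_QUERY = "MATCH (t:Tweet) RETURN t ORDER BY id(t) SKIP {skip} LIMIT {limit}"


def iter_pages(run_query, page_size=1000):
    """Yield records one page at a time until a page comes back short."""
    skip = 0
    while True:
        page = list(run_query(PAGE_QUERY, {"skip": skip, "limit": page_size}))
        for record in page:
            yield record
        if len(page) < page_size:
            return  # last (possibly empty) page reached
        skip += page_size


# In-memory stand-in for the database so the sketch runs without Neo4j.
fake_tweets = list(range(2500))
calls = []


def fake_run_query(query, params):
    calls.append(dict(params))
    return fake_tweets[params["skip"]:params["skip"] + params["limit"]]


seen = list(iter_pages(fake_run_query, page_size=1000))
```

Only one page of records is held in memory on the client at a time. Whether this also relieves the Java heap on the server side is something I have not verified.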
Thanks in advance for any help! KMM