First of all, I'm sorry if this is not completely structured; I'm just not sure where to start or end, but I did my best to give you as much information as possible.
I work on an AWS m3.large instance with py2neo 2.0.4 and neo4j-community-2.1.7 (Python 3.4, as shown in the traceback).
I am trying to import a large dataset into Neo4j using py2neo. My problem is that after reading in around 150k records, it just gives me: py2neo.packages.httpstream.http.SocketError: timed out
I need to go up into the millions of inputs, so 150k should just work.
Full traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 322, in submit
response = send()
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 318, in send
return http.getresponse(**getresponse_args)
File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.4/socket.py", line 371, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 331, in submit
response = send("timeout")
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 318, in send
return http.getresponse(**getresponse_args)
File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
response.begin()
File "/usr/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.4/socket.py", line 371, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "transactions.py", line 221, in <module>
read_zip("data")
File "transactions.py", line 44, in read_zip
create_tweets(lines)
File "transactions.py", line 215, in create_tweets
tx.process()
File "/usr/local/lib/python3.4/dist-packages/py2neo/cypher/core.py", line 296, in process
return self.post(self.__execute or self.__begin)
File "/usr/local/lib/python3.4/dist-packages/py2neo/cypher/core.py", line 248, in post
rs = resource.post({"statements": self.statements})
File "/usr/local/lib/python3.4/dist-packages/py2neo/core.py", line 322, in post
response = self.__base.post(body, headers, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 984, in post
return rq.submit(**kwargs)
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 433, in submit
http, rs = submit(self.method, uri, self.body, self.headers)
File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 362, in submit
raise SocketError(code, description, host_port=uri.host_port)
py2neo.packages.httpstream.http.SocketError: timed out
Right now I use Cypher and write in batches of ~1000, but smaller batches don't work either. My question: can I use something else to make it faster?
Right now, I do:
statement = "MERGE (p:Person {id: {N}}) ON CREATE SET p.age = 132"

def add_names(names):
    batch_size = 1000
    for i in range(0, len(names), batch_size):
        tx = graph.cypher.begin()
        for name in names[i:i + batch_size]:
            tx.append(statement, {"N": name})
        tx.process()
        tx.commit()
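For reference, the batch slicing itself can be factored into a small standalone helper (a minimal sketch; `chunks` is just an illustrative name of mine, not part of py2neo):

```python
def chunks(seq, size):
    # Yield successive slices of at most `size` items from seq.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Seven items in batches of three give three slices:
# list(chunks([1, 2, 3, 4, 5, 6, 7], 3)) -> [[1, 2, 3], [4, 5, 6], [7]]
```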
But would it be better to use execute or stream, or is there anything else I can do to make it work?
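One thing I have considered is raising py2neo's HTTP socket timeout before starting the import; judging from the traceback, py2neo 2.x seems to take it from a module-level setting in its bundled httpstream package (this is an assumption from reading the traceback, not something I have verified against the docs):

```python
# Assumption: py2neo 2.x applies this module-level socket_timeout
# (from its bundled httpstream package) to every HTTP request it makes.
from py2neo.packages.httpstream import http
http.socket_timeout = 9999  # seconds
```

This wouldn't make the import any faster, but it might stop the SocketError while I look for a faster approach.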
Useful link: