2 votes

First of all, I'm sorry if this isn't completely structured; I'm not sure where to start or end, but I did my best to give you as much information as possible.

I'm working on an AWS M3.large instance, with py2neo 2.0.4 and neo4j-community-2.1.7.

I am trying to import a large dataset into Neo4j using py2neo. The problem is that after reading in around 150k records, it just gives me: py2neo.packages.httpstream.http.SocketError: timed out

I need to scale up to millions of inputs, so 150k should be no problem at all.

The full traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 322, in submit
    response = send()
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 318, in send
    return http.getresponse(**getresponse_args)
  File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 331, in submit
    response = send("timeout")
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 318, in send
    return http.getresponse(**getresponse_args)
  File "/usr/lib/python3.4/http/client.py", line 1147, in getresponse
    response.begin()
  File "/usr/lib/python3.4/http/client.py", line 351, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.4/http/client.py", line 313, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.4/socket.py", line 371, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "transactions.py", line 221, in <module>
    read_zip("data")
  File "transactions.py", line 44, in read_zip
    create_tweets(lines)
  File "transactions.py", line 215, in create_tweets
    tx.process()
  File "/usr/local/lib/python3.4/dist-packages/py2neo/cypher/core.py", line 296, in process
    return self.post(self.__execute or self.__begin)
  File "/usr/local/lib/python3.4/dist-packages/py2neo/cypher/core.py", line 248, in post
    rs = resource.post({"statements": self.statements})
  File "/usr/local/lib/python3.4/dist-packages/py2neo/core.py", line 322, in post
    response = self.__base.post(body, headers, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 984, in post
    return rq.submit(**kwargs)
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 433, in submit
    http, rs = submit(self.method, uri, self.body, self.headers)
  File "/usr/local/lib/python3.4/dist-packages/py2neo/packages/httpstream/http.py", line 362, in submit
    raise SocketError(code, description, host_port=uri.host_port)
py2neo.packages.httpstream.http.SocketError: timed out

Right now I use Cypher, writing in batches of ~1000 statements, but smaller batches don't work either. My question: can I use something else to make it faster?

Right now, I do (simplified):

    statement = "MERGE (p:Person {name: {N}}) ON CREATE SET p.age = 132"

    def add_names(names):
        # One transaction per batch of 1000 statements
        for i in range(0, len(names), 1000):
            tx = graph.cypher.begin()
            for name in names[i:i + 1000]:
                tx.append(statement, {"N": name})
            tx.commit()
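Batching like this can also be done lazily, which avoids holding all the names in a list at once. A small generator-based chunking helper (pure Python; the name `batches_of` is hypothetical, not part of py2neo) could look like:

```python
def batches_of(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch

# Example: 2500 items split into batches of 1000
sizes = [len(b) for b in batches_of(range(2500), 1000)]
# sizes == [1000, 1000, 500]
```

Each yielded batch can then be appended to its own transaction and committed, so no single HTTP request grows unboundedly.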

But would it be better to use execute or stream, or is there anything else I can do to make this work?

1 Answer

6 votes

Try adding:

    from py2neo.packages.httpstream import http
    http.socket_timeout = 9999
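For context, this raises the client-side socket timeout and must run before any HTTP request is sent. A minimal sketch (the local server URI is an assumption for illustration; no test is practical since it needs a running Neo4j instance):

```python
from py2neo import Graph
from py2neo.packages.httpstream import http

# Raise the HTTP socket timeout (seconds) BEFORE creating the Graph
# or issuing any request, so long-running batch posts don't time out.
http.socket_timeout = 9999

# Assumed local Neo4j 2.x REST endpoint; adjust for your deployment.
graph = Graph("http://localhost:7474/db/data/")
```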