7
votes

Trace below.

The relevant Python snippet:

bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)

Which ultimately triggers (from the ssl library):

OverflowError: string longer than 2147483647 bytes

I assume there is some special configuration option I'm missing?

This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.

Help appreciated!

Full trace:

File "/usr/src/app/gcloud/download_data.py", line 109, in ******* blob.upload_from_filename(source_path)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename size=total_bytes)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file client, file_obj, content_type, size, num_retries)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload client, stream, content_type, size, num_retries)

File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload transport, data, object_metadata, content_type)

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit retry_strategy=self._retry_strategy)

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request func, RequestsMixin._get_status_code, retry_strategy)

File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry response = func()

File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request method, url, data=data, headers=request_headers, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request resp = self.send(prep, **send_kwargs)

File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send r = adapter.send(request, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send timeout=timeout

File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen chunked=chunked)

File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request conn.request(method, url, **httplib_request_kw)

File "/usr/lib/python3.5/http/client.py", line 1106, in request self._send_request(method, url, body, headers)

File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request self.endheaders(body)

File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders self._send_output(message_body)

File "/usr/lib/python3.5/http/client.py", line 936, in _send_output self.send(message_body)

File "/usr/lib/python3.5/http/client.py", line 908, in send self.sock.sendall(data)

File "/usr/lib/python3.5/ssl.py", line 891, in sendall v = self.send(data[count:])

File "/usr/lib/python3.5/ssl.py", line 861, in send return self._sslobj.write(data)

File "/usr/lib/python3.5/ssl.py", line 586, in write return self._sslobj.write(data)

OverflowError: string longer than 2147483647 bytes

1
This may be the problem: the maximum data length on the system. This is only seen on 32-bit machines. If you are sure that you are using a 64-bit machine, then one or more of the modules you use is changing this value. If you are using a 32-bit system, you need to do an injection, which is not an easy procedure. In some cases, manually installing the 64-bit versions of the relevant modules can solve the problem. It's not just about sending the file; the data also has to be encrypted (SSL). - dsgdfg
"This is only seen on 32 bit machines." Interesting idea, but I don't think this is what is going on: we're running on GKE with gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/Dockerfile as the base image (perhaps this is actually 32-bit and we don't realize it?). "If you are sure that you are using a 64 bit machine, one or more modules that you use are changing this value." @dsgdfg, can you expand on how to diagnose this? We're working with a fairly vanilla image and dependencies, so it would be rather odd if this is what is going on. (But obviously something undesirable is happening!) - severian
I checked the related module (not yours): socketmodule.h in SSL (SSL1.16), line 80: define SIZEOF_SOCKET_T SIZEOF_INT, which means SIZEOF_SOCKET_T = 8 (64-bit) and SIZEOF_INT == sys.maxint. Your SSL module launches a new process by reading the configuration from system resources. This error may be due to your network card's configuration. The fact that the data is available indicates that your application supports 64-bit. However, your hardware or hardware configuration does not support this process (64-bit). - dsgdfg
As I said in the beginning, this may be caused by a missing or faulty configuration of your system or hardware. I think your system is 32-bit, or at least your network card is. - dsgdfg
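
On the "how to diagnose" question raised in these comments, a minimal sketch for checking whether the Python interpreter inside the container is a 32-bit or 64-bit build. It uses only the standard library; the helper names interpreter_bits and describe_build are made up for illustration. If it reports 64, the 32-bit theory would not explain the error, and the 2147483647 limit (2**31 - 1) would have to come from somewhere else.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size of a C pointer (void *) in bytes.
    return struct.calcsize("P") * 8


def describe_build():
    """Collect a few facts that distinguish 32-bit from 64-bit builds."""
    return {
        "pointer_bits": interpreter_bits(),
        # 2**31 - 1 on a 32-bit build, 2**63 - 1 on a 64-bit build.
        "sys_maxsize": sys.maxsize,
        # Architecture string reported by the OS, e.g. "x86_64".
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(key, "=", value)
```

Running this inside the same image that performs the upload (rather than on the host) is what matters, since the base image determines which interpreter build is used.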

1 Answer

11
votes

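The issue is that the library is attempting to read the entire file into memory and send it in one request. Following the call chain from upload_from_filename shows that it stats the file and then passes the file size in as the upload size, so the whole file goes out as a single upload part (the _do_multipart_upload call in your trace). For a file larger than 2147483647 bytes, that single write is more than the ssl layer accepts.

Instead, specifying a chunk_size when creating the blob object will cause it to upload in multiple parts: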

# Must be a multiple of 256 KB per the docstring
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB (10485760 bytes)
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
blob.upload_from_filename(source_path)
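
If the chunk size comes from configuration rather than a constant, it may be worth rounding it to a valid value before creating the blob. The following is only a sketch under that assumption: round_chunk_size and upload_in_chunks are hypothetical helper names, not part of google-cloud-storage, and bucket is assumed to be the Bucket object your _get_bucket helper returns.

```python
# Chunk sizes must be a multiple of 256 KB (per the library docstring).
CHUNK_GRANULARITY = 256 * 1024


def round_chunk_size(requested_bytes):
    """Round a requested chunk size up to the next multiple of 256 KB."""
    if requested_bytes <= 0:
        raise ValueError("chunk size must be positive")
    multiples = -(-requested_bytes // CHUNK_GRANULARITY)  # ceiling division
    return multiples * CHUNK_GRANULARITY


def upload_in_chunks(bucket, blob_path, source_path,
                     requested_chunk_bytes=10 * 1024 * 1024):
    """Upload source_path to blob_path in the given bucket, chunk by chunk.

    `bucket` is assumed to be a google.cloud.storage Bucket; only its
    blob(name, chunk_size=...) method is used here.
    """
    chunk_size = round_chunk_size(requested_chunk_bytes)
    # Setting chunk_size on the blob is what avoids the single huge write.
    blob = bucket.blob(blob_path, chunk_size=chunk_size)
    blob.upload_from_filename(source_path)
    return blob
```

With the names from the question, the call would look like upload_in_chunks(_get_bucket(location['bucket']), location['path'], source_path).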

Happy Hacking!