I am trying to transfer large files from S3 to Google Cloud Storage (GCS) using Airflow and its S3ToGoogleCloudStorageOperator. I have been able to transfer files of 400 MB, but with larger files (2 GB) I get the following error:
[2018-09-19 12:30:43,907] {models.py:1736} ERROR - [Errno 28] No space left on device
Traceback (most recent call last):
  File "/home/jma/airflow/env/lib/python3.5/site-packages/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/jma/airflow/env/lib/python3.5/site-packages/airflow/contrib/operators/s3_to_gcs_operator.py", line 156, in execute
    file_object.download_fileobj(f)
  File "/home/jma/airflow/env/lib/python3.5/site-packages/boto3/s3/inject.py", line 760, in object_download_fileobj
    ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
  File "/home/jma/airflow/env/lib/python3.5/site-packages/boto3/s3/inject.py", line 678, in download_fileobj
    return future.result()
  File "/home/jma/airflow/env/lib/python3.5/site-packages/s3transfer/futures.py", line 73, in result
    return self._coordinator.result()
  File "/home/jma/airflow/env/lib/python3.5/site-packages/s3transfer/futures.py", line 233, in result
    raise self._exception
  File "/home/jma/airflow/env/lib/python3.5/site-packages/s3transfer/tasks.py", line 126, in __call__
    return self._execute_main(kwargs)
  File "/home/jma/airflow/env/lib/python3.5/site-packages/s3transfer/tasks.py", line 150, in _execute_main
    return_value = self._main(**kwargs)
  File "/home/jma/airflow/env/lib/python3.5/site-packages/s3transfer/download.py", line 583, in _main
    fileobj.write(data)
  File "/home/jma/airflow/env/lib/python3.5/tempfile.py", line 622, in func_wrapper
    return func(*args, **kwargs)
OSError: [Errno 28] No space left on device
The full code of the DAG can be found in this other SO question.
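For reference, the task is set up roughly like this (a minimal sketch; the bucket names, prefix and connection IDs below are placeholders, not my real values):

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.s3_to_gcs_operator import S3ToGoogleCloudStorageOperator

dag = DAG(
    dag_id='s3_to_gcs_example',
    start_date=datetime(2018, 9, 1),
    schedule_interval=None,
)

copy_files = S3ToGoogleCloudStorageOperator(
    task_id='copy_large_files',
    bucket='my-s3-bucket',                    # source S3 bucket (placeholder)
    prefix='exports/',                        # S3 key prefix to copy (placeholder)
    aws_conn_id='aws_default',
    dest_gcs='gs://my-gcs-bucket/imports/',   # destination GCS path (placeholder)
    dest_gcs_conn_id='google_cloud_default',
    replace=False,
    dag=dag,
)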
The file does not go directly from S3 to GCS but is first downloaded to the machine where Airflow is running. Looking at the traceback it seems boto could be responsible, but I still can't figure out how to fix the issue, i.e. how to assign a folder where the file can be stored temporarily.
I would like to move very large files, so how can I set things up so that no such size limitation applies?
I am running Airflow 1.10 from Google Cloud Shell in GCP, where I have 4 GB of free space in the home directory (the file being moved is 2 GB).
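Is redirecting Python's temporary directory the right approach? This is what I had in mind, e.g. set in the DAG file or in the worker environment (the /mnt/disks/bigdisk path is just an example, not a mount I actually have):

import os
import tempfile

# Point Python's temporary directory at a disk with more space before the
# operator runs. The operator seems to stage the download via tempfile
# (tempfile.py appears in the traceback), which honours TMPDIR /
# tempfile.tempdir when choosing where to write.
os.environ['TMPDIR'] = '/mnt/disks/bigdisk/tmp'
tempfile.tempdir = '/mnt/disks/bigdisk/tmp'

print(tempfile.gettempdir())  # should now report /mnt/disks/bigdisk/tmp

If this is not how the operator picks its temporary location, what is the right way to point it at a disk with enough space?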