0 votes

I seem to be hitting resource limitations in Google Colab Pro when reading from Google Drive (or even when transferring data from Drive to Colab's local /tmp/ storage). I have been using it heavily for the past month, training large BERT models and saving relatively large datasets (3-5 GB). As one example of an error I am only getting now (after many successful reads), the following code causes Colab to crash:

# mount Google Drive into the Colab runtime
from google.colab import drive
drive.mount('/content/drive')

# pin library versions
!pip install transformers==3.5.0
!pip install datasets==1.0.2

import transformers
import torch
import datasets

# load the saved Arrow dataset; this is the call that crashes the runtime
split_main = datasets.load_from_disk('bigquery/combined')

The dataset I am trying to load is an Arrow file about 3 GB in size, and datasets.load_from_disk() memory-maps the file rather than reading it all into RAM. To narrow the problem down, I also tried to read a large .csv from a completely different dataset. That read also fails, but without crashing the runtime; I get an Input/Output error instead. Both files have previously been read with no problem.
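For reference, the .csv check looks roughly like the sketch below; the path is a placeholder, not my actual file, and the except clause is only there to surface the error, which shows up as an OSError reporting "Input/output error" from the Drive mount.

# Minimal sketch of the .csv check; the path below is a placeholder for my real Drive file.
import pandas as pd

csv_path = '/content/drive/MyDrive/other_dataset/data.csv'  # hypothetical large .csv on Drive

try:
    df = pd.read_csv(csv_path)
    print(df.shape)
except OSError as e:
    # on the failing reads this prints an Input/Output error raised by the Drive FUSE mount
    print('Read failed:', e)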

Are there resource limitations you can hit in terms of read/write access to Google Drive from Colab?

I am not referring to Drive storage - of which I have plenty - nor am I hitting memory limitations using Colab Pro.

That would be the only explanation I can think of for the same code working at one time and then throwing Input/Output errors at another.


2 Answers

0 votes

Make sure you have enough free space in your Google Drive.
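You can do a rough check from inside Colab, for example with the sketch below; note that the figures df reports for the FUSE-mounted Drive are only approximate, so the Drive web UI remains the authoritative place to check your quota.

# Rough check of available space from inside the Colab runtime.
# The numbers for the mounted Drive are approximate; verify quota in the Drive web UI.
!df -h /content/drive   # space reported for the mounted Google Drive
!df -h /                # local disk of the Colab VM (used for /tmp copies)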

0 votes

This is a known issue; a fix is pending but will take a while to land. A workaround is described in https://github.com/googlecolab/colabtools/issues/1607#issuecomment-701704057.
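Until the fix lands, one general mitigation (not necessarily the exact workaround from the linked comment) is to copy the Arrow dataset off the Drive FUSE mount onto the VM's local disk and memory-map it from there, assuming the one-time copy itself succeeds. The paths below are placeholders for your real locations.

# Sketch of a general mitigation, assuming the dataset lives on the mounted Drive;
# both paths are placeholders.
import shutil
import datasets

drive_path = '/content/drive/MyDrive/bigquery/combined'  # hypothetical Drive location of the Arrow dataset
local_path = '/tmp/bigquery_combined'                    # fast local VM disk

shutil.copytree(drive_path, local_path)           # one-time copy off the Drive FUSE mount
split_main = datasets.load_from_disk(local_path)  # memory-map from local disk instead of Drive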