I seem to be hitting resource limitations in Google Colab Pro when reading from Google Drive (or even when transferring data from Drive to Colab's local /tmp/ storage). I have been using it pretty heavily for the past month, training large BERT models and saving relatively large datasets (3-5 GB). As an example of an error I am only getting now (after many successful reads), the following code causes Colab to crash:
from google.colab import drive
drive.mount('/content/drive')  # mount Google Drive

!pip install transformers==3.5.0
!pip install datasets==1.0.2

import transformers
import torch
import datasets

# retrieve data - this call is where the runtime crashes
split_main = datasets.load_from_disk('bigquery/combined')
The dataset I am trying to load is an Arrow file about 3 GB in size; datasets.load_from_disk() memory-maps the file rather than reading it all into RAM. To test the problem, I also tried to read a large .csv from a totally different dataset. It also errors out, but does not crash the runtime; I just get an Input/Output error. Both files have previously been read with no problem.
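For reference, the CSV test is roughly the following (pandas and the path are just placeholders here, not the actual filename):

import pandas as pd

# reading a large .csv directly from the mounted Drive
# this now fails with an Input/Output error, even though the
# same file has been read successfully before
df = pd.read_csv('/content/drive/MyDrive/other_dataset/data.csv')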
Are there resource limitations you can hit in terms of read/write access to Google drive from Colab?
I am not referring to Drive storage - of which I have plenty - nor am I hitting memory limitations on Colab Pro.
A read/write quota would be the only explanation I can think of for the same code working at one time and then throwing Input/Output errors at another.
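For completeness, when I try the workaround of copying the data from Drive to local Colab storage first (mentioned above), I do something like the following (paths are illustrative); this copy also fails intermittently:

# copy the dataset folder from Drive to local disk, then load it locally
!cp -r /content/drive/MyDrive/bigquery/combined /tmp/combined
split_main = datasets.load_from_disk('/tmp/combined')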