0
votes

I try to train a neural network on Colab using a GPU there. I am now wondering if I am on the right pave and if all the steps I am doing are necessary, because the process I am following does not appear very efficient to me.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

import os

# choose a local (colab) directory to store the data.
local_root_path = os.path.expanduser("~/data")
try:
  os.makedirs(local_root_path)
except: pass

def ListFolder(google_drive_id, destination):
  file_list = drive.ListFile({'q': "'%s' in parents and trashed=false" % google_drive_id}).GetList()
  counter = 0
  for f in file_list:
    # If it is a directory then, create the dicrectory and upload the file inside it
    if f['mimeType']=='application/vnd.google-apps.folder': 
      folder_path = os.path.join(destination, f['title'])
      os.makedirs(folder_path)
      print('creating directory {}'.format(folder_path))
      ListFolder(f['id'], folder_path)
    else:
      fname = os.path.join(destination, f['title'])
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)
      counter += 1
  print('{} files were uploaded in {}'.format(counter, destination))

ListFolder("1s1Ks_Gf_cW-F-RwXFjBu96svbmqiXB0o", local_root_path)

This commands allow to connect the Notebook in Colab with my Google Drive and stores the data in Colab. Because I have a lot of images (more than 180k) the storage of the data in Colab takes very, very long and partially the connection breaks. I am now wondering if it is necessray to upload all the data from my Google Drive to Colab?

If no, what do I have to do instead to work with the data from Google Drive? If yes, is there a way to do this more efficiently? Or is there maybe a completely different way I should work with Colab?

1

1 Answers

2
votes

You can access files directly on your Google drive without copying them into Notebook environment.
Execute this code in one cell:

from google.colab import drive 
drive.mount('/content/gdrive') 

And try:

!ls /content/gdrive

Now you can copy your files from/to /content/gdrive directory and they will appear in your Google Drive.