146
votes

The problem is simple: I have some data on gDrive, for example at /projects/my_project/my_data*.

I also have a simple notebook in gColab.

So, I would like to do something like:

import glob

for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)

Unfortunately, all the examples (like this one, for example - https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb) suggest only loading all the necessary data into the notebook.

But if I have a lot of pieces of data, that can get quite complicated. Is there any way to solve this issue?

Thanks for the help!

15
Surprisingly, no one gave a link to this Colab notebook that describes all the methods available as of Apr 2019 - colab.research.google.com/notebooks/io.ipynb

15 Answers

69
votes

Good news: PyDrive has first-class support on Colab! PyDrive is a wrapper for the Google Drive Python client. Here is an example of how you would download ALL files from a folder, similar to using glob + *:

!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
  os.makedirs(local_download_path)
except: pass

# 2. Auto-iterate using the query syntax
#    https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents"}).GetList()

for f in file_list:
  # 3. Create & download by id.
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  f_.GetContentFile(fname)


with open(fname, 'r') as f:
  print(f.read())

Notice that the argument to drive.ListFile is a dictionary that coincides with the parameters of the Google Drive HTTP API (you can customize the q parameter to suit your use case).
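
For example, to mimic the asker's my_data* pattern, you could narrow the query to titles starting with that prefix (a sketch; the folder ID is the example one used above, and Drive's contains operator does prefix matching on title terms):

# List only files in the example folder whose title starts with "my_data"
query = ("'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents "
         "and title contains 'my_data' and trashed=false")
file_list = drive.ListFile({'q': query}).GetList()
for f in file_list:
  print(f['title'], f['id'])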

Know that in all cases, files and folders on Google Drive are identified by IDs (note the 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk). This means you need to find, on Google Drive, the specific ID corresponding to the folder you want to root your search in.

For example, navigate to the folder "/projects/my_project/my_data" that is located in your Google Drive.

[Screenshot: the Google Drive folder]

You can see that it contains some files, which we want to download to Colab. To get the ID of the folder in order to use it with PyDrive, look at the URL and extract the id parameter. In this case, the URL corresponding to the folder was:

https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk

where the ID is the last piece of the URL: 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk.
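
If you'd rather not copy the ID by hand, here is a tiny sketch that pulls it out of the folder URL (using the example URL above):

folder_url = 'https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk'
folder_id = folder_url.rstrip('/').split('/')[-1]  # take the last path segment
print(folder_id)  # 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk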

335
votes

Edit: As of February 2020, there's now a first-class UI for automatically mounting Drive.

First, open the file browser on the left hand side. It will show a 'Mount Drive' button. Once clicked, you'll see a permissions prompt to mount Drive, and afterwards your Drive files will be present with no setup when you return to the notebook. The completed flow looks like so:

[Screenshot: Drive auto-mount flow]

The original answer follows, below. (This will also still work for shared notebooks.)

You can mount your Google Drive files by running the following code snippet:

from google.colab import drive
drive.mount('/content/drive')

Then, you can interact with your Drive files in the file browser side panel or using command-line utilities.

Here's an example notebook
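
Once Drive is mounted, the glob loop from the question works directly against the mounted path. A minimal sketch (the exact folder under My Drive is an assumption):

import glob

for file in glob.glob('/content/drive/My Drive/projects/my_project/my_data*'):
    do_something(file)  # do_something is the asker's placeholder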

40
votes

Thanks for the great answers! The fastest way to get a few one-off files into Colab from Google Drive: load the Drive helper and mount.

from google.colab import drive

This will prompt for authorization.

drive.mount('/content/drive')

Open the link in a new tab; you will get a code. Copy that back into the prompt, and you now have access to Google Drive. Check:

!ls "/content/drive/My Drive"

then copy file(s) as needed:

!cp "/content/drive/My Drive/xy.py" "xy.py"

confirm that files were copied:

!ls
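
To copy a whole folder at once instead of individual files, a sketch (the source path is an assumption):

!cp -r "/content/drive/My Drive/projects/my_project" ./my_project_data
!ls my_project_data
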
20
votes

What I have done is first:

from google.colab import drive
drive.mount('/content/drive/')

Then

%cd /content/drive/My Drive/Colab Notebooks/

After that I can, for example, read CSV files with

df = pd.read_csv("data_example.csv")

If your files are in a different location, just add the correct path after My Drive.
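
Alternatively, you can skip the %cd and pass the full path straight to pandas. A small sketch using the same Colab Notebooks folder:

import pandas as pd

df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data_example.csv')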

18
votes

Most of the previous answers are a bit (very) complicated:

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

I found this to be the easiest and fastest way to mount Google Drive in Colab. You can change the mount directory to whatever you want just by changing the argument to drive.mount. It will give you a link to accept the permissions with your account; then you have to copy and paste the generated key, and Drive will be mounted at the selected path.

force_remount is needed only when you have to mount the drive regardless of whether it was loaded previously. You can omit this parameter if you don't want to force a remount.
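
For instance, a sketch mounting at a custom location (the directory name here is arbitrary):

from google.colab import drive
drive.mount('/content/mydrive', force_remount=True)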

Edit: Check this out to find more ways of doing I/O operations in Colab: https://colab.research.google.com/notebooks/io.ipynb

14
votes

You can't permanently store a file on Colab, but you can import files from your Drive, and every time you are done with a file you can save it back.

To mount Google Drive in your Colab session:

from google.colab import drive
drive.mount('/content/gdrive')

You can simply write to Google Drive as you would to a local file system. If you look, your Google Drive will now be loaded in the Files tab. You can access any file from your Colab notebook and both read from and write to it. The changes are made in real time on your Drive, and anyone with the access link to your file can view the changes you make from Colab.

Example

with open('/content/gdrive/My Drive/filename.txt', 'w') as f:
   f.write('values')
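
Reading the same file back works the same way (a minimal sketch):

with open('/content/gdrive/My Drive/filename.txt', 'r') as f:
   print(f.read())
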
5
votes

I'm lazy and my memory is bad, so I decided to create easycolab, which is easier to memorize and type:

import easycolab as ec
ec.mount()

Make sure to install it first: !pip install easycolab

The mount() method basically implements this:

from google.colab import drive
drive.mount('/content/drive')
%cd '/content/drive/My Drive/'
2
votes

You can simply make use of the code snippets on the left of the screen.

Insert "Mounting Google Drive in your VM"

Run the code, open the URL it prints, and copy and paste the authorization code back into the prompt,

and then use !ls to check the directories

!ls /gdrive

For most cases, you will find what you want in the directory "/gdrive/My Drive".

Then you may carry it out like this:

from google.colab import drive
drive.mount('/gdrive')
import glob

file_path = glob.glob("/gdrive/My Drive/*.txt")
for file in file_path:
    do_something(file)
2
votes

To read all files in a folder:

import glob
from google.colab import drive
drive.mount('/gdrive', force_remount=True)

#!ls "/gdrive/My Drive/folder"

files = glob.glob("/gdrive/My Drive/folder/*.txt")
for file in files:  
  do_something(file)
2
votes
from google.colab import drive
drive.mount('/content/drive')

This worked perfectly for me. I was later able to use the os library to access my files just like I access them on my PC.
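
For example, a minimal sketch of browsing the mounted Drive with os (the mount point matches the snippet above):

import os

base = '/content/drive/My Drive'
for name in os.listdir(base):
    print(os.path.join(base, name))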

1
votes

I wrote a class that downloads all of the data to the '.' location on the Colab server.

The whole thing can be pulled from here https://github.com/brianmanderson/Copy-Shared-Google-to-Colab

!pip install PyDrive


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import os

# Authenticate and create the PyDrive client; the class below relies on this global drive object.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

class download_data_from_folder(object):
    def __init__(self,path):
        path_id = path[path.find('id=')+3:]
        self.file_list = self.get_files_in_location(path_id)
        self.unwrap_data(self.file_list)
    def get_files_in_location(self,folder_id):
        file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
        return file_list
    def unwrap_data(self,file_list,directory='.'):
        for i, file in enumerate(file_list):
            print(str((i + 1) / len(file_list) * 100) + '% done copying')
            if file['mimeType'].find('folder') != -1:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    os.makedirs(os.path.join(directory, file['title']))
                print('Copying folder ' + os.path.join(directory, file['title']))
                self.unwrap_data(self.get_files_in_location(file['id']), os.path.join(directory, file['title']))
            else:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    downloaded = drive.CreateFile({'id': file['id']})
                    downloaded.GetContentFile(os.path.join(directory, file['title']))
        return None
data_path = 'shared_path_location'
download_data_from_folder(data_path)
1
votes

To extract a Google Drive zip from a Google Colab notebook, for example:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
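
A quick check that the archive actually landed in /tmp (a small sketch):

import os
print(os.listdir('/tmp'))
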
0
votes

@wenkesj

I am talking about copying the directory and all its subdirectories.

For me, I found a solution that looks like this:

def copy_directory(source_id, local_target):
  try:
    os.makedirs(local_target)
  except: 
    pass
  file_list = drive.ListFile(
    {'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
  for f in file_list:
    print(', '.join(['{}: {}'.format(key, f[key]) for key in ['title', 'id', 'mimeType']]))
    if f["title"].startswith("."):
      continue
    fname = os.path.join(local_target, f['title'])
    if f['mimeType'] == 'application/vnd.google-apps.folder':
      copy_directory(f['id'], fname)
    else:
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)
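
A hypothetical usage example, assuming you have already authenticated and created the PyDrive drive object as in the earlier answer (the folder ID is the example one from above, and the local target folder name is arbitrary):

copy_directory('1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk', './my_data')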

Nevertheless, it looks like Google Drive doesn't like copying too many files.

0
votes

There are many ways to read files in your Colab notebook (*.ipynb); a few are:

  1. Mounting your Google Drive in the runtime's virtual machine (see here and here)
  2. Using google.colab.files.upload() - the easiest solution
  3. Using the native REST API
  4. Using a wrapper around the API such as PyDrive

Methods 1 and 2 worked for me; the rest I wasn't able to figure out. If anyone can, as others tried in the posts above, please write an elegant answer. Thanks in advance!

First method:

I wasn't able to mount my Google Drive, so I installed these libraries:

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Once the installation and authorization process is finished, mount your drive:

!mkdir -p drive
!google-drive-ocamlfuse drive

After installation I was able to mount Google Drive; everything in your Google Drive starts from /content/drive:

!ls /content/drive/ML/../../../../path_to_your_folder/

Now you can simply read the file from the path_to_your_folder folder into pandas using the above path:

import pandas as pd
df = pd.read_json('drive/ML/../../../../path_to_your_folder/file.json')
df.head(5)

You are supposed to use the absolute path you received, not /../..

Second method:

This is convenient if the file you want to read is present in the current working directory.

If you need to upload any files from your local file system, you can use the code below; otherwise just skip it.

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Suppose you have the folder hierarchy below in your Google Drive:

/content/drive/ML/../../../../path_to_your_folder/

Then you simply need the code below to load it into pandas.

import pandas as pd
import io
df = pd.read_json(io.StringIO(uploaded['file.json'].decode('utf-8')))
df
0
votes

Consider just downloading the file via a permanent link with gdown, which comes preinstalled, as shown here.
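
A minimal sketch with gdown, assuming FILE_ID stands in for the ID from the file's shareable link and the output filename is arbitrary:

import gdown

# gdown accepts the uc?id= form of a Drive share link
url = 'https://drive.google.com/uc?id=FILE_ID'
gdown.download(url, 'my_data.csv', quiet=False)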