2
votes

I am trying to retrieve all files in Google Drive, but only those in 'My Drive'. I tried including "'me' in owners" in the query, but that gives me tons of files in shared folders where I am the owner. I tried "'root' in parents" in the query, but that returns only files directly under My Drive, while I also need files in subfolders, subfolders of those subfolders, and so on.

I also tried setting the drive parameter, but in that case the query does not retrieve anything at all:


driveid = service.files().get(fileId='root').execute()['id']

page_token = None
my_files = list()
while True:
    results = service.files().list(q= "'[email protected]' in owners",
                                    pageSize=10,
                                    orderBy='modifiedTime',
                                    pageToken=page_token,
                                    spaces = 'drive',
                                    corpora='drive',
                                    driveId = driveid, 
                                    includeItemsFromAllDrives=True,
                                    supportsAllDrives=True,
                                    fields="nextPageToken, files(id, name)").execute()
    items = results.get('files', [])
    my_files.extend(items)
    page_token = results.get('nextPageToken', None)
    if page_token is None:
        break

print(len(my_files))
# This prints: 0

How can I get this to work?

I guess the other possibility would be to start from root, get its children and recursively navigate the full tree, but that is going to be very slow. The same applies if I get all the files and then look up all their parents to check whether they are in My Drive: I have too many files and that takes hours.

Thanks in advance!

2
Setting pageSize to 1000 would definitely speed things up. But what's the issue with what you have now? I don't think I understand "only those in 'My Drive' (not only those directly under 'My Drive', but in any part of the full tree)". - DaImTo
I want to get all files in 'My Drive' and all its subfolders. I wrote that sentence because I have seen some answers saying to use "'root' in parents", but that gives me only items directly under 'My Drive' and not files under the subfolders (and subfolders of those subfolders) of My Drive. And thanks for the pageSize advice. - Otto Fajardo
Technically speaking, service.files().list() will return all the files in your drive if you keep looping. It's just not going to be in any special order; you're going to have to order them locally, or check the file type and, if it's a directory, use parents to find the files in each directory and step through it like that. I have an example of doing that, but it's in C#, if you're interested. - DaImTo
You may also want to consider the fact that files.list is going to return directories as well, which are not really files, so you might want to check the mimeType as well. - DaImTo

2 Answers

1
votes

The first request you make would be for the files whose parent is root. This is the top level of your Drive account.

results = service.files().list(q="'root' in parents").execute()

Now you will need to loop through the results in your code. Check for the MIME type being a directory, 'application/vnd.google-apps.folder'. Everything that is not a directory should be a file sitting in the root directory of your Google Drive account.
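
As a small sketch of that check, assuming the request asked for mimeType in its fields parameter (the helper name split_folders_and_files is made up for illustration and is not part of the Drive API):

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'


def split_folders_and_files(items):
    """Separate Drive folder entries from regular file entries.

    `items` is assumed to be the 'files' list of a files.list response, e.g.
    service.files().list(q="'root' in parents",
                         fields="files(id, name, mimeType)").execute()['files']
    """
    folders = [f for f in items if f.get('mimeType') == FOLDER_MIME]
    files = [f for f in items if f.get('mimeType') != FOLDER_MIME]
    return folders, files
```

The folders list is what you would feed into the next round of requests; the files list is what actually sits at this level.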

For each of the directories you found, you can make a new request to find the files in that directory:

results = service.files().list(q="'%s' in parents" % directory_id_from_last_request).execute()

You can then loop through, getting all of the files in each of the directories. It looks like there is a known bug: a Drive.Files.list query throws an error when using "sharedWithMe = false".

shared with me

You can also add sharedWithMe = false to the q parameter, which should remove all of the files that have been shared with you, so that only the files that are actually yours are returned.

This used to work, but I am currently having issues with it while testing.

Speed.

As mentioned, files.list will by default just return everything, but in no particular order, so technically you could do a single files.list with the sharedWithMe condition and get back all the files and directories on your Drive account. By requesting a pageSize of 1000 you will make fewer requests. Then sort it all locally on your machine once it has been downloaded.
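
A rough sketch of this "fetch everything, sort locally" idea follows. Because the sharedWithMe condition is reported as unreliable here, the sketch swaps in "'me' in owners" from the question; that is an assumption, not a confirmed equivalent. It also assumes every folder on the path down from My Drive is owned by you, otherwise the parent chain breaks and those files are missed. The function names are hypothetical.

```python
from collections import defaultdict, deque


def list_owned_files(service):
    """Fetch every non-trashed item owned by the user, 1000 per page."""
    files, page_token = [], None
    while True:
        resp = service.files().list(
            q="'me' in owners and trashed = false",
            pageSize=1000,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            pageToken=page_token).execute()
        files.extend(resp.get('files', []))
        page_token = resp.get('nextPageToken')
        if page_token is None:
            break
    return files


def files_under_root(files, root_id):
    """Keep only the entries reachable from root_id through 'parents'."""
    # Index the flat listing by parent ID so the tree can be walked locally.
    children = defaultdict(list)
    for f in files:
        for parent in f.get('parents', []):
            children[parent].append(f)

    found, seen, queue = [], set(), deque([root_id])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child['id'] not in seen:
                seen.add(child['id'])
                found.append(child)
                queue.append(child['id'])
    return found
```

The root ID can be obtained as in the question, with service.files().get(fileId='root').execute()['id'], and then passed as root_id: files_under_root(list_owned_files(service), root_id). No extra API calls are needed for the tree walk itself.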

The other option would be to do as I have written above and grab each directory in turn. This will probably result in more requests.
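
The directory-by-directory approach could be sketched as a breadth-first walk like the one below. This is only an illustration: list_children and walk_drive are hypothetical helper names, and the get_children parameter exists just so the walk can be tried without a real service object. It makes at least one request per folder, which is why it tends to be slower on large drives.

```python
from collections import deque

FOLDER_MIME = 'application/vnd.google-apps.folder'


def list_children(service, folder_id):
    """Return all direct children of folder_id, following nextPageToken."""
    children, page_token = [], None
    while True:
        resp = service.files().list(
            q="'%s' in parents and trashed = false" % folder_id,
            pageSize=1000,
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token).execute()
        children.extend(resp.get('files', []))
        page_token = resp.get('nextPageToken')
        if page_token is None:
            break
    return children


def walk_drive(service, root_id='root', get_children=list_children):
    """Breadth-first walk returning every file and folder below root_id."""
    found, queue = [], deque([root_id])
    while queue:
        for item in get_children(service, queue.popleft()):
            found.append(item)
            # Only folders can contain further items, so only they are queued.
            if item.get('mimeType') == FOLDER_MIME:
                queue.append(item['id'])
    return found
```

Calling walk_drive(service) starts at the 'root' alias, i.e. the top of My Drive. An item that has more than one parent inside the tree would be reported once per parent.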

0
votes

Possible fix here using Google Drive API v3 with Python 3.7+.

Use the following syntax:

q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners"

Passing this query to the service.files().list method should get you what you need: a list of all folders owned by you, which is the best workaround I could find. "'me' in owners" is the key here.

Full snippet here:

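page_token = None  # start with the first page
while True:
    response = service.files().list(q="mimeType='application/vnd.google-apps.folder' and trashed = false and 'me' in owners",
                                    spaces='drive',
                                    fields='nextPageToken, files(id, name)',
                                    pageToken=page_token).execute()

    for file in response.get('files', []):
        # Process each folder returned on this page
        print('Found file: %s (%s)' % (file.get('name'), file.get('id')))

    # Keep requesting pages until the API stops returning a token
    page_token = response.get('nextPageToken', None)
    if page_token is None:
        break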