python - Do I understand os.walk right?

65

votes

Does the loop for root, dir, file in os.walk(startdir) work through these steps?

for root in os.walk(startdir) 
    for dir in root 
        for files in dir

1. get root of start dir : C:\dir1\dir2\startdir
2. get folders in C:\dir1\dir2\startdir and return list of folders "dirlist"
3. get files in the first dirlist item and return the list of files "filelist" as the first item of a list of filelists.
4. move to the second item in dirlist and return the list of files in this folder "filelist2" as the second item of a list of filelists. etc.
5. move to the next root in the folder tree and start from 2. etc.

Right? Or does it just get all roots first, then all dirs second, and all files third?

85

votes

os.walk returns a generator that yields a tuple of values (current_path, directories in current_path, files in current_path) for each directory it visits.

Every time the generator is advanced it moves on to the next directory, following each directory recursively until no further sub-directories are available below the initial directory that walk was called upon.

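As such, the first tuple it yields describes the start directory itself:

# Python 3: use the built-in next(); the .next() method existed only in Python 2
next(os.walk(r'C:\dir1\dir2\startdir'))[0] # returns 'C:\dir1\dir2\startdir'
next(os.walk(r'C:\dir1\dir2\startdir'))[1] # returns all the dirs in 'C:\dir1\dir2\startdir'
next(os.walk(r'C:\dir1\dir2\startdir'))[2] # returns all the files in 'C:\dir1\dir2\startdir'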

So

import os
file = 'example.txt'  # hypothetical name of the file to look for
for path, directories, files in os.walk(r'C:\dir1\dir2\startdir'):
    if file in files:
        print('found %s' % os.path.join(path, file))

or this

def search_file(directory = None, file = None):
    assert os.path.isdir(directory)
    for cur_path, directories, files in os.walk(directory):
        if file in files:
            # cur_path already starts with directory, so it must not be joined again
            return os.path.join(cur_path, file)
    return None

or, if you want to do the recursion yourself, one directory level at a time:

import os
def search_file(directory = None, file = None):
    assert os.path.isdir(directory)
    # take only the first tuple, which describes directory itself
    current_path, directories, files = next(os.walk(directory))
    if file in files:
        return os.path.join(directory, file)
    elif not directories:  # no sub-directories left to search
        return None
    else:
        for new_directory in directories:
            result = search_file(directory = os.path.join(directory, new_directory), file = file)
            if result:
                return result
        return None

42

votes

Minimal runnable example

This is how I like to learn stuff:

mkdir root
cd root
mkdir \
  d0 \
  d1 \
  d0/d0_d1
touch \
  f0 \
  d0/d0_f0 \
  d0/d0_f1 \
  d0/d0_d1/d0_d1_f0
tree

Output:

.
├── d0
│   ├── d0_d1
│   │   └── d0_d1_f0
│   ├── d0_f0
│   └── d0_f1
├── d1
└── f0

main.py

#!/usr/bin/env python3
import os
for path, dirnames, filenames in os.walk('root'):
    print('{} {} {}'.format(repr(path), repr(dirnames), repr(filenames)))

Output:

'root' ['d0', 'd1'] ['f0']
'root/d0' ['d0_d1'] ['d0_f0', 'd0_f1']
'root/d0/d0_d1' [] ['d0_d1_f0']
'root/d1' [] []

This makes everything clear:

path is the root directory of each step
dirnames is a list of directory basenames in each path
filenames is a list of file basenames in each path

Tested on Ubuntu 16.04, Python 3.5.2.

Modifying dirnames changes the tree recursion

This is basically the only other thing you have to keep in mind.

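E.g., if you do the following operations on dirnames in place (with the default topdown=True), it affects the traversal:

sort: changes the order in which subdirectories are visited
filter: directories removed from the list are not descended into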
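
A minimal sketch of both operations, assuming the default topdown=True. The helper name walk_pruned, the skip parameter and the directory names are made up for illustration. The key point is that the list must be changed in place (slice assignment, .sort(), del); rebinding the name to a new list has no effect on the traversal.

```python
import os
import tempfile

def walk_pruned(top, skip):
    """Yield directory paths under top, visiting subdirectories in sorted
    order and never descending into directories whose name is in skip."""
    for path, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the very list os.walk will recurse into.
        # Writing dirnames = [...] instead would not change the traversal.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        yield path

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: b/, a/, a/x/, skipme/, skipme/inner/
    for sub in ('b', 'a', os.path.join('a', 'x'),
                'skipme', os.path.join('skipme', 'inner')):
        os.makedirs(os.path.join(tmp, sub))
    visited = [os.path.relpath(p, tmp) for p in walk_pruned(tmp, skip={'skipme'})]
    print(visited)
```

Neither skipme nor skipme/inner shows up in visited, and a is visited before b because of the sort.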

Walk file or directory

If the input to traverse is either a file or directory, you can handle it like this:

#!/usr/bin/env python3

import os
import sys

def walk_file_or_dir(root):
    if os.path.isfile(root):
        dirname, basename = os.path.split(root)
        yield dirname, [], [basename]
    else:
        for path, dirnames, filenames in os.walk(root):
            yield path, dirnames, filenames

for path, dirnames, filenames in walk_file_or_dir(sys.argv[1]):
    print(path, dirnames, filenames)

11

votes

In simple words, os.walk() generates a tuple of (path, folders, files) for the given path and then keeps traversing its subfolders.

import os
top = input("enter the path\n")
for path, subdirs, files in os.walk(top):
    for name in subdirs:
        print(os.path.join(path, name))  # prints the path of each directory
    for name in files:
        print(os.path.join(path, name))  # prints the path of each file

This prints the paths of all subdirectories, files, and files in subdirectories.

4

votes

Here's a short example of how os.walk() works along with some explanation using a few os functions.

First note that os.walk() yields three items on each iteration: the current root directory, a list of the directories (dirs) immediately below that root, and a list of the files found directly in that root. The documentation will give you more information.

dirs will contain the names of the directories just below root, and files will contain the names of the files in root itself (not those in its subdirectories). In later iterations, each directory from dirs will take on the role of root in turn and the search will continue from there. By default the walk is depth-first: the whole subtree of one subdirectory is visited before its next sibling.
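
To see the order in which directories take on the role of root, one can record the paths os.walk yields for a small tree. This is only a sketch: the function name visit_order and the layout (a/, a/x/, b/) are hypothetical, and dirs is sorted in place just to make the sibling order reproducible.

```python
import os
import tempfile

def visit_order(top):
    """Return the paths os.walk yields as root, relative to top, in order."""
    order = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # fix the sibling order so the result is reproducible
        order.append(os.path.relpath(root, top))
    return order

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: a/, a/x/, b/
    os.makedirs(os.path.join(tmp, 'a', 'x'))
    os.makedirs(os.path.join(tmp, 'b'))
    order = visit_order(tmp)
    print(order)
```

In this tree a/x is visited before b, i.e. the walk goes all the way down one branch before moving on to the next sibling.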

A code example: This will search for, count and print the names of .jpg and .gif files below the specified search directory (your root). It also makes use of the os.path.splitext() function to separate the base of the file from its extension and the os.path.join() function to give you the full name including path of the image files found.

import os

searchdir = r'C:\your_root_dir'  # your search starts in this directory (your root) 

count = 0
for root, dirs, files in os.walk(searchdir):
    for name in files:
        (base, ext) = os.path.splitext(name) # split base and extension
        if ext in ('.jpg', '.gif'):          # check the extension
            count += 1
            full_name = os.path.join(root, name) # create full path
            print(full_name)

print('\ntotal number of .jpg and .gif files found: %d' % count)

2

votes

os.walk works a little differently than above. Basically, it returns tuples of (path, directories, files). To see this, try the following:

import pprint
import os
pp=pprint.PrettyPrinter(indent=4)
for dir_tuple in os.walk("/root"):
    pp.pprint(dir_tuple)

...you'll see that each iteration of the loop prints a directory name, a list of the names of any directories immediately within that directory, and another list of all files within that directory. os.walk will then enter each directory in the list of subdirectories and do the same thing, until all subdirectories of the original root have been traversed. It may help to learn a little about recursion to understand how this works.
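
To make the recursion concrete, here is a simplified imitation of the default top-down behaviour. It is a sketch for illustration, not how the standard library actually implements os.walk: the name simple_walk is made up, there is no error handling, and entries are sorted only so the output can be compared with os.walk on a small hypothetical tree.

```python
import os
import tempfile

def simple_walk(top):
    """Simplified, top-down-only imitation of os.walk (illustration only)."""
    dirnames, filenames = [], []
    for name in sorted(os.listdir(top)):
        if os.path.isdir(os.path.join(top, name)):
            dirnames.append(name)
        else:
            filenames.append(name)
    # Report this directory first ...
    yield top, dirnames, filenames
    # ... then recurse into each subdirectory in turn.
    for name in dirnames:
        sub = os.path.join(top, name)
        if not os.path.islink(sub):  # like os.walk's default, don't follow links
            yield from simple_walk(sub)

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: d0/, d0/d0_d1/, d1/, f0
    os.makedirs(os.path.join(tmp, 'd0', 'd0_d1'))
    os.makedirs(os.path.join(tmp, 'd1'))
    open(os.path.join(tmp, 'f0'), 'w').close()

    mine = list(simple_walk(tmp))
    reference = []
    for path, dirnames, filenames in os.walk(tmp):
        dirnames.sort()  # same sibling order as simple_walk
        reference.append((path, list(dirnames), sorted(filenames)))
    print(mine == reference)
```

Each call yields its own directory and then hands over to a recursive call per subdirectory, which is why the tuples come out one directory at a time rather than "all roots, then all dirs, then all files".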
