17
votes

I have a urllib2 caching module, which sporadically crashes because of the following code:

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

The problem is that by the time the second line executes, another process may already have created the folder, so os.mkdir raises an error:

  File ".../cache.py", line 103, in __init__
    os.mkdir(self.cache_location)
OSError: [Errno 17] File exists: '/tmp/examplecachedir/'

This is because the script is simultaneously launched numerous times, by third-party code I have no control over.

The code (before I attempted to fix the bug) can be found here, on GitHub.

I can't use tempfile.mkdtemp, as it solves the race condition by using a randomly named directory (tempfile.py source here), which would defeat the purpose of the cache.

I don't want to simply discard the error, as the same Errno 17 error is raised if the folder name exists as a file (a different problem), for example:

$ touch blah
$ python
>>> import os
>>> os.mkdir("blah")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 17] File exists: 'blah'
>>>

I cannot use threading.RLock, as the code is called from multiple processes.

So I tried writing a simple file-based lock (that version can be found here), but it has a problem: it creates the lockfile one level up, so /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ as the cache dir (as it tries to create /tmp.lock).

In short, I need to cache urllib2 responses to disc. To do this, I need to access a known directory (creating it, if required), in a multiprocess safe way. It needs to work on OS X, Linux and Windows.

Thoughts? The only alternative solution I can think of is to rewrite the cache module using SQLite3 storage, rather than files.

5 Answers

3
votes

In Python 3.x, you can use os.makedirs(path, exist_ok=True), which does not raise an exception if the directory already exists. It still raises FileExistsError: [Errno 17] if a file exists with the same name as the requested directory (path).

Verify it with:

import os

parent = os.path.dirname(__file__)

target = os.path.join(parent, 'target')

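# Calling it twice is harmless: the second call finds the directory and returns.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)

os.rmdir(target)

# Replace the directory with a regular file of the same name.
with open(target, 'w'):
    pass

# Raises FileExistsError, because the path exists but is not a directory.
os.makedirs(target, exist_ok=True)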
11
votes

Instead of

if not os.path.exists(self.cache_location):
    os.mkdir(self.cache_location)

you could do

try:
    os.makedirs(self.cache_location)
except OSError:
    pass

and you would end up with the same functionality.

DISCLAIMER: I don't know how Pythonic this might be.


Using SQLite3 might be a bit of overkill, but it would add a lot of functionality and flexibility to your use case.

If you have to do a lot of "selecting", concurrent inserting and filtering, it's a great idea to use SQLite3, as it won't add too much complexity over simple files (it could be argued that it removes complexity).
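
To make the SQLite3 idea concrete, here is a minimal sketch of a URL-keyed response cache. The table layout and the function names (open_cache, store, fetch) are hypothetical, chosen for illustration only, and are not taken from the cache module in the question. The assumption is that SQLite's own file locking serializes writers across processes, with the timeout argument controlling how long a connection waits for a competing writer.

```python
import sqlite3


def open_cache(db_path):
    """Open (and create if needed) the cache database at db_path."""
    # timeout: seconds to wait if another process holds the write lock.
    conn = sqlite3.connect(db_path, timeout=30)
    # IF NOT EXISTS makes table creation safe to repeat in every process.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "url TEXT PRIMARY KEY, body BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def store(conn, url, body):
    """Insert or overwrite the cached body (bytes) for url."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO responses (url, body) VALUES (?, ?)",
            (url, body),
        )


def fetch(conn, url):
    """Return the cached body for url, or None if it is not cached."""
    row = conn.execute(
        "SELECT body FROM responses WHERE url = ?", (url,)
    ).fetchone()
    return None if row is None else row[0]
```

Each process would call open_cache with the same path and then use store and fetch. The directory holding the database file still has to exist beforehand, so this changes where the cached data lives rather than removing the directory-creation step.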


Rereading your question (and comments) I can better understand your problem.

What is the possibility that a file could create the same race condition?

If it is small enough, then I'd do something like:

if not os.path.isfile(self.cache_location):
    try:
        os.makedirs(self.cache_location)
    except OSError:
        pass

Also, reading your code, I'd change

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise OSError(e)

to

else:
    # Our target dir is already a file, or different error,
    # relay the error!
    raise

as what you really want is for Python to reraise the exact same exception (just nitpicking).


One more thing: maybe this could be of use to you (Unix-like only).

10
votes

The code I ended up with was:

import os
import errno

folder_location = "/tmp/example_dir"

try:
    os.mkdir(folder_location)
except OSError as e:
    if e.errno == errno.EEXIST and os.path.isdir(folder_location):
        # File exists, and it's a directory,
        # another process beat us to creating this dir, that's OK.
        pass
    else:
        # Our target dir exists as a file, or different error,
        # reraise the error!
        raise
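
As a rough check of this pattern, the sketch below wraps it in a function and calls it from several threads at once. The name ensure_dir, its boolean return value, and the use of threads in place of the separate processes from the question are all assumptions made for illustration. Since os.mkdir either creates the directory or fails, exactly one caller should report having created it.

```python
import errno
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir(path):
    """Create path if needed; return True only if this call created it."""
    try:
        os.mkdir(path)
    except OSError as e:
        if e.errno == errno.EEXIST and os.path.isdir(path):
            # Someone else created the directory first, which is fine.
            return False
        # The path is a file, or the failure is something else entirely.
        raise
    return True


# Scratch location so the example does not touch a real cache directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "examplecachedir")

# Many concurrent callers race to create the same directory.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(ensure_dir, [target] * 32))

created_count = sum(results)  # number of callers that actually created it
```

If the path already exists as a regular file, ensure_dir reraises the original OSError instead of hiding it.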
2
votes

Could you catch the exception and then test whether the file exists as a directory or not?

1
votes

When you have race conditions, EAFP (easier to ask forgiveness than permission) often works better than LBYL (look before you leap).

Error checking strategies
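
The difference between the two strategies can be shown with a simulated race. In this sketch (Python 3, since it catches FileExistsError) the between argument is a hypothetical hook that exists only for the demonstration: it stands in for a competing process that creates the directory at the worst possible moment.

```python
import os
import tempfile


def make_dir_lbyl(path, between=None):
    """Look before you leap: check first, then act."""
    if not os.path.exists(path):
        if between is not None:
            between()  # hypothetical hook: the competing process runs here
        os.mkdir(path)  # fails if the directory appeared after the check


def make_dir_eafp(path, between=None):
    """Easier to ask forgiveness: act first, then handle the failure."""
    if between is not None:
        between()  # hypothetical hook: the competing process runs here
    try:
        os.mkdir(path)
    except FileExistsError:
        if not os.path.isdir(path):
            raise  # the path exists but is not a directory: a real error


base = tempfile.mkdtemp()

# LBYL: the directory appears between the check and the mkdir call.
lbyl_target = os.path.join(base, "lbyl")
try:
    make_dir_lbyl(lbyl_target, between=lambda: os.mkdir(lbyl_target))
    lbyl_failed = False
except FileExistsError:
    lbyl_failed = True

# EAFP: the same interference is absorbed by the except clause.
eafp_target = os.path.join(base, "eafp")
make_dir_eafp(eafp_target, between=lambda: os.mkdir(eafp_target))
```

The LBYL version fails because its check and its action are two separate steps, while the EAFP version has no gap between them for another process to slip into.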