I have a urllib2 caching module, which sporadically crashes because of the following code:
    if not os.path.exists(self.cache_location):
        os.mkdir(self.cache_location)
The problem is that by the time the second line executes, the folder may already exist, and os.mkdir will raise an error:
      File ".../cache.py", line 103, in __init__
        os.mkdir(self.cache_location)
    OSError: [Errno 17] File exists: '/tmp/examplecachedir/'
This happens because the script is launched numerous times simultaneously, by third-party code I have no control over.
The code (before I attempted to fix the bug) can be found here, on GitHub.
I can't use tempfile.mkstemp, as it sidesteps the race condition by using a randomly named directory (tempfile.py source here), which would defeat the purpose of the cache.
I don't want to simply swallow the error, as the same Errno 17 is also raised if the folder name exists as a file (a different failure), for example:
    $ touch blah
    $ python
    >>> import os
    >>> os.mkdir("blah")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 17] File exists: 'blah'
    >>>
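One common pattern (a sketch I've put together, not code from the module; the helper name ensure_cache_dir is my own) is to attempt the mkdir unconditionally and only suppress EEXIST when the existing path really is a directory, so the "name taken by a file" case above still raises:

```python
import errno
import os

def ensure_cache_dir(path):
    """Create path if needed, tolerating a concurrent mkdir by another process.

    Still raises OSError if path exists but is not a directory
    (e.g. a regular file of the same name).
    """
    try:
        os.mkdir(path)
    except OSError as e:
        # EEXIST is only acceptable when the existing entry is a directory;
        # anything else (including a file named `path`) is re-raised.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

This is race-free because the check happens *after* the failed mkdir rather than before it: whichever process loses the race simply observes the directory the winner created.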
I cannot use threading.RLock, as the code is called from multiple processes.
So, I tried writing a simple file-based lock (that version can be found here), but this has a problem: it creates the lockfile one level up, e.g. /tmp/example.lock for /tmp/example/, which breaks if you use /tmp/ itself as the cache dir (as it tries to make /tmp.lock).
In short, I need to cache urllib2 responses to disk. To do this, I need to access a known directory (creating it if required) in a multiprocess-safe way, and it needs to work on OS X, Linux and Windows.
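For the cache writes themselves, one multiprocess-friendly pattern (a sketch under my own assumptions; write_cache_entry and the key-as-filename scheme are hypothetical, not from the module) is to write each response to a uniquely named temp file in the cache directory and then rename it into place, so concurrent readers never see a half-written entry:

```python
import os
import tempfile

def write_cache_entry(cache_dir, key, data):
    """Atomically write `data` (bytes) to cache_dir/key.

    mkstemp gives each process its own temp file, so writers never clobber
    each other mid-write; os.replace then swaps the finished file into place
    (atomic on POSIX, and replaces an existing target on Windows too, on
    Python 3.3+; on Python 2 you'd use os.rename, which is atomic on POSIX
    but fails on Windows if the target exists).
    """
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, os.path.join(cache_dir, key))
    except Exception:
        # Don't leave orphaned temp files behind on failure.
        os.unlink(tmp_path)
        raise
```

Putting the temp file in the cache directory itself (via dir=cache_dir) matters: rename is only atomic within one filesystem, and it also avoids the "lockfile one level up" problem described above.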
Thoughts? The only alternative solution I can think of is to rewrite the cache module to use SQLite3 storage rather than files.