Multiple parallel processes writing to the same file on SGE cluster

Question

I am currently working on an SGE cluster and have code which submits many jobs, written in python, in parallel.

The output at the end of my code is a set of files containing numerical data. Each python job performs some calculation and then outputs to each file in turn. To output to a file, my code reads in the data in the file, adds what it has computed to the data, and then outputs back to the file.

My problem is this; as all of the jobs are run in parallel, and all of the jobs contribute to each of the output files; my jobs are conflicting with one another. I am often getting errors concerning incompatible file sizes and such. The cause I believe is that sometimes two jobs will try and read in the file around the same time and conflict.

My question is this: When running (potentially many) multiple jobs in parallel that are each contributing to the same file multiple times, is there a good practice way of ensuring that they don't try to write to the file concurrently? Are there any pythonic or SGE solutions to this problem?

My niave idea was to have a txt file, which contains a 1 or 0 indicating whether a file is currently being accessed, and that a job will only write to a file when the value is set to 0, and will change the value to 1 whilst it is outputting. Is this a bad practice?/dumb idea?

Simon S. Simon S. · Accepted Answer · 2020-04-15T15:02:04

A common practice to ensure safety across multiple threads (that is tasks running in parallell in the same process) is to use a mutex. Since this will be ran in multiple processes the lock needs to be acquired outside of the process.

What you describe as a solution is a slimmed down version of a locking mechanism. You could either store a value or a file e.g. lock file1.txt by creating file1.txt.lock and ensure no such file exists before writing to file1.txt.

However, since this is a common problem there are already several file locking mechanisms available, including from within python.

According to this post you can take a file lock like this:

from filelock import FileLock

with FileLock("myfile.txt.lock"):
    print("Lock acquired.")
    with open("myfile.txt"):
        # work with the file as it is now locked

Please have a look at the comments and other answers to find alternative solutions for how to best acquire a lock for the files in your specific need.

Multiple parallel processes writing to the same file on SGE cluster

2 Answers