python - Is there a need to close files that have no reference to them?

Question

As a complete beginner to programming, I am trying to understand the basic concepts of opening and closing files. One exercise I am doing is creating a script that allows me to copy the contents from one file to another.

in_file = open(from_file)
indata = in_file.read()

out_file = open(to_file, 'w')
out_file.write(indata)

out_file.close()
in_file.close()

I have tried to shorten this code and came up with this:

indata = open(from_file).read()
open(to_file, 'w').write(indata)

This works and looks a bit more efficient to me. However, this is also where I get confused. I think I left out the references to the opened files; there was no need for the in_file and out_file variables. However, does this leave me with two files that are open, but have nothing referring to them? How do I close these, or is there no need to?

Any help that sheds some light on this topic is much appreciated.

Side-note: What you're doing is subject to memory issues for large input files; I'd suggest taking a look at shutil.copyfileobj and related methods for doing this (which explicitly copy block by block, so peak memory consumption is fixed, not dependent on the size of the input file). — ShadowRanger
There's a good discussion of this at stackoverflow.com/questions/7395542/… — snakecharmerb
What do you mean by "looks a bit more efficient"? If you mean the source code is shorter, well, that's certainly true. But it's not generally considered a priority to make the code as short as possible in Python. Much more important is readability. If you mean you think it will execute more efficiently, I'd say it probably won't make enough difference to matter, though to be sure you always want to measure. Why it shouldn't make much difference is that creating and assigning variables is cheap. In Python, all assignment amounts to setting a pointer. — John Y
@JohnY - That is an excellent question. I think I had the assumption that "shorter" or "involving less intermediate steps" or "less variables" is good. You make a good point that it is actually readability that is more important (especially in a script with more code than my simple exercise). When you say assigning variables is cheap, I assume that means it will not cost much memory? — Roy Crielaard
@RoyCrielaard indeed - a "variable" is just a reference to data that must already exist in memory, it is the size of a memory pointer. Assignment is cheap both in terms of size and speed. Be aware that even in your first example you have file handle leak issues - if the writing crashes, for example, nothing will be closed. — Boris the Spider

alexis alexis · Accepted Answer · 2016-03-17T14:16:35

You asked about the "basic concepts", so let's take it from the top: When you open a file, your program gains access to a system resource, that is, to something outside the program's own memory space. This is basically a bit of magic provided by the operating system (a system call, in Unix terminology). Hidden inside the file object is a reference to a "file descriptor", the actual OS resource associated with the open file. Closing the file tells the system to release this resource.

As an OS resource, the number of files a process can keep open is limited: Long ago the per-process limit was about 20 on Unix. Right now my OS X box imposes a limit of 256 open files (though this is an imposed limit, and can be raised). Other systems might set limits of a few thousand, or in the tens of thousands (per user, not per process in this case). When your program ends, all resources are automatically released. So if your program opens a few files, does something with them and exits, you can be sloppy and you'll never know the difference. But if your program will be opening thousands of files, you'll do well to release open files to avoid exceeding OS limits.

There's another benefit to closing files before your process exits: If you opened a file for writing, closing it will first "flush its output buffer". This means that i/o libraries optimize disk use by collecting ("buffering") what you write out, and saving it to disk in batches. If you write text to a file and immediately try to reopen and read it without first closing the output handle, you'll find that not everything has been written out. Also, if your program is closed too abruptly (with a signal, or occasionally even through normal exit), the output might never be flushed.

There's already plenty of other answers on how to release files, so here's just a brief list of the approaches:

Explicitly with close(). (Note for python newbies: Don't forget the parens! My students like to write in_file.close, which does nothing.)
Recommended: Implicitly, by opening files with the with statement. The close() method will be called when the end of the with block is reached, even in the event of abnormal termination (from an exception).
```
with open("data.txt") as in_file:
    data = in_file.read()
```
Implicitly by the reference manager or garbage collector, if your python engine implements it. This is not recommended since it's not entirely portable; see the other answers for details. That's why the with statement was added to python.
Implicitly, when your program ends. If a file is open for output, this may run a risk of the program exiting before everything has been flushed to disk.

python - Is there a need to close files that have no reference to them?

6 Answers