I have to open several thousand files, but only read the first 3 lines.
Currently, I am doing this:
def test_readline(filename):
    fid = open(filename, 'rb')
    lines = [fid.readline() for i in range(3)]
Which yields the result:
The slowest run took 10.20 times longer than the fastest. This could mean that an intermediate result is being cached. 10000 loops, best of 3: 59.2 µs per loop
An alternate solution would be to convert the fid to a list:
def test_list(filename):
    fid = open(filename, 'rb')
    lines = list(fid)
%timeit test_list(MYFILE)
The slowest run took 4.92 times longer than the fastest. This could mean that an intermediate result is being cached. 10000 loops, best of 3: 374 µs per loop
Yikes!! Is there a faster way to read only the first 3 lines of these files, or is readline() the best option? Alternatives with timings would be appreciated.
But at the end of the day I have to open thousands of individual files, and they will not be cached. Thus, does it even matter? It looks like it does:
(603 µs per call for the uncached readline method vs. 1840 µs for the list method)
Additionally, here is the readlines() method:
def test_readlines(filename):
    fid = open(filename, 'rb')
    lines = fid.readlines()
return lines
The slowest run took 7.17 times longer than the fastest. This could mean that an intermediate result is being cached. 10000 loops, best of 3: 334 µs per loop
readlines() also accepts a parameter with a maximum number of bytes or characters to read. Little weird for most situations though. – Ry-