I'm using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?
28 Answers
If the file to read is big, and you don't want to read the whole file in memory at once:
fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass  # 26th line: process `line` here
    elif i == 29:
        pass  # 30th line: process `line` here
    elif i > 29:
        break
fp.close()
Note that i == n-1 for the nth line.
In Python 2.6 or later:
with open("file") as fp:
    for i, line in enumerate(fp):
        if i == 25:
            pass  # 26th line: process `line` here
        elif i == 29:
            pass  # 30th line: process `line` here
        elif i > 29:
            break
The quick answer:
f = open('filename')
lines = f.readlines()
print(lines[25])  # 26th line
print(lines[29])  # 30th line
or:
lines = [25, 29]
i = 0
f = open('filename')
for line in f:
    if i in lines:
        print(line)  # print the matching line itself, not its index
    i += 1
There is a more elegant solution for extracting many lines: linecache (courtesy of "python: how to jump to a particular line in a huge text file?", a previous stackoverflow.com question).
Quoting the python documentation linked above:
>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'
Change the 4 to your desired line number, and you're on. Note that linecache counts lines from 1, so 4 returns the fourth line, not the fifth.
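Applied to the original question, a minimal sketch might look like the following. The temporary file and its "line N" contents are made up purely for illustration; only linecache.getline and linecache.clearcache come from the standard library.

```python
import linecache
import os
import tempfile

# Build a throwaway 40-line sample file (hypothetical data for this sketch).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line %d\n" % n for n in range(1, 41)))
    sample_path = tmp.name

# linecache.getline takes 1-based line numbers.
line_26 = linecache.getline(sample_path, 26)
line_30 = linecache.getline(sample_path, 30)

# A line number past the end of the file gives '' rather than an exception.
missing = linecache.getline(sample_path, 1000)

# Drop the cached copy of the file and clean up the sample.
linecache.clearcache()
os.remove(sample_path)
```

Keep in mind that linecache reads and caches the whole file on first access, so it trades memory for fast repeated lookups.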
If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok's advice and use enumerate().
To Conclude:
- Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
- Use linecache for a more elegant solution, which will be quite fast for reading many files, possibly repeatedly.
- Take @Alok's advice and use enumerate() for files which could be very large and won't fit into memory. Note that this method might be slow because the file is read sequentially.
A fast and compact approach could be:
def picklines(thefile, whatlines):
    return [x for i, x in enumerate(thefile) if i in whatlines]
This accepts any open file-like object thefile (leaving it up to the caller whether it should be opened from a disk file, or via e.g. a socket, or other file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:
def yieldlines(thefile, whatlines):
    return (x for i, x in enumerate(thefile) if i in whatlines)
which is basically only good for looping upon -- note that the only difference comes from using rounded rather than square parentheses in the return statement, making a generator expression rather than a list comprehension.
Further note that despite the mention of "lines" and "file" these functions are much, much more general -- they'll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I'd suggest using more appropriately general names;-).
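As a rough illustration of that generality, here is the same idea under a more neutral name. The names pick_items and wanted_indices are invented for this sketch, and converting the indices to a set is an added assumption to keep membership tests cheap when the caller passes a list.

```python
import io

def pick_items(iterable, wanted_indices):
    """Return the items whose zero-based position is in wanted_indices."""
    wanted = set(wanted_indices)  # set membership tests are O(1)
    return [item for index, item in enumerate(iterable) if index in wanted]

# Works on a file-like object (an in-memory stand-in for an open file)...
fake_file = io.StringIO("alpha\nbeta\ngamma\ndelta\n")
from_file = pick_items(fake_file, [1, 3])

# ...and equally on any other iterable, such as a range.
from_range = pick_items(range(100, 200), [25, 29])
```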
For the sake of completeness, here is one more option.
Let's start with a definition from python docs:
slice An object usually containing a portion of a sequence. A slice is created using the subscript notation, [] with colons between numbers when several are given, such as in variable_name[1:3:5]. The bracket (subscript) notation uses slice objects internally (or in older versions, __getslice__() and __setslice__()).
Though the slice notation is not directly applicable to iterators in general, the itertools module contains a replacement function:
from itertools import islice
# print the 100th line
with open('the_file') as lines:
    for line in islice(lines, 99, 100):
        print(line)
# print each third line until 100
with open('the_file') as lines:
    for line in islice(lines, 0, 100, 3):
        print(line)
The additional advantage of the function is that it does not read the iterator until the end. So you can do more complex things:
with open('the_file') as lines:
    # print the first 100 lines
    for line in islice(lines, 100):
        print(line)
    # then skip the next 5
    for line in islice(lines, 5):
        pass
    # print the rest
    for line in lines:
        print(line)
And to answer the original question:
# how to read lines #26 and #30
In [365]: list(islice(xrange(1,100), 25, 30, 4))
Out[365]: [26, 30]
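The same slice works on a real file object. The sketch below uses an in-memory io.StringIO as a stand-in for an open file, with made-up "line N" contents, so it can be run without touching the disk.

```python
import io
from itertools import islice

# Stand-in for an open text file with 100 numbered lines (hypothetical data).
fake_file = io.StringIO("".join("line %d\n" % n for n in range(1, 101)))

# start=25, stop=30, step=4 selects zero-based indices 25 and 29,
# i.e. lines #26 and #30; reading stops once the slice is exhausted.
picked = list(islice(fake_file, 25, 30, 4))
```

This trick only fits when the wanted lines are evenly spaced; for arbitrary line numbers one of the enumerate-based approaches is a better match.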
Reading files is incredibly fast. Reading a 100MB file takes less than 0.1 seconds (see my article Reading and Writing Files with Python). Hence you should read it completely and then work with the single lines.
What most answers here do is not wrong, but bad style. Opening files should always be done with with, as it makes sure that the file is closed again.
So you should do it like this:
with open("path/to/file.txt") as f:
    lines = f.readlines()
print(lines[25])  # line #26 (indices are zero-based); or whatever you want to do with this line
print(lines[29])  # line #30; or whatever you want to do with this line
Huge files
If you happen to have a huge file and memory consumption is a concern, you can process it line by line:
with open("path/to/file.txt") as f:
    for i, line in enumerate(f):
        pass  # process line i
Some of these are lovely, but it can be done much more simply:
start = 0  # some starting index
end = 5000  # some ending index
filename = 'test.txt'  # some file we want to use
with open(filename) as fh:
    data = fh.readlines()[start:end]
print(data)
That simply uses list slicing. It loads the whole file, but most systems will minimise memory usage appropriately; it's faster than most of the methods given above, and works on my 10G+ data files. Good luck!
You can do a seek() call which positions your read head to a specified byte within the file. This won't help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?) or, you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.
Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.
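For the strictly formatted case, a sketch could look like this. It assumes every line occupies exactly RECORD_LEN bytes including the newline; that constant, the function name read_fixed_width_line and the sample data are all invented for illustration. The file is treated as binary so that offsets are counted in bytes rather than characters.

```python
import io

RECORD_LEN = 8  # assumed fixed size of every line in bytes, newline included

def read_fixed_width_line(binary_file, line_number, record_len=RECORD_LEN):
    """Return the line with the given 1-based number from a fixed-width file."""
    # Jump straight to the first byte of the wanted line.
    binary_file.seek((line_number - 1) * record_len)
    return binary_file.read(record_len)

# In-memory stand-in for a file opened with mode "rb" (hypothetical data):
# 100 records such as b"row0001\n", each exactly 8 bytes long.
data = b"".join(b"row%04d\n" % n for n in range(1, 101))
buffer = io.BytesIO(data)

line_26 = read_fixed_width_line(buffer, 26)
line_30 = read_fixed_width_line(buffer, 30)
```

If the lines are not all the same length, the computed offset lands in the wrong place and this approach does not apply.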
def getitems(iterable, items):
    items = list(items)  # get a list from any iterable and make our own copy
                         # since we modify it
    if items:
        items.sort()
        for n, v in enumerate(iterable):
            if n == items[0]:
                yield v
                items.pop(0)
                if not items:
                    break
print(list(getitems(open("/usr/share/dict/words"), [25, 29])))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
If you don't mind importing, then fileinput does exactly what you need (that is, you can read the line number of the current line).
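A minimal sketch of that idea follows. The sample file and the wanted set are made up for illustration; fileinput.input and fileinput.lineno are the standard-library calls, and lineno() reports 1-based line numbers.

```python
import fileinput
import os
import tempfile

# Build a throwaway 40-line sample file (hypothetical data for this sketch).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join("line %d\n" % n for n in range(1, 41)))
    sample_path = tmp.name

wanted = {26, 30}  # 1-based line numbers to keep
found = []
with fileinput.input(files=(sample_path,)) as stream:
    for line in stream:
        if fileinput.lineno() in wanted:
            found.append(line)
        elif fileinput.lineno() > max(wanted):
            break  # nothing left to collect, so stop reading early

os.remove(sample_path)
```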
I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:
def read_specific_lines(file, lines_to_read):
    """file is any iterable; lines_to_read is an iterable containing int values"""
    lines = set(lines_to_read)
    last = max(lines)
    for n, line in enumerate(file):
        if n + 1 in lines:
            yield line
        if n + 1 > last:
            return
>>> with open(r'c:\temp\words.txt') as f:
...     [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
...
['A\n', 'a\n', 'aa\n', 'accordant\n']
Here's my little 2 cents, for what it's worth ;)
def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp = open(filename, "r")
    src = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data
# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename):  # using default list, specify your own list of lines otherwise
    print("Line: %s\nData: %s\n" % (line[0], line[1]))
Fairly quick and to the point.
To print certain lines in a text file, create a "lines2print" list and then just print when the enumeration index is "in" the lines2print list. To get rid of the extra '\n' use line.strip() or line.strip('\n'). I just like list comprehensions and try to use them when I can. I like the "with" method to read text files in order to prevent leaving a file open for any reason.
lines2print = [26,30]  # can be a big list and order doesn't matter; indices are zero-based (26 is the 27th line)
with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in lines2print]
Or, if the list is small, just type it directly into the comprehension.
with open("filepath", 'r') as fp:
    [print(x.strip()) for ei,x in enumerate(fp) if ei in [26,30]]
To print the desired line, or a line above/below it:
def dline(file,no,add_sub=0):
    tf=open(file)
    for sno,line in enumerate(tf):
        if sno==no-1+add_sub:
            print(line)
    tf.close()
Execute it as dline(r"D:\dummy.txt", 6), i.e. dline("file path", line_number, add_sub). The optional add_sub argument defaults to 0; pass 1 to print the line after the requested one, or -1 to print the line before it.