Eric Lindsey's answer does not work because UTF-8 files can have more than one byte per character. Worse, for those of us who speak English as a first language and work with English-only files, it might work just long enough to get into production code and then really break things.
The following answer relies on undefined behavior (the documentation only defines text-mode seek() for offsets returned by tell(), or zero), but it does work for now for UTF-8 in Python 3.7.
You can seek backwards through a file in text mode as long as you correctly handle the UnicodeDecodeError caused by seeking to a byte that is not the start of a UTF-8 character. Since we are seeking backwards, we can simply step back one more byte at a time until we find the start of a character.
The result of f.tell() is still the byte position in the file for UTF-8 files, at least for now. So an f.seek() to an invalid offset will raise a UnicodeDecodeError when you subsequently call f.read(), and this can be corrected by calling f.seek() again with a different offset. At least this works for now.
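To make the failure mode concrete, here is a minimal sketch that probes each byte offset of a small UTF-8 file. The helper name is_char_start and the sample file are made up for illustration, and the whole thing relies on the same undefined text-mode seek behavior described above, so treat it as a demonstration rather than a guarantee.

```python
import os
import tempfile


def is_char_start(path, offset):
    """Return True if reading one character at this byte offset decodes cleanly.

    Hypothetical helper: it seeks a text-mode file to an arbitrary byte
    offset, which only works as long as the offset behaves like a byte position.
    """
    with open(path, encoding="utf-8") as f:
        f.seek(offset, os.SEEK_SET)
        try:
            f.read(1)
        except UnicodeDecodeError:
            # The offset landed on a continuation byte of a multi-byte character.
            return False
        return True


# Sample file: "é" takes two bytes in UTF-8, "\n" takes one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("é\n".encode("utf-8"))

for offset in range(3):
    print(offset, is_char_start(tmp.name, offset))

os.remove(tmp.name)
```

Offset 1 falls in the middle of "é", so the read at that offset is the one that fails; backing up one byte lands on a valid character start again.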
E.g., seeking to the beginning of a line (just after the \n):
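import os

# f is a file opened in text mode with encoding="utf-8".
pos = f.tell() - 1
if pos < 0:
    pos = 0
f.seek(pos, os.SEEK_SET)
while pos > 0:
    try:
        character = f.read(1)
        if character == '\n':
            break  # the file position is now just after the newline
    except UnicodeDecodeError:
        pass  # landed mid-character; keep stepping back
    pos -= 1
    f.seek(pos, os.SEEK_SET)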
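The same loop can be wrapped in a function and tried against a file containing multi-byte characters. This is only a sketch: the name seek_to_line_start and the sample text are made up for illustration, and it still depends on f.tell() and f.seek() behaving as byte offsets for UTF-8.

```python
import os
import tempfile


def seek_to_line_start(f):
    """Move a UTF-8 text-mode file back to the start of the current line.

    Hypothetical helper wrapping the loop above. If the position is already
    just after a newline, it stays where it is.
    """
    pos = max(f.tell() - 1, 0)
    f.seek(pos, os.SEEK_SET)
    while pos > 0:
        try:
            if f.read(1) == "\n":
                return  # positioned just after the newline
        except UnicodeDecodeError:
            pass  # landed mid-character; keep stepping back
        pos -= 1
        f.seek(pos, os.SEEK_SET)


# Sample file with two-byte characters on both lines.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("héllo\nwörld\n".encode("utf-8"))

with open(tmp.name, encoding="utf-8") as f:
    f.seek(11, os.SEEK_SET)  # byte offset inside the second line, after "wör"
    seek_to_line_start(f)
    print(repr(f.readline()))

os.remove(tmp.name)
```

Stepping back from offset 11 passes over the second byte of "ö", which is where the UnicodeDecodeError branch is exercised before the loop reaches the newline.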