April 2, 2007

Tail'ing in Python

or, finding the last line of a huge file...

How do you find the last line of a 2 GB log file from within your program? You don't want to go through the whole file, right? Right. What you want to do is start reading from the end and work backwards until you find a newline character. Here is how I did it in Python:


def Tail(filepath, read_size=1024):
    """
    This function returns the last line of a file.
    Args:
        filepath: path to file
        read_size: data is read in chunks of this size (optional, default=1024)
    Raises:
        IOError if file cannot be processed.
    """
    f = open(filepath, 'rU')  # U is to open it with Universal newline support
    offset = read_size
    f.seek(0, 2)
    file_size = f.tell()
    while 1:
        if file_size < offset:
            offset = file_size
        f.seek(-1*offset, 2)
        read_str = f.read(offset)
        # Remove newline at the end
        if read_str[offset - 1] == '\n':
            read_str = read_str[0:-1]
        lines = read_str.split('\n')
        if len(lines) > 1:  # Got a line
            return lines[len(lines) - 1]
        if offset == file_size:  # Reached the beginning
            return read_str
        offset += read_size
    f.close()


(There will hardly be any reason to change read_size. I used it mainly for testing.)
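
The listing above is Python 2 code. Under Python 3 a text-mode file cannot seek backwards from the end, and the 'U' mode flag was removed in Python 3.11, so a port has to read bytes instead. What follows is only a minimal sketch of the same idea under those assumptions, not a drop-in replacement: the name last_line is made up for illustration, the result is returned as bytes (decoding is left to the caller), and the file is assumed not to grow while it is being read. It also folds in the points raised in the comments below: the file is closed on every return path, and the trailing newline is checked without indexing by offset.

```python
import os


def last_line(filepath, read_size=1024):
    """Return the last line of a file as bytes, without its line ending.

    Hypothetical Python 3 sketch of the Tail() idea; not the original code.
    """
    with open(filepath, 'rb') as f:            # 'with' closes f on every return
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        offset = 0
        data = b''
        while offset < file_size:
            # Grow the window by one chunk, but never past the start of the file.
            offset = min(offset + read_size, file_size)
            f.seek(-offset, os.SEEK_END)
            data = f.read(offset)
            # Drop one trailing line ending (CRLF or LF) so it is not
            # mistaken for the newline that precedes the last line.
            if data.endswith(b'\r\n'):
                data = data[:-2]
            elif data.endswith(b'\n'):
                data = data[:-1]
            _, newline, tail = data.rpartition(b'\n')
            if newline:                        # found the start of the last line
                return tail
        # Reached the beginning: the file is empty or holds a single line.
        return data
```

Calling last_line('app.log').decode('utf-8') would give the line as text, assuming the log (a hypothetical file name here) is UTF-8 encoded.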

It works much like Unix 'tail -1'. It can easily be modified to return the last 10 or 'n' lines, I believe, but I haven't had the time or a reason to try that yet :)
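
As a rough illustration of that 'n' lines variant, here is one way it might be done in Python 3. This is a sketch under assumptions rather than a tested extension of Tail(): the name tail_lines and its parameter n are invented for the example, lines come back as bytes, and the file is assumed not to change while it is being read. It reads backwards one chunk at a time and stops once the buffer holds more than n newline bytes, which is enough to guarantee that the last n lines are complete.

```python
import os


def tail_lines(filepath, n=10, read_size=1024):
    """Return up to the last n lines of a file as a list of bytes objects.

    Hypothetical sketch; line endings are stripped from each line.
    """
    if n <= 0:
        return []
    with open(filepath, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b''
        # Keep prepending chunks until the buffer contains at least n + 1
        # newlines, or until the start of the file has been reached.
        while position > 0 and data.count(b'\n') <= n:
            step = min(read_size, position)
            position -= step
            f.seek(position)
            data = f.read(step) + data
    # The first piece may be a partial line; slicing keeps only the last n.
    return data.splitlines()[-n:]
```

Unlike the single-line version, each chunk is read only once here, at the cost of keeping everything read so far in memory.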

Remember, it's supposed to be called from within Python programs, not from the command line (because Unix tail does that better ;-)).

I have done quite a bit of testing, so it should be safe to use.

cheers,
Manu

7 comments:

  1. Why use this to tail the last line... what's wrong with tail -1?

  2. Nothing is wrong with tail -1. As I said, this code is meant to be used inside a Python program. Yes, I can call (fork) tail -1 from a Python program, but forking is expensive.

  3. Not only that, but Windows does not appear to have an equivalent, so it is very nice to have.

  4. Yes, very nice to have this on Windows machines... However, I get:

    if read_str[offset-1] == '\n':
    IndexError: string index out of range

    Because offset is somehow 1024 at this point of execution...

    But it works if I change:

    if read_str[offset-1] == '\n':

    to:

    if read_str[-1] == '\n':

    I'm still not able to read the last lines correctly from my ffmpeg encoding output log files, which are updated very often.

    Thanks anyway :)

  5. Okay, it was a newline issue with my ffmpeg log. Changing to universal newline support when opening the file:

    f = open(filepath, 'r')

    changed to:

    f = open(filepath, 'U')

    and it seems to work perfectly.

  6. Thanks for replying, ii! I sure didn't think about using this function on Windows while writing it :) This PEP explains the rationale behind universal newline support - http://svn.python.org/projects/peps/trunk/pep-0278.txt.

    I'll change "r" to "rU" in the post.

  7. f.close()

    will NEVER be executed. I believe it should be right before the return statements.

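
A likely explanation for the IndexError reported in comment 4 is newline translation: when a file with Windows (CRLF) line endings is read with newline translation, each two-byte "\r\n" becomes a single "\n", so the string that comes back can be shorter than the number of bytes asked for, and indexing it with offset - 1 runs off the end. The snippet below is only a small Python 3 illustration of that size mismatch, using a throwaway temporary file; it does not reproduce the Python 2 read path exactly.

```python
import os
import tempfile

# A three-line log with CRLF line endings, written as raw bytes.
raw = b"first\r\nsecond\r\nthird\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, 'rb') as f:
        as_bytes = f.read()                    # exactly what is on disk
    with open(path, 'r', newline=None, encoding='ascii') as f:
        as_text = f.read()                     # universal newlines: CRLF -> LF
finally:
    os.remove(path)

size_on_disk = len(as_bytes)                   # 22 bytes
chars_after_translation = len(as_text)         # 19 characters

# Indexing the translated text with a byte-based offset goes out of range...
try:
    as_text[size_on_disk - 1]
    index_error = False
except IndexError:
    index_error = True

# ...while a negative index always refers to the real last character.
last_char = as_text[-1]
```

This is why read_str[-1] works where read_str[offset - 1] does not: it makes no assumption about how many characters the read actually returned.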