0
votes

I am working on an operating system with limited utilities. Utilities like tail, head, and tac are not available! sed, awk, and grep are available, but grep does not have the -m option for stopping after the first match. See the list of available options here.

My goal is to search a potentially large log.txt file (maybe ~100 MB) for a line containing a string, starting from the end and working backwards, and print that line. The trick is that the operation has to be fast: no more than 3-4 seconds.

I tried using sed to reverse the contents of the file into another file and then using awk and grep in a loop to search chunks of 10,000 lines, but the sed reverse was way too slow for anything beyond a few MB.

Here is something I tried:

self.sed_line_search = 10001
self.sed_cmd = "sed -e :a -e '$q;N;"+str(self.sed_line_search)+",$D;ba'"
self.awk_cmd = "awk '/Version/{print}'"   
self.Command = self.sed_cmd + " " + LOGFILE_PATH + " | " + self.awk_cmd + "\n"
tries, max_tries = 1,5
while tries < max_tries:
    ret = execute(self.Command)
    if ret:
        break  # found a match, so stop widening the window
    # No match yet: look at 10,000 more lines from the end and retry.
    self.sed_line_search += 10000
    self.sed_cmd = "sed -e :a -e '$q;N;"+str(self.sed_line_search)+",$D;ba'"
    self.Command = self.sed_cmd + " " + LOGFILE_PATH + " | " + self.awk_cmd + "\n"
    tries += 1

Without knowing how to stop at the first match without the grep -m 1 option, this partly achieves that goal by only looking at a few thousand lines at a time. But it does not search in reverse.

3
The first match from a reverse file search -- is that a long way of saying the last match in a file? - Shawn
Yeah, classic case of describing how you think something needs to be done rather than describing what it is that needs to be done. - Ed Morton
I would look at using split to divide the file into smaller pieces and then use rev plus gawk. Also, according to the GNX docs, tail is available, so I would definitely use that. - Marco
@Shawn: the last match in the file suggests to me that I have to search the entire file and return the last match. Performing the operation in reverse and returning the first match just seems more efficient. - cogito
@Marco: omg! tail is available. OK, I will try out some solutions with tail. In the meantime, I know the first issue I will run into here is that I want to return the last match in the file. Using tail I can specify something like 'the last 10000 lines' and grep for the string, but I want only the last one. - cogito

3 Answers

1
votes

Not sure if this is what you want. It searches for all lines containing test and prints them in reverse order.

cat file
dfsdf
test1
fsdfsdf
fdg
sfdgs
fdgsdf
gsfdg
sfdte
test2
dgsfdgsdf
fdgsfdg
sdfgs
df
test3
sfdgsfdg

awk '/test/ {a[++x]=$0} END {for (i=x;i>=1;i--) print a[i]}' file
test3
test2
test1
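
If only the last match is needed, a variant that overwrites a single variable avoids holding every match in memory. This is a minimal sketch rather than part of the command above: the function name last_match is hypothetical, and because the pattern is passed with -v, backslashes in it go through awk's escape processing.

```shell
# Hypothetical helper: print the last line of file $2 that matches awk regex $1.
last_match() {
    awk -v pat="$1" '
        $0 ~ pat { last = $0; found = 1 }   # remember only the newest match
        END      { if (found) print last }  # print nothing when nothing matched
    ' "$2"
}

# Usage: last_match test file
```

On the sample file shown above this prints test3. It still reads the whole file once, so it saves memory rather than I/O.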
1
votes

This might work for you (GNU sed):

sed -n '/regexp/h;$!b;x;p' file

Copy the line that matches regexp to the hold space and at the end of the file print the hold space.
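
One thing to watch for: if no line matches, the hold space is still empty at the end of the file, so the command prints an empty line. A hedged variant that prints only when something was held might look like the sketch below. The name sed_last_match is hypothetical, and the pattern is spliced straight into the sed script, so it must not contain an unescaped /.

```shell
# Hypothetical helper: print the last line of file $2 matching sed regex $1.
sed_last_match() {
    # /regex/h  : copy each matching line into the hold space
    # ${x;/./p;}: on the last line, swap the hold space in and print it
    #             only if it is non-empty (so an empty matching line
    #             would not be printed)
    sed -n -e "/$1/h" -e '${x;/./p;}' "$2"
}

# Usage: sed_last_match Version log.txt
```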

0
votes

IMHO the fastest you could do would be:

grep 'regexp' file | sed -n '$p'
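
Since the comments point out that tail may be available after all, the same grep and sed pipeline can be applied to just the end of the file first, widening the window only when nothing is found. This is a rough sketch under that assumption: last_match_from_end and the starting window of 10000 lines are hypothetical choices, and wc -l still reads the whole file once in order to know when to give up.

```shell
# Hypothetical helper: print the last line of file $2 matching grep regex $1,
# searching a growing window of lines counted from the end of the file.
last_match_from_end() {
    pattern=$1
    file=$2
    total=$(( $(wc -l < "$file") ))   # arithmetic strips any padding from wc
    window=10000
    while :; do
        # Last match within the final $window lines, if there is one.
        hit=$(tail -n "$window" "$file" | grep -e "$pattern" | sed -n '$p')
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            return 0
        fi
        # The window already covered the whole file: no match anywhere.
        [ "$window" -gt "$total" ] && return 1
        window=$((window * 10))
    done
}

# Usage: last_match_from_end Version log.txt
```

When the match sits near the end of a large log, only the first small window is scanned by grep, which is where the time saving would come from.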