3
votes

I have this python crawler output

[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler

[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/
   [-] http://www.example.com/icons/ubuntu-logo.png
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 4

[+] Directories found:
   [-] http://www.example.com/icons/
[+] Total directories: 1

[+] Directory with indexing

I want to extract the lines between "Crawling" and "Total urls crawled" using awk or any other tool. Basically, I want to store the line number (NR) of the first keyword, "Crawling", in one variable and the line number of the second delimiter, "Total urls crawled", in another, and then print the range between the two. I tried something like this:

awk 'NR>$(Crawling) && NR<$(urls)' file.txt

but nothing really worked. The best I got was output running from the line after "Crawling" to the end of the file, which isn't really helpful. So how can I do this, and how can I select a range of lines in awk using variables?

awk

2 Answers

5
votes

If I understood your requirement correctly, you want to pass shell variables into the awk code and use them as search strings. If so, try the following.

awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url{
  found=""
  next
}
$0 ~ crawl{
  found=1
  next
}
found
'  Input_file

Explanation: a line-by-line explanation of the above.

awk -v crawl="Crawling" -v url="Total urls crawled" '   ##Start the awk program and set the variables crawl and url.
$0 ~ url{                      ##If the line matches the url variable, do the following.
  found=""                     ##Clear the found flag.
  next                         ##Skip the remaining statements for this line.
}
$0 ~ crawl{                    ##If the line matches the crawl variable, do the following.
  found=1                      ##Set the found flag to 1.
  next                         ##Skip the remaining statements for this line.
}
found                          ##If found is set (non-empty), print the current line.
'  Input_file                  ##Name of the input file.
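
If the line numbers themselves are wanted in variables, as the question describes, one possible sketch is to look them up first and then hand them to awk with -v. The file name crawler.txt and the shell variable names start, end and result are assumptions made for illustration, and the sample file is a shortened copy of the question's output. Inside the single-quoted program, awk reads $(Crawling) as a field reference rather than as a shell variable, which is why the attempt in the question did not select the intended range.

```shell
# Build a small sample file (contents taken from the question's output).
cat > crawler.txt <<'EOF'
[+] Output file: www.example.com.crawler

[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/
   [-] http://www.example.com/icons/ubuntu-logo.png
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 4

[+] Directories found:
EOF

# Line number of the first line containing each delimiter.
start=$(awk '/Crawling/ { print NR; exit }' crawler.txt)
end=$(awk '/Total urls crawled/ { print NR; exit }' crawler.txt)

# Pass the shell variables into awk with -v and print the lines
# strictly between the two line numbers.
result=$(awk -v start="$start" -v end="$end" 'NR > start && NR < end' crawler.txt)
printf '%s\n' "$result"
```

This reads the file three times, so the single-pass flag approach above is probably the better fit for large files; the sketch is mainly meant to show how -v carries a shell value into NR comparisons.
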
1
votes

The clause "...or any other tool" prompts me to point out that a scripting language could be used in command-line mode for this. Here's how it could be done using Ruby, where 't' is the name of the file that contains the text from which the specified lines are to be extracted. The following would be entered in the shell.

ruby -W0 -e 'puts STDIN.readlines.select { |line| true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/) }[1..-2]' < t

displays the following:

   [-] http://www.example.com
   [-] http://www.example.com/
   [-] http://www.example.com/icons/ubuntu-logo.png
   [-] http://www.example.com/manual
    [i] 404 Not Found

The following operations are performed.

  • STDIN.readlines and < t read the lines of t into an array
  • select selects the lines for which its block calculation returns true
  • [1..-2] extracts all but the first and last of the selected lines

select's block calculation,

true if line.match?(/\bCrawling\b/)..line.match?(/\bTotal urls crawled\b/)

employs the flip-flop operator. The block returns nil (treated as false by Ruby) until a line that matches /\bCrawling\b/ is read, namely, "[+] Crawling". The block then returns true, and continues to return true until it encounters the line matching /\bTotal urls crawled\b/, namely, "[+] Total urls crawled: 4". The block returns true for that line as well, but returns nil for each subsequent line unless and until it encounters another line that matches /\bCrawling\b/, in which case the process repeats. Hence, "flip-flop".
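
For comparison, awk has the same flip-flop behaviour built in as a range pattern, written as two patterns separated by a comma. The following is a minimal sketch rather than a drop-in replacement: the sample contents of t are a shortened copy of the question's output, and the shell variable name between is an assumption for illustration.

```shell
# Sample input (a shortened copy of the question's crawler output).
cat > t <<'EOF'
[+] Crawling
   [-] http://www.example.com
   [-] http://www.example.com/manual
    [i] 404 Not Found
[+] Total urls crawled: 4

[+] Directories found:
EOF

# The range pattern is true from a line matching the first regex through
# the next line matching the second regex, inclusive. The inner test drops
# the two delimiter lines, much like [1..-2] does in the Ruby version.
between=$(awk '/Crawling/,/Total urls crawled/ { if ($0 !~ /Crawling|Total urls crawled/) print }' t)
printf '%s\n' "$between"
```

As with the Ruby flip-flop, the range restarts if a later line matches the first pattern again.
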

"-W0" in the command line suppresses warning messages. Without it one may see the warning, "flip-flop is deprecated" (depending on the version of Ruby being used). After a decision was made to deprecate the (rarely-used) flip-flop operator, Rubyists took to the streets with pitchforks and torches in protest. The Ruby monks saw the error of their ways and reversed their decision.