0
votes

I want to parse data out of a log file which consists of JSON strings, and I wonder if there's a way to use a bash function to perform custom parsing instead of overloading the jq command.

Command:

tail errors.log --follow | jq --raw-output '. | [.server_name, .server_port, .request_file] | @tsv' 

Outputs:

8.8.8.8     80     /var/www/domain.com/www/public

I want to trim the 3rd column to exclude the /var/www/domain.com part, where /var/www/domain.com is the document root and /var/www/domain.com/subdomain/public is the public HTML section of the site. The output should therefore be /subdomain/public (or, for the example above, /www/public).

I wonder if I can somehow inject a bash function to parse the .request_file column? Or how would I do that using jq?

I'm having trouble piping the output of any part of this command into something that would let me do string manipulation.

1
Only domain.com exactly, or do you want to trim the first three directories no matter what they are? Should the resulting 3rd column be www/public or /www/public? I'm making some assumptions in my answer, but a better question would provide an explicit example of the desired output, not just the current output. - Charles Duffy
/www/public would be the desired output. - HelpNeeder
BTW, options (like --follow) should always be before arguments (like errors.log); GNU tools allow it to be the other way around, but the POSIX utility syntax guidelines are explicit that only the options-first ordering is guaranteed to be supported. - Charles Duffy
Hmm. This might be confusing, so the desired output is the /subdomain/public which is /www/domain.com if the source was www.example.com. - HelpNeeder
Please edit the question to contain an explicit counterexample (with both present output and desired output) for which you're concerned the existing answer may not behave as desired. - Charles Duffy

1 Answer

4
votes

Use a BashFAQ #1 while read loop to iterate over the lines, and a BashFAQ #100 parameter expansion to perform the desired modifications:

tail -f -- errors.log \
  | jq --raw-output --unbuffered \
       '[.server_name, .server_port, .request_file] | @tsv' \
  | while IFS=$'\t' read -r server_name server_port request_file; do
      printf '%s\t%s\t%s\n' "$server_name" "$server_port" "/${request_file#/var/www/*/}"
    done

Note the use of --unbuffered, to force jq to flush its output lines immediately rather than buffering them. This has a performance penalty (so it's not default), but it ensures that you get output immediately when reading from a potentially-slow input source.
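
Since the question asks about using a bash function, the parameter expansion from the loop can also be wrapped in one. This is only a minimal sketch: the name strip_docroot is hypothetical (it does not appear in the original answer), and it assumes every document root has the form /var/www/<something>/. Paths that do not match are printed unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original answer.
# Assumes the document root always looks like /var/www/<something>/.
strip_docroot() {
  local path=$1 rest
  # Remove the shortest leading match of /var/www/*/ (BashFAQ #100).
  rest=${path#/var/www/*/}
  if [ "$rest" != "$path" ]; then
    # Something was stripped: restore the leading slash.
    printf '/%s\n' "$rest"
  else
    # No /var/www/<something>/ prefix: leave the path untouched.
    printf '%s\n' "$path"
  fi
}

strip_docroot /var/www/domain.com/www/public
strip_docroot /var/www/example.org/subdomain/public
```

The two calls print /www/public and /subdomain/public. Inside the while read loop, the third printf argument could then be written as "$(strip_docroot "$request_file")", at the cost of one subshell per line.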


That said, it's also easy to remove a prefix in jq, so there's no particular reason to do the above:

tail -f -- errors.log | jq -r '
  def withoutPrefix: sub("^([/][^/]+){3}"; "");
  [.server_name, .server_port, (.request_file | withoutPrefix)] | @tsv'
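
Note that the two approaches are not identical: the parameter expansion strips /var/www/<something>/ specifically, whereas the jq regex strips the first three path components whatever they are called. To make that difference concrete, here is a rough bash sketch of the "first three components" rule using only parameter expansion. The function name without_prefix is borrowed from the jq definition purely for illustration; this is a re-implementation under that assumption, not the jq code itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): drop the first three
# path components, whatever they are named, like sub("^([/][^/]+){3}"; "").
without_prefix() {
  local rest=$1 tmp comp i
  for i in 1 2 3; do
    case $rest in
      /[!/]*)                 # a slash followed by a non-empty component
        tmp=${rest#/}         # drop the leading slash
        comp=${tmp%%/*}       # the component's name, up to the next slash
        rest=${tmp#"$comp"}   # drop the component, keep what follows
        ;;
      *)                      # fewer than three components: leave input unchanged
        printf '%s\n' "$1"
        return
        ;;
    esac
  done
  printf '%s\n' "$rest"
}

without_prefix /var/www/domain.com/www/public
without_prefix /srv/sites/example.org/subdomain/public
```

The two calls print /www/public and /subdomain/public; the second input would be left untouched by the /var/www/*/ expansion, which is the practical difference between the two rules.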