1
votes

I am trying to parse a log file using AWK.

test.log

[12/12/18 11:54:54:321 PST] 0000077c WC_SERVER     < com.ibm.commerce.server.HttpRequestWrapper setAttribute(String,Object) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884

The idea is: if a line starts with [ and matches the pattern, print that line and also the following lines that do not start with [.

Expected result:

[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884

AWK:

awk 'BEGIN{IGNORECASE = 1; flag = 0;}{ if($0 ~ /^\[/){if($0 ~ /WC_BUSINESSCO/){flag=1}else{flag = 0}; if(flag==1){print $0}}}' test.log

Current output:

[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry

As you can see, lines that don't start with [ aren't printed. After some debugging, it seems that AWK treats the problem lines as part of the pattern-matched line; I guess they are not printed because of a line-wrapping issue.

How can I fix this?

5 Answers

3
votes

You're overdoing it.

awk '/^\[/{p=/WC_BUSINESSCO/}p' test.log
  • /^\[/ means perform the following action ({...}) if current record begins with a [,
  • p=/WC_BUSINESSCO/ sets p to true if the current record contains WC_BUSINESSCO, and to false otherwise,
  • p at the end means print current record if p is true.
  • if the current line does not start with [, then the p value from the previous line remains.

For further information, see man awk.

For clarity, some additional whitespace:

awk '
    /^\[/ { p = /WC_BUSINESSCO/ }
    p
' test.log
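
To watch the flag carry over onto continuation lines, here is a small self-contained run. The sample file and its KEEP/SKIP tags are made up for illustration; only the shape of the awk program is taken from above.

```shell
# Build a tiny made-up log: one record tagged SKIP, two tagged KEEP.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[1] SKIP first
    skipped continuation
[2] KEEP second
    kept continuation
[3] KEEP third
EOF

# p is re-evaluated only on lines starting with "[", so continuation
# lines inherit the decision made for their header line.
out=$(awk '/^\[/ { p = /KEEP/ } p' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

This should print the two KEEP headers together with the one continuation line that belongs to the second record.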

2
votes

How does awk define a line?

Awk does not have any knowledge of what a line is. Awk only knows the concepts of records and fields.

Files are split in records where consecutive records are split by the record separator RS. Each record is split in fields, where consecutive fields are split by the field separator FS.

By default, the record separator RS is set to be the <newline> character (\n) and thus each record is a line. The record separator has the following definition:

RS: The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
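
As a rough illustration of that definition, the sketch below counts records and fields for the same input, once with the default RS and once with an empty RS. The input text is made up and has nothing to do with the OP's log.

```shell
# Made-up input: two blocks of text separated by one blank line.
input=$(printf 'a b\nc d\n\ne f\n')

# Default RS: every line, including the blank one, is a record.
by_line=$(printf '%s\n' "$input" | awk '{ print NR ": " NF " fields" }')

# RS = "": blank-line-separated blocks are records, and a newline
# also acts as a field separator inside each block.
by_para=$(printf '%s\n' "$input" | awk 'BEGIN { RS = "" } { print NR ": " NF " fields" }')

printf 'default RS:\n%s\n\nempty RS:\n%s\n' "$by_line" "$by_para"
```

With the default RS there are four records (the blank line counts as one with zero fields); with an empty RS there are two records, the first holding four fields.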

How can I now define a multi-line record?

For multi-line records where the start of a record cannot be uniquely identified by a single character, you might want to use gawk or any other awk version in which RS can be multiple characters (or a regular expression). In the OP's case, you can define RS as \n\[:

awk 'BEGIN { RS="\n\\[" } /WC_BUSINESSCO/ { print (NR==1 ? "" : "[") $0 }' file

If you do not have access to such a version of awk, and you have to stick to POSIX, you can do:

awk '/^\[/ && (rec ~ /WC_BUSINESSCO/) { printf "%s", rec }   # process record
     /^\[/ { rec="" }                                        # initialise record
     { rec = rec $0 ORS }                                    # build record
     END { if (rec ~ /WC_BUSINESSCO/) printf "%s", rec }     # process last record
    ' file

This matches "WC_BUSINESSCO" anywhere in the full record, not only in its first line as most of the other solutions here do. For the OP, matching the first line might be enough, but in more general cases the difference matters.
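
A small sketch of that difference, using a made-up file in which the search word (NEEDLE, a placeholder) appears only on a continuation line:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[1] ALPHA header
    detail mentions NEEDLE
[2] BETA header
    nothing here
EOF

# First-line match: only the line starting with "[" is tested.
first_line=$(awk '/^\[/ { p = /NEEDLE/ } p' "$tmp")

# Whole-record match: lines are accumulated and the full record is tested.
whole=$(awk '
    /^\[/ && rec ~ /NEEDLE/ { printf "%s", rec }   # flush previous record if it matches
    /^\[/ { rec = "" }                             # a new record starts here
    { rec = rec $0 ORS }                           # append the current line
    END { if (rec ~ /NEEDLE/) printf "%s", rec }   # flush the last record
' "$tmp")
rm -f "$tmp"

printf 'first-line match: <%s>\nwhole-record match:\n%s\n' "$first_line" "$whole"
```

The first-line variant finds nothing here, while the whole-record variant prints record [1] with its continuation line.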


1
votes

You said: then print out the line and also the following line.

Try this instead:

awk '/^\[.*WC_BUSINESSCO/{print;getline;print}' test.log

The flow is quite simple: when the pattern matches, print the line, get the next one and print it as well.

To get all the lines after the one that starts with [:

awk '/^\[/{i=0}/WC_BUSINESSCO/{i=1}i' test.log

Check this.
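
To compare the two variants side by side, here is a sketch on a made-up file in which one record has two continuation lines (the KEEP/SKIP tags are placeholders):

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[1] KEEP header
    first continuation
    second continuation
[2] SKIP header
EOF

# getline variant: the header plus exactly one following line.
one_line=$(awk '/^\[.*KEEP/ { print; getline; print }' "$tmp")

# Flag variant: the header plus every line up to the next "[" line.
all_lines=$(awk '/^\[/ { i = 0 } /KEEP/ { i = 1 } i' "$tmp")
rm -f "$tmp"

printf 'getline:\n%s\n\nflag:\n%s\n' "$one_line" "$all_lines"
```

The getline variant stops after the first continuation line, while the flag variant prints both. Also note that if the matching line is the last line of the input, getline fails and leaves $0 unchanged, so the header would be printed twice.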

1
votes

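With GNU awk you can define the record separator as a regular expression, so that every record starts right after a leading [:

$ awk -v RS='(^|\n)\\[' '/WC_BUSINESSCO/{sub(/\n$/, ""); print "[" $0}' file

When the pattern matches, the (possibly multi-line) record is printed with the [ that the separator consumed put back in front. RT is not used for this because it holds the separator text that follows the record, which is empty for the last record.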

With other awks, a workaround is:

$ awk '/^\[/{if(/WC_BUSINESSCO/){print; p=1} else p=0} p&&!/^\[/' file

0
votes

If you are considering Perl, here is a generic solution based on your requirements. Note that it doesn't hard-code any text from the file (e.g. WC_BUSINESSCO) in the solution.

/tmp> cat test.log
[12/12/18 11:54:54:321 PST] 0000077c WC_SERVER     < com.ibm.commerce.server.HttpRequestWrapper setAttribute(String,Object) Exit
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884
/tmp> perl -ne ' print "$t$p" if $x and /^\[/ ;if(!/^\[/) { $x++;$t.=$p} if(/^\[/) { $x=0;$t=""} $p=$_;END { print "$t$p" if $x }' test.log
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO < -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.createContextSPI(ActivityToken, ActivityData, String) Exit
                                 com.ibm.commerce.context.base.BaseContext : [bInitialize = false][bRecalibrate = false][inCallerId = null][inRunAsId = null][inStoreId = null][istrChannelId = null][bDirty = false][bRequestStarted = false][iOriginalSerializedString = null][iToken = null]
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextserviceimpl.BusinessContextServiceImpl.loadContextData(ActivityToken, String) Entry
                                 68884:false:false:0
                                 com.ibm.commerce.context.base.BaseContext
[12/12/18 11:54:54:328 PST] 0000077c WC_BUSINESSCO > -1112ef1b:16732963f15:-7fd4 com.ibm.commerce.component.contextservice.commands.ContextDataSerValueCacheCmdImpl.myPerformExecute() Entry
                                 68884
/tmp>