I am trying to extract this single-line user_agent field with a regex.

user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/437.38 (KHTML, like Gecko) Chrome/49.0.3477.100 Safari/437.38"

cat myfile | grep -oP '(user_agent=[^ ]*)' | awk {'print $1'}

The command above returns

user_agent="Mozilla/5.0

only. However, I need the whole text

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/437.38 (KHTML, like Gecko) Chrome/49.0.3477.100 Safari/437.38"

to be matched.

Please help modify the regex pattern I used.

Hi, thanks for your response. I tried cat myfile | grep -oP 'user_agent=(.*)$' | awk {'print $1'}, but it still prints user_agent="Mozilla/5.0. I need the whole line to be printed. Is there anything I need to replace? - coolent
Do you have anything else in the same row? If so, share the whole row for reference. For now, [^ ]* in your regex stops at the first space, and awk also prints only the first space-delimited field. Try the regex 'user_agent=[^ ].*' and remove the awk print. - Amit Bhardwaj
Thanks. After removing awk I get the whole line, but it includes the other fields. Here is the whole row; I am trying to get the user_agent field only. user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/437.38 (KHTML, like Gecko) GitHubDesktop/1.4.1 Chrome/49.0.3477.100 Electron/2.0.9 Safari/437.38" accept="application/vnd.github.v3+json, application/json" language=en-US status=201 - coolent
Try this regex now: '(user_agent=\".*\")(?=\saccept)' - Amit Bhardwaj

1 Answer

The problem you are facing is twofold.

  1. The [^ ]* in your regex matches only up to the first space, so the match stops after Mozilla/5.0 and the rest of the value is not captured.
  2. Even if you fix the regex, awk '{print $1}' will again print only the first space-delimited field.

So you need to drop the awk print and use .* in place of [^ ]*, together with a positive lookahead.
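
To see both effects in isolation, here is a minimal sketch that runs each step on the sample line from the question. The variable names line, short and first are only illustrative, and grep -P assumes GNU grep built with PCRE support.

```shell
# One-row sample copied from the question.
line='user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/437.38 (KHTML, like Gecko) Chrome/49.0.3477.100 Safari/437.38"'

# [^ ]* matches non-space characters only, so the match ends at the first space.
short=$(printf '%s\n' "$line" | grep -oP 'user_agent=[^ ]*')

# awk splits records on whitespace, so $1 is only the first space-delimited field.
first=$(printf '%s\n' "$line" | awk '{print $1}')

printf '%s\n' "$short" "$first"
```

Both lines printed are user_agent="Mozilla/5.0, which is why fixing only one of the two steps does not help.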

Check the following:

cat myfile | grep -oP '(user_agent=\".*\")(?=\saccept)'

Here, \".*\" matches everything within the double quotes, and (?=\saccept) is a positive lookahead that ends the match at the closing quote followed by whitespace and accept. The lookahead itself is not included in the output.
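
If the next field is not always accept, one possible variation (not part of the answer above) is to stop at the closing quote instead of looking ahead. The sketch below assumes the value never contains an embedded double quote and that GNU grep with -P is available; the temporary file and the variable names field and value are only illustrative.

```shell
# Write the full sample row from the comments to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/437.38 (KHTML, like Gecko) GitHubDesktop/1.4.1 Chrome/49.0.3477.100 Electron/2.0.9 Safari/437.38" accept="application/vnd.github.v3+json, application/json" language=en-US status=201
EOF

# [^"]* matches everything up to the next double quote, so no lookahead on accept is needed.
field=$(grep -oP 'user_agent="[^"]*"' "$tmpfile")

# \K discards the part matched so far, leaving only the quoted value in the output.
value=$(grep -oP 'user_agent=\K"[^"]*"' "$tmpfile")

printf '%s\n' "$field" "$value"
rm -f "$tmpfile"
```

The first command prints the field with its user_agent= prefix, and the second prints only the quoted string, which is the form asked for in the question.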