I import Apache logs into InfluxDB with Telegraf and its logparser plugin.
I want to filter out all log entries from bots, so I set up a custom pattern with a regex that only matches user agents that don't contain the words "bot" or "crawl":
NOBOT ((?!bot|crawl).)*
CUSTOM_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-) %{QS:referrer} "%{NOBOT:agent}"
But it doesn't work: zero metrics are imported into InfluxDB.
The regex seems OK, and it works fine when I test it here: http://grokconstructor.appspot.com/do/match
Just to be sure, I tried a simpler regex:
BOT .*?bot.*?
CUSTOM_LOG_FORMAT %{CLIENT:client_ip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:ts:ts-httpd}\] "(?:%{WORD:verb:tag} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version:float})?|%{DATA})" %{NUMBER:resp_code:tag} (?:%{NUMBER:resp_bytes:int}|-) %{QS:referrer} "%{BOT:agent}"
And it works: Telegraf only imports logs from bots. But I want the opposite, and I don't see what's wrong with ((?!bot|crawl).)*
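
One thing that may be worth checking: Telegraf is written in Go, and Go's standard regexp package uses RE2 syntax, which has no lookahead such as (?!...). The sketch below assumes the logparser grok patterns end up compiled by that package (an assumption, not verified against the Telegraf source). The helper name compiles is made up for illustration. It only checks whether each of the two patterns above is accepted by Go's regexp compiler; the online tester may run on a different regex engine, which could explain why the pattern passes there.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package (RE2 syntax) accepts the pattern.
// Hypothetical helper, used here only to compare the two patterns.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`((?!bot|crawl).)*`, // NOBOT: relies on a negative lookahead
		`.*?bot.*?`,         // BOT: no lookaround involved
	}
	for _, p := range patterns {
		// Print the compile error, if any, next to each pattern.
		_, err := regexp.Compile(p)
		fmt.Printf("%-20s compiles=%v err=%v\n", p, compiles(p), err)
	}
}
```

If the NOBOT pattern is rejected here while BOT is accepted, that would be consistent with what I see: the whole CUSTOM_LOG_FORMAT fails to compile, so no line ever matches.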