
I am using the Splunk logging driver to send logs to Splunk with the following command line:

docker run -d -p 443:8443 --log-driver=splunk --log-opt splunk-token=REDACTED --log-opt splunk-url=https://myloghost.example.net:8088 --log-opt splunk-sourcetype=idp --log-opt splunk-index=auth_idp --log-opt splunk-insecureskipverify=1 --log-opt splunk-format=raw --log-opt splunk-gzip=true --name shib --restart always --health-cmd 'curl -k -f https://127.0.0.1:8443/idp/status || exit 1' --health-interval=2m --health-timeout=30s

The container runs normally, and logs flow into Splunk. All is good. This is in a testing environment, so it is not always in use, but the container is left running. Sometimes, when I start using the service the container provides, nothing is logged to Splunk immediately. If I wait 10-15 minutes, the logs eventually show up with the correct time stamps, etc.

I've noticed on the docker host that netstat -tpn | grep -e 8088 gives me output similar to this:

Active Internet connections (w/o servers)
Proto Recv-Q    Send-Q  Local Address           Foreign Address         State       PID/Program name    
tcp        0    947     xxx.xxx.x.xxx:49010     xxx.xxx.x.xx:8088       ESTABLISHED 12682/dockerd-curre   

On the Splunk host, the same command shows zeroes in the Recv-Q and Send-Q columns. The Splunk Distributed Management Console doesn't show any events received during the lag time. On the Docker host, there is a message in /var/log/messages from Docker that happens at the same time the logs are finally sent to Splunk:

Jul  6 13:14:19 idpdock0-0 dockerd-current: time="2018-07-06T13:14:19.428396282-04:00" level=error msg="Post https://myloghost.example.net:8088/services/collector/event/1.0: read tcp xxx.xxx.x.xxx:49010->xxx.xxx.x.xx:8088: read: connection timed out"

It seems to me that the logging driver gets stuck on some I/O operation, and when that operation finally times out, it retries and the logs are sent. However, I have no idea what condition causes it to get stuck, nor do I know of any way to adjust the timeout period.

I'd like to know why the logs take so long to get to Splunk sometimes, and if there is anything I can do to avoid the delays.


1 Answer


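This is quite possibly a bug in the Splunk logging driver: it does not set a Timeout on the http.Client it creates (https://github.com/moby/moby/blob/master/daemon/logger/splunk/splunk.go#L224). In Go, a zero Timeout means the client imposes no deadline of its own on a request, so a stalled connection is only abandoned when the operating system gives up on it. See https://golang.org/pkg/net/http/#Client for details.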

You can patch it.
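As a rough idea of what such a patch changes, here is a minimal, standalone sketch. It is not the driver's actual code: newClient, the 200 ms value, and the stalled test server are all made up for illustration. It only shows that a non-zero Timeout makes a request against an unresponsive endpoint fail quickly instead of hanging.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// newClient is a hypothetical helper (not part of the Splunk driver).
// It returns an http.Client whose requests are abandoned after timeout.
// A zero Timeout would mean "no client-side deadline".
func newClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}

func main() {
	// Stand-in for an endpoint that accepts the connection but never answers
	// in time. The handler returns early once the client gives up.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done():
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	client := newClient(200 * time.Millisecond) // illustrative value only

	start := time.Now()
	resp, err := client.Get(srv.URL)
	if err == nil {
		resp.Body.Close()
	}

	// With the timeout set, the call returns an error well before the
	// server's 2 second stall is over.
	fmt.Println("request failed:", err != nil)
	fmt.Println("returned in under 1s:", time.Since(start) < time.Second)
}
```

In the driver itself the equivalent change would be adding a Timeout field where the http.Client is constructed. What value is sensible depends on how long a post to your Splunk collector normally takes, so treat any number as something to tune.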

As an alternative, I can suggest looking at our solution for monitoring Docker and forwarding logs: https://www.outcoldsolutions.com/