2
votes

Let me preface this, by stating that I am not a DevOp, so my experience with Linux administration is limited.

I basically followed this How-To (https://cloud.google.com/monitoring/agent/install-agent) and installed the agent on my Google Compute Instance.

Everything works, I get the new metrics in my stackdriver account, however I get this flooded in my syslog

instance-name collectd[26092]: uc_update: Value too old: name = <RandomNumber>/processes-all/ps_vm; value time = 1517218302.393; last cache update = 1517218302.393;

So I found this in my /opt/stackdriver/collectd/etc/collectd.conf file

Hostname "RandomNumber"
Interval 60

This makes sense, we dont use collectd for anything else, beside stackdriver. So finding that the proccessid that causes the problem is the same as stackdriver hostname is in order.

Next I checked https://collectd.org/faq.shtml

I run this command for both /etc/collectd.conf and /opt/stackdriver/collectd/etc/collectd.conf

grep -i LoadPlugin /etc/collectd.conf | egrep -v '^[[:space:]]*#' | sort | uniq -c
  1 LoadPlugin cpu
  1 LoadPlugin interface
  1 LoadPlugin load
  1 LoadPlugin memory
  1 LoadPlugin network
  1 LoadPlugin syslog
grep -i LoadPlugin /opt/stackdriver/collectd/etc/collectd.conf | egrep -v '^[[:space:]]*#' | sort | uniq -c
  1 LoadPlugin "match_regex"
  1 LoadPlugin aggregation
  1 LoadPlugin cpu
  1 LoadPlugin df
  1 LoadPlugin disk
  1 LoadPlugin exec
  1 LoadPlugin interface
  1 LoadPlugin load
  1 LoadPlugin match_regex
  1 LoadPlugin match_throttle_metadata_keys
  1 LoadPlugin memory
  1 LoadPlugin processes
  1 LoadPlugin stackdriver_agent
  1 LoadPlugin swap
  1 LoadPlugin syslog
  1 LoadPlugin tcpconns
  1 LoadPlugin write_gcm

As you can see there is no repeating values.

I have run out of ideas, can someone help?

Thank you.

P.S. We are using Debian Stretch and running lighttpd with php.

P.S. More information This is a more detailed log with the error in it you can see the timestamps

Jan 30 10:47:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_cputime; value time = 1517309269.877; last cache update = 1517309269.877;
Jan 30 10:48:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_cputime; value time = 1517309329.884; last cache update = 1517309329.884;
Jan 30 10:50:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_rss; value time = 1517309449.881; last cache update = 1517309449.881;
Jan 30 10:50:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/io_octets; value time = 1517309449.881; last cache update = 1517309449.884;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/ps_vm; value time = 1517309569.889; last cache update = 1517309569.889;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/disk_octets; value time = 1517309569.890; last cache update = 1517309569.890;
Jan 30 10:52:49 instance-name collectd[28953]: uc_update: Value too old: name = 5281367784029328076/processes-all/disk_octets; value time = 1517309569.890; last cache update = 1517309569.894;

This is the output of the PS command

 ps -e
 PID TTY          TIME CMD
 1 ?        00:01:28 systemd
 2 ?        00:00:00 kthreadd
 3 ?        00:00:24 ksoftirqd/0
 5 ?        00:00:00 kworker/0:0H
 7 ?        00:41:17 rcu_sched
 8 ?        00:00:00 rcu_bh
 9 ?        00:00:02 migration/0
10 ?        00:00:00 lru-add-drain
11 ?        00:00:03 watchdog/0
12 ?        00:00:00 cpuhp/0
13 ?        00:00:00 cpuhp/1
14 ?        00:00:03 watchdog/1
15 ?        00:00:01 migration/1
16 ?        00:11:58 ksoftirqd/1
18 ?        00:00:00 kworker/1:0H
19 ?        00:00:00 cpuhp/2
20 ?        00:00:03 watchdog/2
21 ?        00:00:01 migration/2
22 ?        00:03:16 ksoftirqd/2
24 ?        00:00:00 kworker/2:0H
25 ?        00:00:00 cpuhp/3
26 ?        00:00:03 watchdog/3
27 ?        00:00:02 migration/3
28 ?        00:03:11 ksoftirqd/3
30 ?        00:00:00 kworker/3:0H
31 ?        00:00:00 kdevtmpfs
32 ?        00:00:00 netns
33 ?        00:00:00 khungtaskd
34 ?        00:00:00 oom_reaper
35 ?        00:00:00 writeback
36 ?        00:00:00 kcompactd0
38 ?        00:00:00 ksmd
39 ?        00:01:02 khugepaged
40 ?        00:00:00 crypto
41 ?        00:00:00 kintegrityd
42 ?        00:00:00 bioset
43 ?        00:00:00 kblockd
44 ?        00:00:00 devfreq_wq
45 ?        00:00:00 watchdogd
49 ?        00:01:16 kswapd0
50 ?        00:00:00 vmstat
62 ?        00:00:00 kthrotld
63 ?        00:00:00 ipv6_addrconf
130 ?        00:00:00 scsi_eh_0
131 ?        00:00:00 scsi_tmf_0
133 ?        00:00:00 bioset
416 ?        07:01:34 jbd2/sda1-8
417 ?        00:00:00 ext4-rsv-conver
443 ?        00:02:37 systemd-journal
447 ?        00:00:00 kauditd
452 ?        00:00:01 kworker/0:1H
470 ?        00:00:01 systemd-udevd
483 ?        00:00:26 cron
485 ?        00:00:37 rsyslogd
491 ?        00:00:00 acpid
496 ?        00:00:49 irqbalance
497 ?        00:00:21 systemd-logind
498 ?        00:00:36 dbus-daemon
524 ?        00:00:00 edac-poller
612 ?        00:00:02 kworker/2:1H
613 ?        00:00:00 dhclient
674 ?        00:00:00 vsftpd
676 ttyS0    00:00:00 agetty
678 tty1     00:00:00 agetty
687 ?        00:01:18 ntpd
795 ?        4-19:58:17 mysqld
850 ?        00:00:15 sshd
858 ?        00:04:06 google_accounts
859 ?        00:00:33 google_clock_sk
861 ?        00:01:05 google_ip_forwa
892 ?        01:31:57 kworker/1:1H
1154 ?        00:00:00 exim4
1160 ?        00:00:01 kworker/3:1H
4259 ?        00:00:00 kworker/2:1
6090 ?        00:00:00 kworker/0:1
6956 ?        00:00:00 sshd
6962 ?        00:00:00 sshd
6963 pts/0    00:00:00 bash
6968 pts/0    00:00:00 su
6969 pts/0    00:00:00 bash
6972 ?        00:00:00 kworker/u8:2
7127 ?        00:00:00 kworker/3:2
7208 ?        00:00:00 php-fpm7.0
7212 ?        00:00:00 kworker/0:0 
10516 ?        00:00:00 systemd
10517 ?        00:00:00 (sd-pam)
10633 ?        00:00:00 kworker/2:2
11569 ?        00:00:00 kworker/3:1
12539 ?        00:00:00 kworker/1:2
13625 ?        00:00:00 kworker/1:0
13910 ?        00:00:00 sshd
13912 ?        00:00:00 systemd
13913 ?        00:00:00 (sd-pam)
13920 ?        00:00:00 sshd
13921 ?        00:00:00 sftp-server
13924 ?        00:00:00 sftp-server
14016 pts/0    00:00:00 tail
14053 ?        00:00:03 php-fpm7.0
14084 ?        00:00:00 sshd
14090 ?        00:00:00 sshd
14091 pts/1    00:00:00 bash
14098 ?        00:00:01 php-fpm7.0
14099 pts/1    00:00:00 su
14100 pts/1    00:00:00 bash
14105 ?        00:00:00 sshd
14106 ?        00:00:00 sshd
14107 ?        00:00:00 php-fpm7.0
14108 pts/1    00:00:00 ps
17456 ?        00:00:03 kworker/u8:1
17704 ?        01:38:36 lighttpd
21624 ?        00:00:30 perl
25593 ?        00:00:00 sshd
25595 ?        00:00:00 systemd
25596 ?        00:00:00 (sd-pam)
25602 ?        00:00:00 sshd
25603 ?        00:00:00 sftp-server
25641 ?        00:00:00 sftp-server
27001 ?        00:00:00 gpg-agent
28953 ?        00:01:20 stackdriver-col

PS grep comamnd with less, output

root@instance-7:/home/# ps aux | grep collectd
root      6981  0.0  0.0  12756   976 pts/0    S+   13:40   0:00 grep collectd
root     28953  0.1  1.1 1105712 41960 ?       Ssl  Jan29   3:16 /opt/stackdriver/collectd/sbin/stackdriver-collectd -C /opt/stackdriver/collectd/etc/collectd.conf -P /var/run/stackdriver-agent.pid
3
These should be normal messages from the Stackdriver agent. What is the rate of the messages? - Peycho Dimitrov
peycho the rate of the msgs are about 1-2-3 msgs per every minute. - thefolenangel

3 Answers

1
votes

These should be normal messages from the Stackdriver agent. (If the rate is as you said 2-3 messages per minute.)

I suggest you to install ntp/ntpd service and sync it to any time server, so you can have the right time on your system.

example ntp server: pool.ntp.org

0
votes

You are just getting a duplicate as your message has identical timestamp values for both, the new value to be added to the internal cache and the last value with the same name that was added to the cache. value time = 1517218302.393 last cache update = 1517218302.393

You can refer to the collectd faq page (https://collectd.org/faq.shtml). It explains this kind of messages including an example which matches the one you got.

You should check: - If there are more than one collectd daemon running in your instance (ps command). To see the collectd processes you can run:

ps aux | grep collectd
  • Are the timestamps increasing with each message? If this is the case, it could be that there is another host report data using the same host name.
0
votes

Since those logs seem to not been affecting the instance, if they're flooding your Stackdriver, you can exclude those logs from the default sink.

Using gcloud, this can be accomplished with the following command:

gcloud logging sinks update _Default --log-filter "$(echo $(gcloud logging sinks describe _Default --format "value(filter)") "AND NOT textPayload:\"uc_update: Value too old:\"")"