1
votes

My original carbon storage-schema config was set to 10s:1w, 60s:1y and was working fine for months. I've recently updated it to 1s:7d, 10s:30d, 60s,1y. I've resized all my whisper files to reflect the new retention schema using the following bit of bash:

collectd_dir="/opt/graphite/storage/whisper/collectd/"
retention="1s:7d 1m:30d 15m:1y"

find $collectd_dir -type f -name '*.wsp' | parallel whisper-resize.py \
--nobackup {} $retention \;

I've confirmed that they've been updated using whisper-info.py with the correct retention and data points. I've also confirmed that the storage-schema is valid using a storage-schema validation script.

The carbon-cache{1..8}, carbon-relay, carbon-aggregator, and collectd services have been stopped before the whisper resizing, then started once the resizing was complete.

However, when checking in on a Grafana dashboard, I'm seeing empty graphs with correct data points (per sec, but no data) on collectd plugin charts; but with the graphs that are providing data, it's showing data and data points every 10s (old retention), instead of 1s.

The /var/log/carbon/console.log is looking good, and the collectd whisper files all have carbon user access, so no permission denied issues when writing.

When running an ngrep on port 2003 on the graphite host, I'm seeing connections to the relay, along with metrics being sent. Those metrics are then getting relayed to a pool of 8 caches to their pickle port.

Has anyone else experienced similar issues, or can possibly help me diagnose the issue further? Have I missed something here?

1

1 Answers

0
votes

So it took me a little while to figure this out. It had nothing to do with the local_settings.py file like some of the old responses, but it had to do with the Interval function in the collectd.conf.

A lot of the older responses mentioned that you needed to include 'Interval 1' inside each Plugin container. I think this would have been great due to the control of each metric. However, that would create config errors in my logs, and break the metric. Setting 'Interval 1' at top level of the config resolved my issues.