2 votes

I would like to store time series data, such as CPU usage over 6 months (I will poll the CPU usage every 2 minutes, so that later I can derive several resolutions, such as 1 week or 1 month, or even finer resolutions such as 5 minutes, etc.).

I'm using Perl, and I don't want to use RRDtool or a relational database. I was thinking of implementing my own storage using some sort of circular buffer (ring buffer) with the following properties (a rough sketch follows the list):

  1. 6 months ≈ 186 days = 4,464 hours = 267,840 minutes.
  2. Dividing that into 2-minute intervals: 267,840 / 2 = 133,920.
  3. 133,920 is the ring-buffer size.
  4. Each element in the ring buffer will be a hashref whose key is the epoch time (easily converted into a date/time using localtime) and whose value is the CPU usage at that time.
  5. I will serialize this ring buffer (using Storable, I guess).
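
Roughly, this is the kind of thing I have in mind (an untested sketch; get_cpu_usage_somehow() is just a placeholder, and the slot arithmetic assumes a fixed 2-minute polling interval):

use strict;
use warnings;
use Storable qw(store retrieve);

my $interval = 120;        # poll every 2 minutes
my $slots    = 133_920;    # 6 months of 2-minute samples
my $file     = '/path/to/cpu_ring.stor';

# Load the existing buffer, or start with an empty, preallocated one.
my $ring = -e $file ? retrieve($file) : [ (undef) x $slots ];

sub record_sample {
    my ($epoch, $cpu) = @_;
    # Map the timestamp onto a slot; entries older than 6 months are
    # overwritten automatically once the buffer wraps around.
    my $idx = int($epoch / $interval) % $slots;
    $ring->[$idx] = { time => $epoch, cpu => $cpu };
}

record_sample(time, get_cpu_usage_somehow());   # placeholder for the real poll
store($ring, $file);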

Any other suggestions? Thanks,

2   I strongly encourage you to use a database. SQLite requires no server. – Borodin

2 Answers

10 votes

I suspect you're overthinking this. Why not just use a flat (e.g. TAB-delimited) file with one line per time interval, each line containing a timestamp and the CPU usage? That way, you can simply append new entries to the file as they are collected.

If you want to automatically discard data older than 6 months, you can do this by using a separate file for each day (or week or month or whatever) and deleting old files. This is more efficient than reading and rewriting the entire file every time.
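
For example, a small cleanup pass along these lines (untested; it assumes one file per day, named as in the code further down) could delete files older than roughly six months:

use strict;
use warnings;

my $dir      = '/path/to/log/directory';
my $max_days = 186;   # keep roughly 6 months of daily files

for my $file (glob "$dir/cpu_usage_*.log") {
    # -M gives the file's age in days, relative to when the script started
    unlink $file or warn "Could not delete $file: $!\n"
        if -M $file > $max_days;
}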


Writing and parsing such files is trivial in Perl. Here's some example code, off the top of my head:

Writing:

use strict;
use warnings;
use POSIX qw'strftime';

my $dir = '/path/to/log/directory';

my $now  = time;
my $date = strftime '%Y-%m-%d', gmtime $now;  # ISO 8601 date
my $time = strftime '%H:%M:%S', gmtime $now;

my $data = get_cpu_usage_somehow();

# One file per day, so old data can be discarded by deleting old files.
my $filename = "$dir/cpu_usage_$date.log";

open my $fh, '>>', $filename
    or die "Failed to open $filename for append: $!\n";

print $fh "${date}T${time}\t$data\n";

close $fh or die "Error writing to $filename: $!\n";

Reading:

use strict;
use warnings;

my $dir = '/path/to/log/directory';

# Process the daily files in chronological order (the ISO 8601 date in
# the file name sorts correctly as a plain string).
foreach my $filename (sort glob "$dir/cpu_usage_*.log") {
    open my $fh, '<', $filename
        or die "Failed to open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($timestamp, $data) = split /\t/, $line, 2;
        # do something with timestamp and data (or save for later processing)
    }
    close $fh;
}

(Note: I can't test either of these example programs right now, so they might contain bugs or typos. Use at your own risk!)

2 votes

As @Borodin suggests, use SQLite or DBM::Deep as recommended here.

If you want to stick to Perl itself, go with DBM::Deep:

A unique flat-file database module, written in pure perl. ... Can handle millions of keys and unlimited levels without significant slow-down. Written from the ground-up in pure perl -- this is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility with Unix, Mac OS X and Windows.
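
For instance, a minimal (untested) sketch of storing and reading back samples with DBM::Deep might look like this; the file name is arbitrary and get_cpu_usage_somehow() is a placeholder borrowed from the other answer:

use strict;
use warnings;
use DBM::Deep;

# The database file is created on first use and persists between runs.
my $db = DBM::Deep->new('/path/to/cpu_usage.db');

# Store each sample under its epoch timestamp.
$db->{ time() } = get_cpu_usage_somehow();   # placeholder for the real poll

# Read the samples back in chronological order.
for my $epoch (sort { $a <=> $b } keys %$db) {
    print scalar localtime($epoch), "\t$db->{$epoch}\n";
}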

You mention your need for storage, which could be satisfied by a simple text file as advocated by @llmari. (And, of course, using a CSV format would allow the file to be manipulated easily in a spreadsheet.)

But, if you plan on collecting a lot of data, and you wish to eventually be able to query it with good performance, then go with a tool designed for that purpose.
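
For example, with SQLite via DBI (an untested sketch; the table and column names are made up, and get_cpu_usage_somehow() is again a placeholder), storing samples and rolling them up into 5-minute buckets takes only a few statements:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=/path/to/cpu_usage.sqlite', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS cpu_usage (epoch INTEGER PRIMARY KEY, usage REAL)');

# Record one sample.
$dbh->do('INSERT INTO cpu_usage (epoch, usage) VALUES (?, ?)',
         undef, time, get_cpu_usage_somehow());   # placeholder for the real poll

# Average usage per 5-minute bucket over the last week.
my $rows = $dbh->selectall_arrayref(
    'SELECT (epoch / 300) * 300 AS bucket, AVG(usage)
       FROM cpu_usage
      WHERE epoch > ?
   GROUP BY bucket ORDER BY bucket',
    undef, time - 7 * 24 * 3600,
);

print scalar localtime($_->[0]), "\t$_->[1]\n" for @$rows;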