Keeping Ganglia graphs longer than one year

By default, the longest time period you can view in Ganglia is 1 year. In some cases I’d like to keep metrics for longer than that, especially when they are not system metrics but application-specific metrics, like how many bytes and files there are in my storage cluster.

It’s not entirely trivial to do this in Ganglia. You’ll have to do 3 things:

  1. Change the RRA (round robin archive) settings in gmetad.conf;
  2. Change the actual RRD files to match this;
  3. Change the conf.php file to show the values in the web interface.

It’s especially the second step that proved difficult, because the rrdtool in RHEL/CentOS 7 is ancient (like many other packages).

So let’s get started.

1. Change gmetad.conf

An RRD (round robin database) file consists of 1 or more RRAs (round robin archives). The default value of the RRAs in /etc/ganglia/gmetad.conf is this:

RRAs "RRA:AVERAGE:0.5:1:5856" \
     "RRA:AVERAGE:0.5:4:20160" \
     "RRA:AVERAGE:0.5:40:52704"

This needs some explanation. There is a value called step, which is explained elsewhere in gmetad.conf. It’s the smallest interval at which metrics are processed. By default it’s 15 seconds, which is a good value in most cases. Don’t change it unless you have to. The 0.5 value is the xfiles factor, the fraction of an interval that may consist of unknown data; it is not important for what we want to do.

Now, the configuration above consists of 3 archives. The first keeps one value each step (15s), for 5856 values. That is 5856 x 15s, which is 5856 x 15/3600 = 24.4 hours. So for a bit more than one day, we’ll have 4 values per minute.

The second archive keeps the average of 4 steps, which is one value per minute, for 20160 minutes, that is 14 days.

The third archive keeps the average of 40 steps, which is one value per 10 minutes, for 52704 x 10 / 60 / 24 = 366 days. So, for a year, metrics are kept at a 10 minute interval.
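If you want to double-check this arithmetic, here is a minimal sketch in shell. It assumes the default step of 15 seconds; rra_span is just a helper name made up for this example, and the hours and days are rounded down.

```shell
#!/bin/sh
# Assumption: gmetad uses the default step of 15 seconds.
STEP=15

# Print the interval between stored values and the total time span of an RRA,
# given the number of steps per row and the number of rows.
rra_span() {
    steps_per_row=$1
    rows=$2
    seconds=$((STEP * steps_per_row * rows))
    echo "interval=$((STEP * steps_per_row))s span=$((seconds / 3600))h ($((seconds / 86400))d)"
}

rra_span 1 5856      # interval=15s span=24h (1d)
rra_span 4 20160     # interval=60s span=336h (14d)
rra_span 40 52704    # interval=600s span=8784h (366d)
```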

Now, I want to keep 10 years of history with 2 values per day. So the interval is 12 hours, or 12 x 60 x 4 = 2880 steps. If we want to keep 2 values per day for 10 years, we need to store 2 x 366 x 10 = 7320 values. Let’s round that up a bit to 7400.
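The same calculation works for any interval and retention. The sketch below assumes the default 15 second step and counts 366 days per year, like the text above; rra_for is a hypothetical helper name. It prints the exact row count, so any rounding up (to 7400 here) is still up to you.

```shell
#!/bin/sh
# Assumption: gmetad uses the default step of 15 seconds.
STEP=15

# Print an RRA definition for one averaged value every INTERVAL_HOURS hours,
# kept for YEARS years (366 days per year, to be on the safe side).
rra_for() {
    interval_hours=$1
    years=$2
    steps_per_row=$((interval_hours * 3600 / STEP))
    rows=$((years * 366 * 24 / interval_hours))
    echo "RRA:AVERAGE:0.5:${steps_per_row}:${rows}"
}

rra_for 12 10    # RRA:AVERAGE:0.5:2880:7320
```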

So then we need to add:

 "RRA:AVERAGE:0.5:2880:7400"

All together:

RRAs "RRA:AVERAGE:0.5:1:5856" \
     "RRA:AVERAGE:0.5:4:20160" \
     "RRA:AVERAGE:0.5:40:52704" \
     "RRA:AVERAGE:0.5:2880:7400"

Time for the next step.

2. Change the RRD files

The above change does not have any effect by itself. The RRD files won’t grow automatically. You’ll have to do that by hand, with rrdtool tune. At first I thought rrdtool resize would do it, but that works only on resizing existing RRAs; it can’t add an RRA.
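Adding an RRA makes every RRD file bigger, so it can be worth estimating the extra disk space first. This is only a rough sketch: it assumes 8 bytes per stored value per data source and ignores headers and other bookkeeping. The helper name rra_extra_bytes and the 10000-file cluster are made up for the example.

```shell
#!/bin/sh
# Rough extra bytes from adding one RRA.
# ASSUMPTION: each stored value takes 8 bytes per data source;
# file headers and other bookkeeping are ignored.
rra_extra_bytes() {
    rows=$1
    data_sources=$2
    files=$3
    echo $((rows * 8 * data_sources * files))
}

rra_extra_bytes 7400 1 1        # one file, one data source: 59200
rra_extra_bytes 7400 1 10000    # hypothetical 10000 files: 592000000
```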

Now, on CentOS 7 I ran into the problem that rrdtool was quite old, and the operation I needed was not supported in that version. You need at least version 1.5 to add RRAs. So I built a fresh rrdtool. Here’s how to do that on CentOS 7. Don’t worry, it won’t mess with the original rrdtool on your system.

# Build dependencies
yum install glib2-devel glib2 libxml2 libxml2-devel pango pango-devel gcc perl-devel
BUILD_DIR=/tmp/rrdbuild
INSTALL_DIR=/opt/rrdtool
mkdir -p "$BUILD_DIR" && cd "$BUILD_DIR"
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.7.0.tar.gz
tar -xf rrdtool-1.7.0.tar.gz
cd rrdtool-1.7.0/
# Install under /opt/rrdtool so the system rrdtool is left untouched
./configure --prefix="$INSTALL_DIR"
make
make install
# Quick check that the new binary runs
/opt/rrdtool/bin/rrdtool tune

The last command should give you usage info of the rrdtool tune command we’ll be using.

For more info, read these instructions to build rrdtool.

Now, before you continue to change all RRD files, you really should make a copy or backup. And stop gmetad, because it might interfere with what you want to do. And it’s a good idea to test it first with a single file. Just copy one file from /var/lib/ganglia/rrds/… that ends with .rrd, and call the copy test.rrd. Then:

/opt/rrdtool/bin/rrdtool info test.rrd
/opt/rrdtool/bin/rrdtool tune test.rrd RRA:AVERAGE:0.5:2880:7400
/opt/rrdtool/bin/rrdtool info test.rrd

The info commands will list the RRAs that are in the RRD file. The second time, you should see the added RRA. Once you’ve got this right, it’s time to change ALL files. But please don’t forget to make a backup first!
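The output of rrdtool info is fairly long, so comparing before and after by eye is a bit tedious. The sketch below condenses it to one line per RRA. It assumes the info output contains lines of the form rra[N].cf, rra[N].rows and rra[N].pdp_per_row; rra_summary is a made-up helper name.

```shell
#!/bin/sh
# Condense `rrdtool info` output (read from stdin) to one line per RRA:
#   <index> <cf> <pdp_per_row> <rows>
rra_summary() {
    awk '
        /^rra\[[0-9]+\]\.(cf|rows|pdp_per_row) = / {
            i = $1; sub(/^rra\[/, "", i); sub(/\].*$/, "", i)   # RRA index
            key = $1; sub(/^.*\./, "", key)                      # field name
            gsub(/"/, "", $3)                                    # strip quotes around the cf value
            val[i, key] = $3
            if (i + 0 >= count) count = i + 1
        }
        END {
            for (i = 0; i < count; i++)
                print i, val[i, "cf"], val[i, "pdp_per_row"], val[i, "rows"]
        }'
}

# Usage:
#   /opt/rrdtool/bin/rrdtool info test.rrd | rra_summary
```

After the tune command, the summary should contain an extra line ending in 2880 7400.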

systemctl stop gmetad
cd /var/lib/ganglia
cp -ra rrds rrds-backup

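And there we go. This could take a few minutes. Note that find is limited to the rrds directory, so the rrds-backup copy we just made is left untouched.

find rrds -type f -name '*.rrd' \
     -exec /opt/rrdtool/bin/rrdtool tune {} \
           RRA:AVERAGE:0.5:2880:7400 \;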
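As far as I know, running the tune command a second time on a file that already has the extra RRA adds it again. If you expect to re-run this (for example after an interrupted run), a more careful variant could skip files that already have an RRA with 2880 steps per row. This is a sketch, not something I needed myself; add_rra_once and tune_tree are hypothetical helper names, and the check assumes the rra[N].pdp_per_row lines in the rrdtool info output.

```shell
#!/bin/sh
# Path to the freshly built rrdtool; override by setting RRDTOOL beforehand.
RRDTOOL=${RRDTOOL:-/opt/rrdtool/bin/rrdtool}

# Add the RRA to one file, unless an RRA with 2880 steps per row already exists.
add_rra_once() {
    if "$RRDTOOL" info "$1" | grep -q '^rra\[[0-9]*]\.pdp_per_row = 2880$'; then
        echo "skip $1"
    else
        "$RRDTOOL" tune "$1" RRA:AVERAGE:0.5:2880:7400 && echo "tuned $1"
    fi
}

# Apply add_rra_once to every .rrd file below the given directory.
tune_tree() {
    find "$1" -type f -name '*.rrd' | while IFS= read -r f; do
        add_rra_once "$f"
    done
}

# Usage (from /var/lib/ganglia, with gmetad stopped):
#   tune_tree rrds
```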

Now, because we run this as root, all files will have root as owner. This won’t work because Ganglia does not run as root. We change back ownership:

cd /var/lib/ganglia/rrds
chown -R ganglia:ganglia *
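To confirm that no file was missed, you could list everything that is still owned by someone else. A small sketch; not_owned_by is a made-up helper name, and ganglia is assumed to be the user gmetad runs as, as in the chown above.

```shell
#!/bin/sh
# List regular files under a directory that are NOT owned by the given user.
# No output means every file has the expected owner.
not_owned_by() {
    find "$1" -type f ! -user "$2" -print
}

# Usage:
#   not_owned_by /var/lib/ganglia/rrds ganglia
```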

And now we can start gmetad again:

systemctl start gmetad

From this moment, Ganglia will keep your metrics for longer than 1 year. One more thing to do.

3. Change the web interface

You’ll probably want to see the metrics you’re collecting. So we’re going to add a “decade” time range to the Ganglia web interface. After the previous steps, this is peanuts.

Edit this file: /etc/ganglia/conf.php. There should be a list of time ranges there; if it’s commented out, uncomment it.

$conf['time_ranges'] = array(
  'hour'=>3600,
  '2hr'=>7200,
  '4hr'=>14400,
  'day'=>86400,
  'week'=>604800,
  'month'=>2419200,
  'year'=>31449600,
  # Needs to be an entry here to support 'r=job' in the query args to graph.php
  'job'=>0
);

Add the line:

'decade'=>314496000,

It should look like this:

$conf['time_ranges'] = array(
  'hour'=>3600,
  '2hr'=>7200,
  '4hr'=>14400,
  'day'=>86400,
  'week'=>604800,
  'month'=>2419200,
  'year'=>31449600,
  'decade'=>314496000,
  # Needs to be an entry here to support 'r=job' in the query args to graph.php
  'job'=>0
);
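The value for the decade is simply 10 times the existing year entry, which is 364 days in seconds. A quick check in shell, which also shows that a decade at one value per 12 hours fits in the 7400 rows of the new RRA (years_to_seconds is a helper name made up for this example):

```shell
#!/bin/sh
YEAR=31449600    # the 'year' entry from conf.php, in seconds

# Convert a number of Ganglia "years" to seconds.
years_to_seconds() {
    echo $(($1 * YEAR))
}

echo "year in days:   $((YEAR / 86400))"                         # 364
echo "decade:         $(years_to_seconds 10)"                    # 314496000
echo "rows at 12h:    $(($(years_to_seconds 10) / (12 * 3600)))" # 7280
```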

Refresh the Ganglia web interface in your browser. You should now have a “decade” time range.

That’s it. Metrics for a decade!
