Cacti setup

Cacti is a nice grapher. It can serve several needs, but I find it most useful for collecting information about the system, to find out the reason for problems (hardware failures, network response times, etc.) that occur from time to time. Cacti does not collect all the information itself: it delegates the collection to query (poll) scripts. Their output is stored in an RRD database, which is optimized for storing periodic sequential data.

System statistics

  • Related graphs and data polls are already packaged with Cacti. One can collect information about the average system load (as reported by uptime), memory usage, disk utilization, the number of logged-in users, etc.
  • Additionally, one can write a simple poll script to collect information from lm-sensors (e.g. CPU temperature, fan RPM).
  • There is also a nice possibility to query the S.M.A.R.T. values of your HDD via smartctl. Poll script examples are here and here.
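
A poll script for this can be tiny: Cacti data input scripts simply print space-separated name:value pairs on one line. The sketch below is a minimal example; the sensor labels CPU Temp and fan1 are assumptions (adjust the patterns to whatever your chip reports), and a captured sensors sample is embedded so the sketch runs standalone – in a real poll script you would pipe live sensors output instead.

```shell
#!/bin/sh
# Minimal sketch of a Cacti data input script for lm-sensors.
# Cacti parses a single line of space-separated "name:value" pairs.
# The sample below stands in for live `sensors` output.
sample_sensors() {
  cat <<'SAMPLE'
CPU Temp:    +42.0°C  (high = +80.0°C)
fan1:        1200 RPM
SAMPLE
}
result=$(sample_sensors | awk '
  /^CPU Temp/ { t = $3; gsub(/[^0-9.]/, "", t); temp = t }
  /^fan1/     { rpm = $2 }
  END { printf "cpu_temp:%s fan1_rpm:%s", temp, rpm }')
echo "$result"
```

Hook the script into Cacti as a "Data Input Method" of type "Script/Command" and map each printed field to a data source.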

Network statistics

  • You might wish to collect Internet-related statistics, in particular how much data has been sent/received. You can only do this if all the traffic goes through your server. Potentially you can move the DHCP server role from the ADSL modem to your server and disable it on the modem. Then you can create dummy iptables rules and collect statistics from the iptables output – the poll script will be pretty simple. Of course, one can define several rules (per destination port, per protocol, etc.) and collect statistics separately. This is a bit tricky, as it adds more points of failure to your local network. Usually home users want to connect to the WiFi router and use the Internet directly.
  • Another alternative is to set up a Squid proxy server and ask local users to access the Internet via the proxy. You can set up a transparent proxy, but that again requires the server to be the default gateway of the network. Collecting information from Squid is simple via the SNMP protocol. It is switched off by default, so the only thing needed is to enable the following options in the Squid configuration file:
    acl snmppublic snmp_community public
    snmp_port 3401
    snmp_access allow snmppublic localhost
    snmp_access deny all

    Cacti already has SNMP poll scripts (that use the net-snmp toolset). You'd better start with the graph templates from here, as one needs to know the SNMP OID of each value to perform the query.

  • You can also collect information about the server's open connections from netstat output. It is pretty straightforward to write a poll script for this.
  • You can also collect traffic information from the interface itself using ifconfig output. A poll script is included.
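
For the netstat item, the poll script can be little more than an awk one-liner. Here is a sketch with a captured netstat -tan sample embedded so it runs standalone; in the real script, replace the sample with live netstat -tan output:

```shell
#!/bin/sh
# Sketch of a Cacti poll script counting TCP connections by state.
# The state is the last field of `netstat -tan` connection lines.
sample_netstat() {
  cat <<'SAMPLE'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN
tcp        0      0 10.0.0.2:22      10.0.0.5:51234   ESTABLISHED
tcp        0      0 10.0.0.2:3128    10.0.0.7:44321   ESTABLISHED
tcp        0      0 10.0.0.2:3128    10.0.0.7:44322   TIME_WAIT
SAMPLE
}
result=$(sample_netstat | awk '
  /^tcp/ { n[$NF]++ }
  END { printf "established:%d time_wait:%d listen:%d", n["ESTABLISHED"], n["TIME_WAIT"], n["LISTEN"] }')
echo "$result"
```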

Other information

To re-order items in a graph template, do the following:
  1. Export them into an SQL script, for example using the following query:
    cat <<'EOF' | mysql cacti_db > cacti_db.sql
    SELECT item.id, item.text_format, input.name, item.sequence
    FROM graph_templates tmpl, graph_templates_item AS item
    LEFT OUTER JOIN graph_template_input_defs AS defs ON item.id = graph_template_item_id
    LEFT OUTER JOIN graph_template_input AS input ON input.id = graph_template_input_id
    WHERE item.graph_template_id = tmpl.id AND tmpl.name = 'Drive S.M.A.R.T.'
    ORDER BY item.sequence, item.id;
    EOF
  2. Re-order the lines the way you like.
  3. Run the following script:
    cat cacti_db.sql | perl -ne 'next if $. == 1; @v = split /\t/; print "update graph_templates_item set sequence = " . (int(($.-2)/2)+1) . " where id = " . $v[0] . ";\n"' | mysql cacti_db

Select all graph items which are mapped twice to a data source plus all graph items which are not mapped to any data source:
mysql> SELECT graph.id AS id, graph.name AS graph_name, entry.item_id, entry.item_label, entry.input_name
FROM (
    SELECT MIN(item.graph_template_id) AS graph_id, item.id AS item_id, text_format AS item_label, group_concat(input.name ORDER BY input.name) AS input_name
    FROM graph_templates_item AS item, graph_template_input_defs, graph_template_input AS input
    WHERE local_graph_template_item_id = 0 AND graph_type_id <> 1 AND item.id = graph_template_item_id AND input.id = graph_template_input_id
    GROUP BY item.id, text_format HAVING COUNT(*) > 1
    UNION
    SELECT item.graph_template_id AS graph_id, item.id AS item_id, text_format AS item_label, '<not mapped>'
    FROM graph_templates_item AS item LEFT OUTER JOIN graph_template_input_defs ON item.id = graph_template_item_id
    WHERE local_graph_template_item_id = 0 AND graph_type_id <> 1 AND graph_template_input_id IS NULL
) entry, graph_templates AS graph
WHERE graph.id = entry.graph_id
ORDER BY 1;
 
+------+------------------+---------+------------+-----------------------------------------------------------+
| id   | graph_name       | item_id | item_label | input_name                                                |
+------+------------------+---------+------------+-----------------------------------------------------------+
|   47 | Drive I/O        |     935 | Written    | Data Source [sd1_kb_written],Data Source [ud2_kb_written] |
|   47 | Drive I/O        |     937 | Last:      | Data Source [sd1_kb_written],Data Source [ud2_kb_written] |
|   47 | Drive I/O        |     939 | Min:       | Data Source [sd1_kb_written],Data Source [ud2_kb_written] |
|   47 | Drive I/O        |     941 | Max:       | Data Source [sd1_kb_written],Data Source [ud2_kb_written] |
|   47 | Drive I/O        |     943 | Avg:       | Data Source [sd1_kb_written],Data Source [ud2_kb_written] |
|   47 | Drive I/O        |    1152 | Last:      | Data Source [sd1_kb_written],Data Source [ud3_kb_written] |
|   47 | Drive I/O        |    1154 | Min:       | Data Source [sd1_kb_written],Data Source [ud3_kb_written] |
|   47 | Drive I/O        |    1156 | Max:       | Data Source [sd1_kb_written],Data Source [ud3_kb_written] |
|   47 | Drive I/O        |    1158 | Avg:       | Data Source [sd1_kb_written],Data Source [ud3_kb_written] |
|   45 | Drive S.M.A.R.T. |     775 | Last:      | <not mapped>                                              |
+------+------------------+---------+------------+-----------------------------------------------------------+
9 rows in set (0.02 sec)

Select all graph items which are incorrectly mapped (:IDEA: check that the selected source corresponds to the item source), plus graph items which are not mapped (:IDEA: check that a source is selected in Graph Management), plus data sources which are mapped twice (:IDEA: check the checkbox mapping for sources in Graph Templates and/or save the source mappings in Graph Management):
SELECT graph.id AS id, graph.name AS graph_name, entry.item_id, entry.item_label, entry.source, entry.mapping, entry.target
FROM (
    SELECT MIN(item1.graph_template_id) AS graph_id, group_concat(item1.id ORDER BY item1.sequence) AS item_id, group_concat(item1.text_format ORDER BY item1.sequence) AS item_label, rrd1.data_source_name AS source, 'S<=>S' AS mapping, rrd2.data_source_name AS target
    FROM graph_templates_item item1, graph_templates_item item2, data_template_rrd rrd1, data_template_rrd rrd2
    WHERE item1.id = item2.local_graph_template_item_id AND rrd1.id = item1.task_item_id AND rrd2.id = item2.task_item_id
    AND rrd1.data_source_name <> rrd2.data_source_name AND item2.task_item_id <> 0
    GROUP BY rrd1.data_source_name, rrd2.data_source_name
    UNION
    SELECT MIN(item1.graph_template_id) AS graph_id, group_concat(item1.id ORDER BY item1.sequence) AS item_id, group_concat(item1.text_format ORDER BY item1.sequence) AS item_label, rrd.data_source_name AS source, 'S<=>?', '<not mapped>'
    FROM graph_templates_item item1, graph_templates_item item2, data_template_rrd rrd
    WHERE item1.id = item2.local_graph_template_item_id AND rrd.id = item1.task_item_id AND item1.graph_type_id <> 2
    AND ((item1.task_item_id <> item2.task_item_id AND item2.task_item_id = 0) OR (item1.task_item_id = item2.task_item_id))
    GROUP BY rrd.data_source_name
    UNION
    SELECT item1.graph_template_id AS graph_id, group_concat(item1.id ORDER BY item1.sequence) AS item_id, group_concat(item1.text_format ORDER BY item1.sequence), group_concat(DISTINCT rrd.data_source_name) AS source, 'S<=>I' AS mapping, group_concat(DISTINCT input.name ORDER BY input.name) AS target
    FROM graph_templates_item item1, graph_templates_item item2, data_template_rrd rrd, graph_template_input_defs, graph_template_input AS input
    WHERE item1.id = item2.local_graph_template_item_id AND rrd.id = item2.task_item_id AND item1.task_item_id <> item2.task_item_id AND item1.task_item_id <> 0
    AND item1.id = graph_template_item_id AND input.id = graph_template_input_id
    GROUP BY rrd.id, item1.graph_template_id
    HAVING COUNT(DISTINCT graph_template_input_id) > 1
) entry, graph_templates AS graph
WHERE graph.id = entry.graph_id
ORDER BY 1;
 
+----+------------------------+-----------------+-----------------------------------------------------+----------+---------+------------------------------------------+
| id | graph_name             | item_id         | item_label                                          | source   | mapping | target                                   |
+----+------------------------+-----------------+-----------------------------------------------------+----------+---------+------------------------------------------+
|  8 | Unix - Processes       | 65,65,66,67,68  | Running Processes,Running Processes,Last:,Avg:,Max: | proc     | S<=>I   | Legend Color,Processes Data Source       |
| 10 | Unix - Logged In Users | 76,76,77,78,79  | Users,Users,Last:,Avg:,Max:                         | users    | S<=>I   | Legend Color,Logged In Users Data Source |
| 41 | Unix - TCP Connections | 605,589,591,593 | Closed,Last:,Avg:,Max:                              | tcp_conn | S<=>S   | tcp_closed_conn                          |
| 41 | Unix - TCP Connections | 595,607,609,611 | Open,Last:,Avg:,Max:                                | tcp_conn | S<=>S   | tcp_opened_conn                          |
+----+------------------------+-----------------+-----------------------------------------------------+----------+---------+------------------------------------------+
4 rows in set (0.14 sec)

How data is stored in RRD

You have a device which is queried repeatedly. The timespan between queries is called the step.

The object in Cacti that represents the queried values from the device is a data source. A data source may query one or more values, for example upstream and downstream volume of a network device.

The set of values returned by one query is called a data point.

Each time a data point is generated (by polling, every step seconds), it is written to a sub-database within a round-robin database (rrd), called a round-robin archive (rra). An rra holds a fixed number of data points. When it is full, the oldest data point is overwritten by the newest – hence “round-robin”.

The data retention time of the rra that contains the data points directly read from the device is step × number of data points. For example, you want a resolution of 1 minute, thus polling once every minute, and you want to retain this resolution for 2 days of data. How many data points do you need for this? 2 days are 2 × 24 × 3600 = 172800 seconds. 1-minute polling is once every 60 seconds, thus you need 172800 / 60 = 2880 data points. The data points in this rra are called primary data points, because they contain the original unaggregated data.

Now, the point of an rrd is to store not only the primary data points but also, for older data, aggregated data points that take less disk space and less computing work. You can create additional rras within an rrd with different data retention and aggregation.

In addition to the above, you might want to keep older data at 15-minute resolution for 2 weeks. 15-minute resolution means 15 primary data points aggregated into 1. 2 weeks are 2 × 7 × 24 × 3600 = 1209600 seconds, and one aggregated data point covers 15 × 60 = 900 seconds. So you need 1209600 / 900 = 1344 data points in the second rra.

Even older data you want to store for 2 months at a resolution of 1 hour in a third rra. 2 months are 2 × 31 × 24 × 3600 = 5356800 seconds; 1 hour is 60 primary data points, or 3600 seconds. You need 5356800 / 3600 = 1488 data points.

The last aggregation should be 2 years with a resolution of 4 hours. So: 2 × 365 × 24 × 3600 = 63072000 seconds. 4 hours are 4 × 3600 = 14400 seconds. So you need 63072000 / 14400 = 4380 data points.

Now you have the values you need to enter in the data source profile specification. I have described how to derive the default 1-minute polling profile – you will find these values in the actual profile; they are just rounded up a bit in Cacti.
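
The derivation above can be double-checked mechanically. The commented rrdtool create call shows where the four row counts would end up; the DS name and heartbeat there are assumptions for illustration, not values taken from the actual Cacti profile:

```shell
#!/bin/sh
# Recompute the four RRA sizes derived above (step = 60 s).
step=60
rra1=$(( 2*24*3600    / step ))          # 2 days   at 1-min resolution
rra2=$(( 2*7*24*3600  / (15*step) ))     # 2 weeks  at 15-min resolution
rra3=$(( 2*31*24*3600 / (60*step) ))     # 2 months at 1-h resolution
rra4=$(( 2*365*24*3600 / (240*step) ))   # 2 years  at 4-h resolution
echo "$rra1 $rra2 $rra3 $rra4"           # 2880 1344 1488 4380
# The matching rrdtool call would look roughly like:
#   rrdtool create traffic.rrd --step 60 \
#     DS:octets:COUNTER:120:0:U \
#     RRA:AVERAGE:0.5:1:2880  RRA:AVERAGE:0.5:15:1344 \
#     RRA:AVERAGE:0.5:60:1488 RRA:AVERAGE:0.5:240:4380
```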

In addition, you can define a “default timespan” in the Cacti configuration. This is not something that is stored in the rra, nor does it shape the data in the rra. Instead, it is a hint to Cacti's graphing about which timespan to choose for a graph whenever you open a graph based on a data source that polls with this profile. It is best seen if you click on the 3rd icon next to a graph, called “Time graph view” on mouseover. Four graphs open, one for each rra. The default timespans you see here are the default timespan values from the profile definition.

How to graph RRD in Grafana?

  • Download binary grafana-rrd-server and setup the service:
    # cat > /etc/systemd/system/grafana-rrd-server.service <<'EOF'
    [Unit]
    Description=Grafana RRD Server
    After=network.target
    
    [Service]
    User=grafana
    Group=grafana
    Restart=on-failure
    ExecStart=/usr/local/bin/grafana-rrd-server -i 127.0.0.1 -p 9000 -r /var/lib/cacti/rra -s 300
    RestartSec=30s
    
    [Install]
    WantedBy=multi-user.target
    EOF
  • systemctl daemon-reload && systemctl enable grafana-rrd-server && systemctl start grafana-rrd-server
  • Make sure that RRD files are readable by above user: chgrp grafana -R /var/lib/cacti/rra
  • Install JSON API Grafana Datasource plugin: grafana-cli plugins install simpod-json-datasource
  • Define new datasource:
    # cat > /etc/grafana/provisioning/datasources/2_json.yaml <<'EOF'
    apiVersion: 1
    
    deleteDatasources:
      - name: JSON
    
    datasources:
      - name: JSON
        type: simpod-json-datasource
        url: http://localhost:9000
    EOF
  • Restart Grafana: systemctl restart grafana-server.service.
  • Create a new panel with a “Time series” query. Metrics should be loaded automatically by the plugin – if that does not happen, check the RRD file permissions.

See also:

software/cacti.txt · Last modified: 2010/08/01 03:09 by dmitry
 
 