I use Nagios to watch over all of my servers and systems. This is an excerpt from the front page:
Nagios monitors your entire IT infrastructure to ensure systems, applications, services, and business processes are functioning properly. In the event of a failure, Nagios can alert technical staff of the problem, allowing them to begin remediation processes before outages affect business processes, end-users, or customers. With Nagios you’ll never be left having to explain why an unseen infrastructure outage hurt your organization’s bottom line.
Nagios has a number of subsystems (core, plugins, config, etc.) to do this, but also uses RRD to store the (performance) data. RRD stands for Round Robin Database, created by Tobias Oetiker.
The tool you use to work with RRD data is called “rrdtool” (originally created for MRTG, another tool by Tobias). What is “rrdtool”? An excerpt from the rrdtool site:
RRDtool refers to Round Robin Database tool. Round robin is a technique that works with a fixed amount of data, and a pointer to the current element. Think of a circle with some dots plotted on the edge. These dots are the places where data can be stored. Draw an arrow from the center of the circle to one of the dots; this is the pointer. When the current data is read or written, the pointer moves to the next element. As we are on a circle there is neither a beginning nor an end, you can go on and on and on. After a while, all the available places will be used and the process automatically reuses old locations. This way, the dataset will not grow in size and therefore requires no maintenance. RRDtool works with Round Robin Databases (RRDs). It stores and retrieves data from them.
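To make the circle-and-pointer picture concrete, here is a minimal sketch in plain shell. It is not how rrdtool is implemented; the names `SLOTS`, `rr_store` and `rr_show` are made up for this illustration. Seven values are written into five slots, so the last two writes reuse the oldest locations.

```sh
# Hypothetical demo, not part of rrdtool: a ring of SLOTS positions kept in
# shell variables slot_0 .. slot_4, plus a write counter acting as the pointer.
SLOTS=5
writes=0

rr_store() {
    idx=$(( writes % SLOTS ))      # pointer position for this write
    eval "slot_$idx=\$1"           # overwrite whatever was stored there
    writes=$(( writes + 1 ))       # move the pointer to the next element
}

rr_show() {
    i=0
    line=""
    while [ "$i" -lt "$SLOTS" ]; do
        eval "v=\${slot_$i:-U}"    # U marks a slot that was never written
        line="$line$v "
        i=$(( i + 1 ))
    done
    echo "${line% }"
}

for value in 10 20 30 40 50 60 70; do
    rr_store "$value"
done
rr_show    # prints: 60 70 30 40 50
```

The storage never grows beyond five slots: 60 and 70 have replaced 10 and 20, which is the "reuses old locations" behaviour described in the excerpt.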
When you reboot monitored machines, systems or routers, the data needs some time to settle back to normal levels, and you can end up with spikes in the database. The following is one way to remove them:
- First make a copy of the existing file and work with the copy:
cp source.rrd workingcopy.rrd
- Get information from the file, e.g. the name of the dataset (ds) if you do not know it:
rrdtool info workingcopy.rrd | less
filename = "workingcopy.rrd"
rrd_version = "0003"
step = 300
last_update = 1446430024
ds[data].type = "GAUGE"
ds[data].minimal_heartbeat = 600
ds[data].min = NaN
ds[data].max = 5.0000000000e+02
ds[data].last_ds = "27"
ds[data].value = 3.3480000000e+03
ds[data].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].cur_row = 98
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
As you can see, the name of the dataset in this file is “data”.
- Set the maximum of the dataset (ds) to a value of your choice (below I used 300). This is the limit the restore command further down uses for its range check: values in the dataset that exceed it are thrown out (as far as I can tell, rrdtool marks them as unknown, NaN, rather than clamping them to the maximum).
rrdtool tune workingcopy.rrd -a data:300
- Export all data in the rrd to an XML file (using aaaa.xml here so the file ends up at the top of ls -al):
rrdtool dump workingcopy.rrd > aaaa.xml
- Remove the working copy (not source.rrd), because restore will not overwrite an existing file unless forced:
rm -f workingcopy.rrd
- Import the data back into the working copy file (-r enables the range check against the dataset's minimum and maximum):
rrdtool restore aaaa.xml workingcopy.rrd -r
- You could now dump the file again to see whether you are happy with the dataset:
rrdtool dump workingcopy.rrd | less
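Paging through a long dump by eye is tedious, so a small helper can count the remaining outliers instead. This is only a sketch: `count_above` is a name I made up, and it assumes the dump stores each row value in a `<v>...</v>` element, which is how the dumps I have seen look.

```sh
# count_above THRESHOLD
# Reads `rrdtool dump` XML on stdin and prints how many <v> values are
# strictly greater than THRESHOLD. NaN (unknown) entries are skipped.
count_above() {
    awk -v max="$1" '
        {
            line = $0
            # A row can hold several <v> elements (one per dataset).
            while (match(line, /<v>[^<]*<\/v>/)) {
                # Strip the 3-char opening and 4-char closing tag.
                val = substr(line, RSTART + 3, RLENGTH - 7)
                if (val !~ /NaN/ && val + 0 > max + 0) n++
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { print n + 0 }
    '
}

# Usage (needs rrdtool installed):
#   rrdtool dump workingcopy.rrd | count_above 300
```

If the spikes are gone, the count for your chosen maximum should be 0.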
The final step would be to overwrite the original file (source.rrd) with the working copy.
That is all.
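If you have to do this for more than one file, the steps above could be wrapped in a small shell function. This is a sketch rather than a polished tool: the function name `despike_rrd` and the `.work.rrd` / `.dump.xml` file names are my own choices, and it deliberately stops before overwriting the original so you can inspect the result first.

```sh
# despike_rrd SOURCE.rrd DS_NAME MAX
# Runs the copy / tune / dump / restore sequence from this post and prints
# the path of the cleaned working copy. The original file is left untouched.
despike_rrd() {
    src=$1
    ds=$2
    max=$3
    work="${src%.rrd}.work.rrd"     # hypothetical name for the working copy
    xml="${src%.rrd}.dump.xml"      # hypothetical name for the XML dump

    cp "$src" "$work" || return 1
    # Set the maximum for the dataset; restore -r checks against it.
    rrdtool tune "$work" -a "$ds:$max" || return 1
    rrdtool dump "$work" > "$xml" || return 1
    # restore will not overwrite an existing file, so remove the copy first.
    rm -f "$work"
    rrdtool restore "$xml" "$work" -r || return 1
    echo "$work"
}

# Usage (needs rrdtool installed):
#   cleaned=$(despike_rrd source.rrd data 300)
#   rrdtool dump "$cleaned" | less    # inspect, then: mv "$cleaned" source.rrd
```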