
MRTG 95th Percentile

MRTG is an SNMP monitoring/graphing program by Tobias Oetiker and Dave Rand.
Click here to go to the official MRTG site

What is the 95th percentile, and why is it useful in measuring bandwidth?

The 95th percentile is the smallest number that is greater than 95% of the numbers in a given set. The reason this statistic is so useful in measuring data throughput is that it gives a very accurate picture of the cost of the bandwidth.

Here's an example. Suppose an ISP sells you a T1 line, but you're only using it to access the web. Even though you might frequently download very large files (filling the pipe), your cost to the ISP is negligible, because your usage is intermittent. A single T3 connection to the backbone could easily support hundreds of such downstream customers and never become saturated. As another example, suppose you are hosting a very busy web site that fills half of your T1 for several hours every day. This type of bandwidth is more expensive, because your ISP can't oversell their connection to the backbone as effectively.

The important thing to realize is that it doesn't cost your ISP anything to sell you a pipe of any particular size - it is the sustained rate of data transfer that costs them money. The sum of the 95th percentile usage of all of an ISP's customers predicts the peak amount of backbone traffic that the ISP will incur (in a given direction).
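The two usage patterns described above can be compared with a small Python sketch. It assumes the common "nearest-rank" convention (sort the samples, discard the top 5%, take the highest remaining value), which is one reasonable reading of the definition and not necessarily the exact rule any given ISP applies. The traffic figures are made up for illustration; only the T1 line rate of 1.544 Mbit/s is a real number.

```python
def percentile_95(samples):
    """Return the nearest-rank 95th percentile of a list of rates."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank = ceil(0.95 * n), done in integer arithmetic
    # to avoid floating-point rounding surprises.
    rank = (len(ordered) * 95 + 99) // 100
    return ordered[rank - 1]


SAMPLES_PER_DAY = 288   # one sample every 5 minutes
T1_MBPS = 1.544         # T1 line rate in Mbit/s

# Hypothetical web-browsing customer: ten 5-minute bursts at full
# line rate, nearly idle for the rest of the day.
bursty = [T1_MBPS] * 10 + [0.02] * (SAMPLES_PER_DAY - 10)

# Hypothetical busy web site: half the T1 for six hours (72 samples),
# light traffic otherwise.
sustained = [T1_MBPS / 2] * 72 + [0.05] * (SAMPLES_PER_DAY - 72)

print("bursty 95th percentile (Mbit/s):   ", percentile_95(bursty))
print("sustained 95th percentile (Mbit/s):", percentile_95(sustained))
```

The bursty customer's full-rate downloads occupy fewer than 5% of the samples, so they fall above the 95th percentile and the result is the idle rate. The busy site's six hours of traffic occupy 25% of the samples, so the result is the sustained rate.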

ISPs generally charge for bandwidth by one of three means:

  1. Sell a flat-rate, possibly bandwidth-limited connection, and try to sell to customers whose usage patterns are not so intense. Nearly all DSL providers do this. The customers like it because they don't have to worry about how much bandwidth they use, and ISPs like it because it simplifies billing, and they make more money as long as they have plenty of low-usage customers. The problem, particularly if the ISP is selling very fast connections, is that the ISP can become overwhelmed by even a small number of high-usage customers. Even residential customers can be such high-usage clients, thanks to recently popular services such as peer-to-peer file sharing.
  2. Sell a fast connection (e.g. 100Mbit Ethernet, which is inexpensive) and charge for the volume of data transferred, e.g. the number of gigabytes per month. This model works great for web sites, which almost always generate traffic in a predictable bell curve. However, it severely penalizes customers who use bandwidth intermittently. For example, suppose a customer runs an automated off-site backup every night. This brief usage spurt costs the ISP almost nothing, and the recurring sustained data rate is low, yet the customer gets charged for a huge amount of bandwidth.
  3. Sell a fast connection and bill by 95th percentile. By now this should make sense - it's a fair system where everybody pays for what they get. The advantage to the customer is that they get the performance of a high-speed connection, while paying only for their actual usage. ISPs like it because they don't have to worry about high-usage customers upsetting their overselling ratios.
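To see how differently the second and third models treat the nightly-backup customer, here is a rough Python sketch that computes both billable quantities for the same traffic. All of the traffic figures (a one-hour backup at 100 Mbit/s, a 0.5 Mbit/s baseline) are hypothetical, and the percentile uses the nearest-rank convention as an assumption.

```python
SAMPLE_SECONDS = 300    # 5-minute samples
SAMPLES_PER_DAY = 288
DAYS = 30


def percentile_95(samples):
    """Nearest-rank 95th percentile (an assumed convention)."""
    ordered = sorted(samples)
    rank = (len(ordered) * 95 + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def volume_gigabytes(samples_mbps):
    """Total data transferred, given average rates in Mbit/s per sample."""
    megabits = sum(rate * SAMPLE_SECONDS for rate in samples_mbps)
    return megabits / 8 / 1000   # megabits -> megabytes -> gigabytes


# Hypothetical customer: a one-hour backup at 100 Mbit/s every night
# (12 samples), and 0.5 Mbit/s for the rest of the day.
one_day = [100.0] * 12 + [0.5] * (SAMPLES_PER_DAY - 12)
month = one_day * DAYS

print("volume billed under model 2 (GB):      ", volume_gigabytes(month))
print("rate billed under model 3 (Mbit/s):    ", percentile_95(month))
```

The backup occupies about 4% of the samples, so under 95th percentile billing the customer is charged for the 0.5 Mbit/s baseline, while under volume billing the same month adds up to roughly 1,500 GB.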

Irrespective of billing concerns, the 95th percentile is a very interesting and useful figure. The bottom line is that it tells you how much of your connection you're really using (and really need).




This is a patch to add 95th percentile metering to MRTG. This is not as simple a feature as one might think. MRTG normally saves only one day's worth of 5-minute samples, but the 95th percentile for a 30-day period cannot be calculated accurately without all of the 5-minute samples from that period. It is therefore necessary to save an entire 30 days' worth of samples.
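For a sense of scale, the arithmetic behind a 30-day window of 5-minute samples can be worked out in a few lines of Python. The constant names are illustrative only, and the "top 5%" count assumes the nearest-rank convention.

```python
SAMPLE_INTERVAL_MINUTES = 5
WINDOW_DAYS = 30

samples_per_day = 24 * 60 // SAMPLE_INTERVAL_MINUTES
samples_in_window = samples_per_day * WINDOW_DAYS

# Samples that sit above the 95th percentile (the top 5%).
rank = (samples_in_window * 95 + 99) // 100      # ceil(0.95 * n)
top_samples = samples_in_window - rank
top_hours = top_samples * SAMPLE_INTERVAL_MINUTES / 60

print(samples_per_day, "samples per day")
print(samples_in_window, "samples per 30-day window")
print(top_samples, "samples above the 95th percentile,",
      top_hours, "hours in total")
```

That is 8,640 samples per direction for each target, of which the highest 432 (36 hours of traffic) do not affect the result.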

My first approach was to figure out the MRTG/rateup source code, and integrate my modification with MRTG. After getting into it, I realized that there was a much simpler "quick hack" that would work fine. So here it is:

95.pl is a program that you run every hour. You can run it more often if you like, but it's not necessary since the 95th percentile usually changes *very* gradually over time. The script first takes a "snapshot" of each MRTG log file, and saves them by date. Then, it goes through each of these snapshot files, which contain all of the 5-minute samples, and sorts the last 30 days' worth of data. It then saves the 95th percentile of these data as a file called mrtg_target.95.
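The snapshot-and-sort idea can be sketched in Python as follows. This is not the actual 95.pl, just an illustration of the approach under several assumptions: each log data line begins with "timestamp avg_in avg_out", the first line is a counter header to skip, and all function names are hypothetical. A real MRTG log may also contain coarser long-term averages, which this sketch does not filter out.

```python
WINDOW_SECONDS = 30 * 24 * 3600   # 30-day window


def parse_samples(log_text):
    """Parse log text into {timestamp: (in_rate, out_rate)}.

    Assumed format: the first line is a header; every other line
    starts with "timestamp avg_in avg_out".
    """
    samples = {}
    for line in log_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue   # skip blank or malformed lines
        samples[int(fields[0])] = (int(fields[1]), int(fields[2]))
    return samples


def merge_snapshots(snapshots):
    """Combine overlapping snapshots, keeping one sample per timestamp."""
    merged = {}
    for text in snapshots:
        merged.update(parse_samples(text))
    return merged


def percentile_95(values):
    """Nearest-rank 95th percentile; returns 0 for an empty input."""
    ordered = sorted(values)
    if not ordered:
        return 0
    rank = (len(ordered) * 95 + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def window_95th(snapshots, now):
    """Return (in_95th, out_95th) over the last 30 days of samples."""
    merged = merge_snapshots(snapshots)
    recent = [rates for ts, rates in merged.items()
              if now - ts <= WINDOW_SECONDS]
    return (percentile_95(r[0] for r in recent),
            percentile_95(r[1] for r in recent))
```

Keying the merged samples by timestamp matters because hourly snapshots of a log that holds a day of data overlap heavily; without de-duplication each sample would be counted many times. A caller would read the saved snapshot files into strings, pass them to window_95th along with the current time, and write the result to the .95 file.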

After that, a simple modification to mrtg is all that is needed to incorporate the number into the HTML page.

So, here are the files you need. Probably the best place to install 95.pl is wherever you installed mrtg. If you happen to be running MRTG-2.8.6, download the modified version, below. If you're running the latest 2.9.10, download the patch. The changes to MRTG are only two or three lines to incorporate the number stored in the .95 file into the output page. If you want to find the changes, just search for "$nf", "NF", and "95".

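Here's an example crontab. The sixth field (sadams) is the user to run the command as, which is the format used by the system-wide /etc/crontab; leave that field out in a per-user crontab.

# Run MRTG every five minutes:
*/5  *  *  *  *  sadams  /www/www.seanadams.com/mrtg  /www/www.seanadams.com/mrtg.conf

# Run 95.pl every hour, at one minute past the hour:
1    *  *  *  *  sadams  /www/www.seanadams.com/95.pl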