Quote:
Originally Posted by danayster
I've always considered myself someone who's always got along with numbers and math, but for some reason I cannot fully comprehend what "95th percentile" actually calculates to. I've even looked it up on Wikipedia but it's still unclear.
Can someone please explain the terminology to me? In the context of how web hosts use it.
95th percentile is basically how your providers are billed, almost exclusively. The other option for providers is of course a full link (e.g. 10 gigabit) for $X. Most providers will usually opt for the 95th percentile, as it lets you cheaply have additional capacity without paying for it to sit there unused. For example, I could have 2x10GE uplinks to Level3, each pushing 5gbit, and pay for the actual usage of 10Gbit/sec, vs. having two 10gbit links, one maxed out and the other essentially unused, while paying for the full 20gbit.
This also works for you as well; it's how a provider can provide you a gige or 100mbit line and bill you for a fraction of it. Your host having the extra capacity for you to use above your committed data rate does not come free in terms of internal infrastructure and transit/peering links to other providers. Thus, average billing (otherwise known as per-GB billing; the math is identical) incurs substantial risk for a host - what happens when a user maxes out their gige for a single hour each day, but has zero usage otherwise? Via average billing they would be billed for nearly nothing, but you still had to have a full gigabit of capacity for them - obviously taking a rather substantial loss. Again, a numbers game.
That explains *why* 95th percentile is used. Hopefully I can explain the math behind it concisely. My favorite way to describe its intent to folks is that it is the "average peak utilization" of a given link. The number was found to largely capture the actual rate used on a day-to-day basis during a given customer's peak times, while allowing extraneous bursting to go unbilled (so if you hit a full gigabit for 4 hours one day, and you are otherwise at around 200mbit during your daily peaks, you will be billed for that 200mbit, not the full gige).
The math works like so. Imagine you have 30 days in a month. 10% of this figure is 3 days, so we have some nice round numbers to work with. Let's say I take an average usage rate for each day.
So, I have 30 "samples" of your average daily usage. I then look at this data, and throw out the 3 highest days of usage. The next highest sample (day) is what determines your billing rate. This lets you have 3 days of "free" bursting, and you pay for the next highest daily average after those 3.
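The day-based version above can be sketched in a few lines of Python. The sample numbers here are made up purely for illustration (27 ordinary days at 200 Mbit/s plus three burst days):

```python
# Hypothetical daily-average usage samples (Mbit/s) for a 30-day month.
samples = [200] * 27 + [950, 800, 700]  # three "burst" days

ranked = sorted(samples, reverse=True)
free_burst = ranked[:3]     # top 10% (3 highest days) thrown out
billing_rate = ranked[3]    # next highest day sets the billing rate

print(billing_rate)  # → 200
```

The three burst days are discarded entirely; the bill is set by the fourth-highest day, which here is just the customer's ordinary 200 Mbit/s rate.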
95th percentile billing works exactly the same way. Simply swap out the 30 samples (days) for 8,640 samples (5-minute averages: 30 days x 24 hours x 12 samples per hour), and the top 10% for the top 5%. In a provider billing case, we are throwing away the top 5% of those 5-minute averages (432 samples), and then billing you on the next highest 5-minute average usage sample. This equates to roughly 1.5 days of "free" usage. So, if you get slashdotted one day out of the month, you will not be billed for your quadruple usage. If you get slashdotted for 5 days of the month, you will.
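The same sketch scales up to real 95th-percentile billing. This is a toy simulation with invented traffic numbers (steady usage between 150 and 250 Mbit/s, plus one full "slashdotted" day at 800 Mbit/s) to show that a single burst day falls inside the discarded top 5%:

```python
import random

random.seed(42)

# One month of 5-minute average samples (Mbit/s):
# 30 days * 24 hours * 12 samples/hour = 8640 samples.
samples = [random.uniform(150, 250) for _ in range(8640)]

# One "slashdotted" day: 288 five-minute samples at quadruple usage.
samples[:288] = [800.0] * 288

ranked = sorted(samples, reverse=True)
cutoff = int(len(samples) * 0.05)  # top 5% = 432 samples (~1.5 days)
billed = ranked[cutoff]            # next highest sample sets the rate

print(f"95th percentile billing rate: {billed:.1f} Mbit/s")
```

The 288 burst samples (one day) all land within the 432 discarded samples, so the billed rate comes out in the customer's normal 150-250 Mbit/s range, exactly as described above. Two burst days (576 samples) would exceed the 432-sample allowance and push the billed rate up to the burst level.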
It sounds somewhat complicated at first, but once you become familiar with it, it's not so bad. In fact, it's pretty amazing how accurate it really is at getting to the "average daily peak usage" number I mentioned at first.
Hope that helps
SnakeDoctor - yep, you understand it fully. Let me know if you have further questions.
Peace,
-Phil