Percentiles

A percentile is an upper bound for a specified percentage of the measurements. For example, 95th percentile = 500ms means that 95% of all samples are not slower than 500ms. This metric is not very useful in microbenchmarks, as the values from consecutive runs have a very narrow distribution. However, real-world scenarios often have a so-called long-tail distribution (due to IO delays, locks, memory access latency and so on), so the average execution time alone cannot be trusted.
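To make the definition concrete, here is a minimal sketch that computes percentiles with the nearest-rank method and compares them with the mean. The `Percentile` helper and the sample timings are hypothetical, and BenchmarkDotNet may use a different (for example, interpolating) formula internally. The code uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
static double Percentile(double[] samples, double p)
{
    double[] sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
    return sorted[Math.Max(rank, 1) - 1];
}

// Hypothetical timings in ms: 85 fast runs and 15 slow ones (a long tail).
double[] timings = Enumerable.Repeat(10.0, 85)
    .Concat(Enumerable.Repeat(40.0, 15))
    .ToArray();

double mean = timings.Average();
double p50 = Percentile(timings, 50);
double p95 = Percentile(timings, 95);

Console.WriteLine($"Mean = {mean} ms, P50 = {p50} ms, P95 = {p95} ms");
```

For these samples the mean is 14.5 ms, a value that no single run actually took, while P50 (10 ms) and P95 (40 ms) describe the typical case and the tail separately.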

Percentiles let you include the tail of the distribution in the comparison. However, this requires some preparation steps. First, you should have enough runs to compute percentiles from. The TargetCount in the config should be set to at least 10-20 runs.

Second, the count of iterations for each run should not be very high, or the peak timings will be averaged out. IterationTime = 25 works fine for most cases; for long-running benchmarks, Mode = Mode.SingleRun is the best choice. However, feel free to experiment with the config values.

Third, if you want to be sure that measurements are repeatable, set the LaunchCount to 3 or higher.

Finally, don't forget to add the percentile columns to the config. They are not included by default (as noted above, they are not very useful for most benchmarks). There are predefined StatisticColumn.P0..StatisticColumn.P100 columns for absolute timing percentiles and BaselineDiffColumn.Scaled50..BaselineDiffColumn.Scaled95 columns for relative percentiles.
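Put together, the steps above could look roughly like the following config. This is only a sketch: the job properties and column names are the ones mentioned in the text, while the `PercentilesConfig` class name is made up, and the namespaces, the `ManualConfig` base class and its `Add` overloads are assumptions about the BenchmarkDotNet version this page describes. Check them against the version you use.

```csharp
// Assumed namespaces; they may differ between BenchmarkDotNet versions.
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Hypothetical config class for percentile-oriented benchmarks.
public class PercentilesConfig : ManualConfig
{
    public PercentilesConfig()
    {
        Add(new Job
        {
            LaunchCount = 3,    // repeat the whole run to check repeatability
            TargetCount = 20,   // enough runs to compute percentiles from
            IterationTime = 25  // keep each run short so peaks are not averaged out
        });

        // Percentile columns are not shown unless added explicitly.
        Add(
            StatisticColumn.P0,
            StatisticColumn.P50,
            StatisticColumn.P85,
            StatisticColumn.P95,
            StatisticColumn.P100,
            BaselineDiffColumn.Scaled50,
            BaselineDiffColumn.Scaled85,
            BaselineDiffColumn.Scaled95);
    }
}
```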

Example

Run the IntroPercentiles sample. It contains three benchmark methods.

The first always delays for 20 ms.
The second has random delays of 10..30 ms.
The third delays for 10 ms 85 times out of 100 and for 40 ms 15 times out of 100.

Here's the output from the benchmark (some columns removed for brevity):

Method	Median	StdDev	Scaled	P0	P50	P80	P85	P95	P100	ScaledP50	ScaledP85	ScaledP95
ConstantDelays	20.3813 ms	0.2051 ms	1.00	20.0272 ms	20.3813 ms	20.4895 ms	20.4954 ms	20.5869 ms	21.1471 ms	1.00	1.00	1.00
RandomDelays	19.8055 ms	5.7556 ms	0.97	10.0793 ms	19.8055 ms	25.4173 ms	26.5187 ms	29.0313 ms	29.4550 ms	0.97	1.29	1.41
RareDelays	10.3385 ms	11.4828 ms	0.51	10.0157 ms	10.3385 ms	10.5211 ms	40.0560 ms	40.3992 ms	40.4674 ms	0.51	1.95	1.96

Note that the 'Scaled' column kind of lies to you. The "almost the same" RandomDelays method is actually noticeably slower at the higher percentiles (ScaledP95 = 1.41), and the seemingly fastest RareDelays method is about 2 times slower than the baseline 15 times out of 100.
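The scaled columns in the table are consistent with dividing each statistic by the same statistic of the baseline (ConstantDelays). The sketch below reproduces the RareDelays row under that assumption; the `Scaled` helper is hypothetical and is not part of BenchmarkDotNet. It uses top-level statements (C# 9 or later).

```csharp
using System;

// Ratio of a benchmark's statistic to the baseline's, rounded to two digits.
static double Scaled(double value, double baseline) =>
    Math.Round(value / baseline, 2);

// Values in ms copied from the RareDelays and ConstantDelays rows above.
double scaledMedian = Scaled(10.3385, 20.3813);
double scaledP85 = Scaled(40.0560, 20.4954);
double scaledP95 = Scaled(40.3992, 20.5869);

Console.WriteLine(
    $"Scaled = {scaledMedian}, ScaledP85 = {scaledP85}, ScaledP95 = {scaledP95}");
```

The median-based ratio suggests RareDelays is twice as fast as the baseline, while the P85 and P95 ratios show it is almost twice as slow in the tail.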

Also, it's very easy to skew the results with an incorrect setup. For example, the same code run with

new Job
{
    TargetCount = 5,
    IterationTime = 500
}

completely hides the peak values:

Method	Median	StdDev	Scaled	P0	P50	P80	P85	P95	P100	ScaledP50	ScaledP85	ScaledP95
ConstantDelays	20.2692 ms	0.0308 ms	1.00	20.1986 ms	20.2692 ms	20.2843 ms	20.2968 ms	20.3097 ms	20.3122 ms	1.00	1.00	1.00
RandomDelays	18.9965 ms	0.8601 ms	0.94	18.1339 ms	18.9965 ms	19.8126 ms	19.8278 ms	20.4485 ms	20.9466 ms	0.94	0.98	1.01
RareDelays	14.0912 ms	2.8619 ms	0.70	10.2606 ms	14.0912 ms	15.7653 ms	17.3862 ms	18.6728 ms	18.6940 ms	0.70	0.86	0.92
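The reason the peaks disappear can be illustrated with a small simulation that does not use BenchmarkDotNet at all. It models a method like RareDelays (10 ms, or 40 ms in 15% of the calls) and compares P95 when each measurement is a single invocation with P95 when each measurement is the mean of 25 invocations. The helper names, the seed and the number 25 are arbitrary choices for illustration, not values taken from the library. The code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over a set of measurements.
static double Percentile(double[] samples, double p)
{
    double[] sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
    return sorted[Math.Max(rank, 1) - 1];
}

// One measurement: the mean time of several simulated invocations,
// each taking 10 ms, or 40 ms with a probability of 15%.
static double MeasureIteration(Random random, int invocations) =>
    Enumerable.Range(0, invocations)
        .Select(_ => random.Next(100) < 15 ? 40.0 : 10.0)
        .Average();

// A series of measurements with a fixed number of invocations in each.
static double[] Measure(Random random, int iterations, int invocations) =>
    Enumerable.Range(0, iterations)
        .Select(_ => MeasureIteration(random, invocations))
        .ToArray();

var seeded = new Random(42);
double[] singleInvocation = Measure(seeded, 1000, 1);
double[] manyInvocations = Measure(seeded, 1000, 25);

Console.WriteLine($"1 invocation per measurement:   P95 = {Percentile(singleInvocation, 95)} ms");
Console.WriteLine($"25 invocations per measurement: P95 = {Percentile(manyInvocations, 95)} ms");
```

With one invocation per measurement the 40 ms peaks survive into the upper percentiles. Once 25 invocations are averaged, a measurement can only reach 40 ms if all 25 calls were slow, so the reported percentiles drift toward the mean, much like in the second table.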