Percentiles
The percentile represents a higher boundary for specified percengage of a measurements. For example, 95th percentile = 500ms means that 95% of all samples are not slower than 500ms. This metric is not very useful in microbenchmarks, as the values from consequent runs have a very narrow distribution. However, real-world scenarios often have so-called long tail distribution (due to IO delays, locks, memory access latency and so on), so the average execution time cannot be trusted.
The percentiles allow to include the tail of distribution into the comparison. However, it requires some preparations steps.
At first, you should have enough runs to count percentiles from. The TargetCount
in the config should be set to 10-20 runs at least.
Second, the count of iterations for each run should not be very high, or the peak timings will be averaged.
The IterationTime = 25
works fine for most cases; for long-running benchmarks the Mode = Mode.SingleRun
will be the best choice.
However, feel free to experiment with the config values.
Third, if you want to be sure that measurements are repeatable, set the LaunchCount
to 3 or higher.
And last, don't forget to include the columns into the config. They are not included by default (as said above, these are not too useful for most of the benchmarks).
There're predefined StatisticColumn.P0
..StatisticColumn.P100
for absolute timing percentiles and BaselineDiffColumn.Scaled50
..BaselineDiffColumn.Scaled95
for relative percentiles.
Example
Run the IntroPercentiles sample. It contains three benchmark methods.
- First delays for 20 ms constantly.
- The second has random delays for 10..30 ms.
- And the third delays for 10ms 85 times of 100 and delays for 40ms 15 times of 100.
Here's the output from the benchmark (some columns removed for brevity):
Method | Median | StdDev | Scaled | P0 | P50 | P80 | P85 | P95 | P100 | ScaledP50 | ScaledP85 | ScaledP95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ConstantDelays | 20.3813 ms | 0.2051 ms | 1.00 | 20.0272 ms | 20.3813 ms | 20.4895 ms | 20.4954 ms | 20.5869 ms | 21.1471 ms | 1.00 | 1.00 | 1.00 |
RandomDelays | 19.8055 ms | 5.7556 ms | 0.97 | 10.0793 ms | 19.8055 ms | 25.4173 ms | 26.5187 ms | 29.0313 ms | 29.4550 ms | 0.97 | 1.29 | 1.41 |
RareDelays | 10.3385 ms | 11.4828 ms | 0.51 | 10.0157 ms | 10.3385 ms | 10.5211 ms | 40.0560 ms | 40.3992 ms | 40.4674 ms | 0.51 | 1.95 | 1.96 |
Note that the 'Scaled' column kinda lies to you. The "almost same" RandomDelays method is actually not so performant and the seems-to-be-fastest RareDelays method is 2 times slower 15 times of 100.
Also, it's very easy to screw the results with incorrect setup. For example, the same code being run with
new Job
{
TargetCount = 5,
IterationTime = 500
}
completely hides the peak values:
Method | Median | StdDev | Scaled | P0 | P50 | P80 | P85 | P95 | P100 | ScaledP50 | ScaledP85 | ScaledP95 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ConstantDelays | 20.2692 ms | 0.0308 ms | 1.00 | 20.1986 ms | 20.2692 ms | 20.2843 ms | 20.2968 ms | 20.3097 ms | 20.3122 ms | 1.00 | 1.00 | 1.00 |
RandomDelays | 18.9965 ms | 0.8601 ms | 0.94 | 18.1339 ms | 18.9965 ms | 19.8126 ms | 19.8278 ms | 20.4485 ms | 20.9466 ms | 0.94 | 0.98 | 1.01 |
RareDelays | 14.0912 ms | 2.8619 ms | 0.70 | 10.2606 ms | 14.0912 ms | 15.7653 ms | 17.3862 ms | 18.6728 ms | 18.6940 ms | 0.70 | 0.86 | 0.92 |