Inspecting aspects and interception in .NET, part 2

Last time we evaluated a number of ways to add instrumentation to your code while keeping your code maintainable and legible. We landed on using interception and the cliffhanger question from last time was: what’s the overhead? Is it fast enough? In this post, we will start looking at ways to answer this question.

Benchmarking is hard

It’s not easy to benchmark your code. There are misconceptions about how to measure how long something takes and what that means. You might think to capture the current date and time, run something in a loop, and capture the current date and time again. Take the difference, divide it by the number of iterations, and you have a relevant number. And you’d be wrong.

Don’t measure using DateTime

Naively measuring your code’s run-time duration looks like this:

DateTime startTime = DateTime.Now;  
LongRunningOperation();  
DateTime endTime = DateTime.Now;  
TimeSpan elapsed = endTime - startTime;  

This is wrong for a number of reasons.

  • DateTime does not have enough accuracy to measure small increments;
  • Assuming you’re doing this in production code, you might have gone into or out of daylight saving time, giving you huge or negative readings;
  • Even if you’re using DateTime.UtcNow, the system time might have been adjusted—perhaps by an automatic time synchronization.

There are probably more of these falsehoods programmers believe about time that apply to measuring how long your code takes to execute.

They can all be easily fixed by using something designed for the task: System.Diagnostics.Stopwatch. It is a highly accurate, monotonically increasing counter which does not suffer from changes to date or time at all. But even Stopwatch does not have an infinite amount of accuracy.
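You can ask Stopwatch how fine-grained it actually is. A quick sketch (the numbers it prints will differ per machine):

```csharp
using System;
using System.Diagnostics;

class StopwatchResolution
{
    static void Main()
    {
        // Frequency is the number of ticks per second of the underlying timer.
        Console.WriteLine("High resolution: " + Stopwatch.IsHighResolution);
        Console.WriteLine("Frequency: " + Stopwatch.Frequency + " ticks/s");

        // The smallest interval the timer can distinguish, in nanoseconds.
        double resolutionNs = 1e9 / Stopwatch.Frequency;
        Console.WriteLine("Resolution: " + resolutionNs.ToString("F1") + " ns");
    }
}
```

On my machine the resolution works out to a few hundred nanoseconds, which explains the implausible readings in the next section.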

Don’t run your code just once

Let’s try to measure the run-time of a particularly small piece of code:

int x = 1;  
var sw = Stopwatch.StartNew();  
x++;  
sw.Stop();  

On my machine (YMMV), about half the time the elapsed time is zero, and the other half I get 300 nanoseconds. Neither of those sounds plausible.

Let’s run it a whole lot of times and extrapolate a mean running time.

int x = 1;  
var sw = Stopwatch.StartNew();

for (int i = 0; i < 100000; i++)  
{
    x++;
}

sw.Stop();  

Now I’m getting a total run-time of around 67 000 nanoseconds. But there seems to be a lot of jitter in the results. Here are the results of ten consecutive runs.

74 666,61 ns
51 357,98 ns
36 740,71 ns
79 802,41 ns
76 641,91 ns
41 086,39 ns
82 962,90 ns
117 728,30 ns
41 481,45 ns
40 296,26 ns

My number of iterations is probably still too small. I’d need to do some trial runs to figure out how many iterations I need to reduce the effects of jitter.
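Those trial runs would look something like the following: repeat the whole loop several times, collect the elapsed time of each run, and compute a mean and standard deviation. This is just a rough sketch; it still suffers from every problem listed in the next section.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class TrialRuns
{
    const int Iterations = 100000;
    const int Runs = 10;

    static void Main()
    {
        // Convert Stopwatch ticks to nanoseconds.
        double nsPerTick = 1e9 / Stopwatch.Frequency;
        var samples = new double[Runs];

        for (int run = 0; run < Runs; run++)
        {
            int x = 1;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                x++;
            }
            sw.Stop();
            samples[run] = sw.ElapsedTicks * nsPerTick;
        }

        double mean = samples.Average();
        double stdDev = Math.Sqrt(samples.Sum(s => (s - mean) * (s - mean)) / Runs);
        Console.WriteLine("Mean: " + mean.ToString("F2") + " ns, StdDev: " + stdDev.ToString("F2") + " ns");
    }
}
```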

Unforeseen consequences

There are a lot of issues that crop up when you try to benchmark your code.

  • The JIT compiler might add a delay to the first few calls to your code;
  • The garbage collector might randomly pause your code to clean up memory;
  • The benchmarking code itself also takes time to execute; you don’t want that included in the test results;
  • You might be interested in more statistics than the mean execution time (think median, percentiles, standard deviation...);
  • Compilers are pretty good at finding dead code, so you need to make sure your code is not being optimized away;
  • Different runtimes might provide different results, so you probably want to see what running in x64 mode does, or when using (or not using) RyuJIT.
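
The dead-code point in particular bites hand-rolled benchmarks: the x++ loop above computes a value nobody reads, so the JIT is free to remove it entirely. One common workaround (a sketch, not the only way) is to sink the result into something the compiler cannot optimize through, such as a volatile field:

```csharp
using System.Diagnostics;

class DeadCodeGuard
{
    // A volatile field write is an observable side effect,
    // so the loop that produces x cannot be eliminated wholesale.
    static volatile int _sink;

    static void Main()
    {
        int x = 1;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100000; i++)
        {
            x++;
        }
        sw.Stop();
        _sink = x; // consume the result so the computation stays live
    }
}
```

BenchmarkDotNet, discussed below, solves this for you: returning a value from a benchmark method is enough to keep it from being optimized away.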

There are a thousand other problems you might encounter if you try to write your own benchmarking framework. Benefit from the blood, sweat and tears of other people and use an existing, battle-hardened one. My library of choice is BenchmarkDotNet.

BenchmarkDotNet

Let’s examine what a benchmark looks like when using BenchmarkDotNet. For the sake of example, we’ll be benchmarking string concatenation. I want to compare the speed of string.Concat, string.Format and using a StringBuilder. Yes, I am aware that the comparison is totally unfair. Just play along.

using System.Text;
using BenchmarkDotNet.Attributes;

public class StringConcatenation
{
    [Benchmark]
    public string UsingStringConcat()
    {
        return string.Concat("Hello,", " ", "world!");
    }

    [Benchmark]
    public string UsingStringFormat()
    {
        return string.Format("{0}{1}{2}", "Hello", " ", "world!");
    }

    [Benchmark]
    public string UsingStringBuilder()
    {
        var builder = new StringBuilder();

        builder.Append("Hello,");
        builder.Append(" ");
        builder.Append("world!");

        return builder.ToString();
    }
}

Each method decorated with [Benchmark] is a candidate to be benchmarked, and when you start the process, like so:

BenchmarkRunner.Run<StringConcatenation>();  

... BenchmarkDotNet will start running benchmarks. It will go through a number of steps to find the right iteration count, measure its own overhead and warm up the environment.
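
Wired into a console application, the whole harness is only a few lines (namespaces as in BenchmarkDotNet 0.9.x):

```csharp
using BenchmarkDotNet.Running;

class Program
{
    static void Main(string[] args)
    {
        // Discovers all [Benchmark] methods on the class and runs them.
        BenchmarkRunner.Run<StringConcatenation>();
    }
}
```

Remember to run a Release build; BenchmarkDotNet will complain if you try to benchmark a Debug build or run with a debugger attached.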

After a bit of thinking it will produce something like this:

// * Summary *

Host Process Environment Information:  
BenchmarkDotNet=v0.9.8.0  
OS=Microsoft Windows NT 6.2.9200.0  
Processor=Intel(R) Core(TM) i7-6700HQ CPU 2.60GHz, ProcessorCount=8  
Frequency=2531249 ticks, Resolution=395.0619 ns, Timer=TSC  
CLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE  
GC=Concurrent Workstation  
JitModules=clrjit-v4.6.1586.0

Type=StringConcatenation  Mode=Throughput  GarbageCollection=Concurrent Workstation

             Method |      Median |    StdDev |
------------------- |------------ |---------- |
  UsingStringConcat |  27.7441 ns | 0.5363 ns |
  UsingStringFormat | 125.0597 ns | 1.6980 ns |
 UsingStringBuilder |  53.8680 ns | 1.9694 ns |

Let’s look at the table at the bottom, which, incidentally, is valid GFM.

String concatenation benchmark

|             Method |      Median |    StdDev |
|------------------- |------------:|----------:|
|  UsingStringConcat |  27,7441 ns | 0,5363 ns |
|  UsingStringFormat | 125,0597 ns | 1,6980 ns |
| UsingStringBuilder |  53,8680 ns | 1,9694 ns |

This is the final performance report. By default it lists the median duration and the standard deviation. We can see from the median duration that in this run, on my machine, using the regular x86 JIT, string.Concat is almost twice as fast as StringBuilder, and from the standard deviation that the duration is also much more stable.

I’ve heard and read rumors that StringBuilder will be faster than string.Concat if you concatenate more than 5 substrings. Let’s find out.

using System.Text;
using BenchmarkDotNet.Attributes;

public class StringConcatenation
{
    [Benchmark]
    public string UsingStringConcat()
    {
        return string.Concat("Hello,", " ", "world!", " Something ", "else ", "entirely");
    }

    [Benchmark]
    public string UsingStringFormat()
    {
        return string.Format("{0}{1}{2}{3}{4}{5}", "Hello", " ", "world!", " Something ", "else ", "entirely");
    }

    [Benchmark]
    public string UsingStringBuilder()
    {
        var builder = new StringBuilder();

        builder.Append("Hello,");
        builder.Append(" ");
        builder.Append("world!");
        builder.Append(" Something ");
        builder.Append("else ");
        builder.Append("entirely");

        return builder.ToString();
    }
}

Using 6 substrings

|             Method |      Median |    StdDev |
|------------------- |------------:|----------:|
|  UsingStringConcat | 125,4313 ns | 2,3578 ns |
|  UsingStringFormat | 250,0233 ns | 4,5334 ns |
| UsingStringBuilder | 164,9556 ns | 2,2380 ns |

string.Format is still trailing behind, but the gap between string.Concat and StringBuilder has shrunk significantly. Let’s see what happens if we add more arguments.

Using 9 substrings

|             Method |      Median |     StdDev |
|------------------- |------------:|-----------:|
|  UsingStringConcat | 183,3770 ns | 11,9913 ns |
|  UsingStringFormat | 361,9582 ns |  5,2223 ns |
| UsingStringBuilder | 198,7081 ns |  4,9657 ns |

Now things are getting interesting. The two methods are almost equally fast, but it’s notable that string.Concat’s standard deviation is much higher than StringBuilder’s.

Using 14 substrings

|             Method |      Median |     StdDev |
|------------------- |------------:|-----------:|
|  UsingStringConcat | 262,7727 ns | 69,2226 ns |
|  UsingStringFormat | 504,8921 ns | 15,0032 ns |
| UsingStringBuilder | 290,8394 ns |  4,0046 ns |

The relative gap between the two now seems to be increasing, whereas the standard deviation is only increasing for string.Concat, and decreasing for StringBuilder. Let’s go all out and double the number of arguments.

Using 28 substrings

|             Method |      Median |     StdDev |
|------------------- |------------:|-----------:|
|  UsingStringConcat | 500,1520 ns | 31,3080 ns |
|  UsingStringFormat | 882,1520 ns | 37,3244 ns |
| UsingStringBuilder | 475,3793 ns | 14,2873 ns |

I ran this one a number of times to be sure, but it looks like StringBuilder is finally faster than string.Concat! The conclusion? It takes around 28 substrings to be concatenated for StringBuilder to be faster than string.Concat on my machine, in x86 mode, using .NET 4.6.2. I didn’t measure the exact tipping point, but I guesstimate it to be around 25 substrings. Interestingly enough, when using RyuJIT in x64 mode, string.Concat was still winning out.

What does this tell us?

Very little. At most, it tells us that there is little to no reason to use StringBuilder over string.Concat for scenarios where you know at compile-time which strings you will be concatenating, and you’re not using loops.
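
The loop caveat is the important one. When the number of pieces isn’t known at compile time, repeated concatenation copies the entire intermediate string on every pass, while StringBuilder appends into a growing buffer. A minimal illustration (method names are mine):

```csharp
using System.Text;

static class LoopConcat
{
    // O(n^2): every += allocates a new string and copies everything so far.
    public static string WithConcat(string[] parts)
    {
        string result = string.Empty;
        foreach (string part in parts)
        {
            result += part;
        }
        return result;
    }

    // O(n): appends into an internal buffer, with one final copy in ToString().
    public static string WithStringBuilder(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }
}
```

(In real code, string.Concat(parts) or string.Join would give you the O(n) behavior without the ceremony.)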

It’s important to realize that benchmarks can give you an indication of performance, but they are no replacement for a run-time metric or a profiler when it comes to finding hotspots. Also, YMMV; when you benchmark a piece of code using a certain runtime on a certain architecture, your results are only valid on that specific environment. Nowhere else.

This, however, is what micro-benchmarks are about. They tell you if algorithm A is faster than algorithm B under specific circumstances, and by how much. When the difference is small, unless you have extremely stringent performance requirements, at most you might say: this won’t hurt us.

Now that we’ve looked at micro-benchmarking, next time we’ll look at actually putting some DI frameworks through their paces, combining it with interception, and trying to explain what we find.