Diagnostic Tools

 © Andrey Akinshin 2019

Andrey Akinshin, Pro .NET Benchmarking, https://doi.org/10.1007/978-1-4842-4941-3_6

.NET Diagnostic Tools

Andrey Akinshin, Saint Petersburg, Russia
 

If all you have is a hammer, everything looks like a nail.

— Abraham Maslow

Benchmarking is only one of the performance investigation steps. In this chapter, you will find a brief overview of some important diagnostic tools that can be useful throughout the investigation. We will discuss the following kinds of tools:
  • Benchmarking harness

    This tool automatically benchmarks the specified method and displays the corresponding metrics. It tells you how much time it takes to execute this method, but it doesn’t always tell you why you get these values.

  • Performance profiler

    This tool measures performance metrics for each called method inside an application. It tells you where the performance bottleneck of the application is and allows exploring performance profiles with detailed information about consumed CPU resources for each method.

  • Memory profiler

    This tool measures memory traffic for an application. It tells you how many objects were allocated and allows exploring memory snapshots with detailed information about the graph of alive and dead objects of each class.

  • C#/VB decompiler

    This tool takes a .NET assembly and shows C#/VB code (even if you don’t have the original source code).

  • IL decompiler

    This tool takes a .NET assembly and shows IL code for requested classes and methods.

  • ASM decompiler

    This tool takes a .NET assembly or an existing .NET process and shows the native code for requested classes and methods.

  • Debuggers

    This tool allows debugging .NET assemblies. The debugger is especially useful when it can also show C#/IL/ASM disassembly listings and debug external code (with or without symbols).

  • System monitoring tool

    This tool monitors all processes in the operating system and shows performance, memory, and other metrics for the system in general and for individual processes and their threads.

The tools will be presented in the following groups:
  • BenchmarkDotNet

    We will discuss only one benchmarking harness: BenchmarkDotNet. It is the most widely adopted library, used in many popular open source and closed source projects.

  • Visual Studio Tools

    Visual Studio is an IDE, but it has some important embedded tools that are useful for performance investigations. We will discuss the embedded memory/performance profiler and debugging tools.

  • JetBrains Tools

    JetBrains has many different tools that provide advanced support for performance/memory profiling and decompilation. We will discuss dotPeek, dotTrace, dotMemory, ReSharper, and Rider.

  • Windows Sysinternals

    This is a suite of independent tools for Windows that can simplify different steps of performance investigations and collect system metrics. We will discuss RAMMap, VMMap, and Process Monitor.

  • Other Useful Tools

    There are many other tools in the .NET ecosystem that can also be useful in different scenarios. We will discuss ildasm, monodis, ILSpy, dnSpy, WinDbg, PerfView, Mono Console Tools, perfcollect, Process Hacker, and Intel VTune Amplifier.

The topic of diagnostic tools is huge, and it’s not possible to cover all of them in detail in this chapter. The aim of this chapter is to provide an overview of some available tools; you will not find step-by-step tutorials that teach you how to use them, so you will have to study them yourself. You are free to choose any tools you like: you can look for them on the Internet or build your own software. We will briefly discuss selected features of these tools that can be useful during performance investigations.

For each tool, you will find some useful information: the URL of the official website, links to useful resources, the license, and the supported operating systems. The “free/commercial” label means that the general license is commercial, but there are some free options (e.g., for open source projects, for students and teachers, for small teams, and so on). You can find the full information about the discounted and complimentary licenses on the official websites.

BenchmarkDotNet

BenchmarkDotNet is a powerful .NET benchmarking library with tons of features that help to design benchmarks, execute them, and analyze performance results. I’m proud to say that I’m the project lead of this library. I started BenchmarkDotNet in 2013 as a small pet project. Today, it’s a widely adopted open source project supported by the .NET Foundation. BenchmarkDotNet is used for performance experiments in many popular .NET projects, including .NET Core. Here is a usage example:
using System;
using System.Security.Cryptography;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
namespace MyBenchmarks
{
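  // BenchmarkDotNet v0.11 job attributes: run each benchmark on .NET Framework
  // (the baseline for the Ratio column), .NET Core, Mono, and CoreRT
  [ClrJob(baseline: true), CoreJob, MonoJob, CoreRtJob]
  public class Md5VsSha256
  {
    private SHA256 sha256 = SHA256.Create();
    private MD5 md5 = MD5.Create();
    private byte[] data;
    // Each benchmark is measured for every listed input size (in bytes)
    [Params(1000, 10000)]
    public int N;
    // Runs before the measurements of each benchmark case, after N is set
    [GlobalSetup]
    public void Setup()
    {
      data = new byte[N];
      new Random(42).NextBytes(data); // fixed seed: the same input in every run
    }
    [Benchmark]
    public byte[] Sha256() => sha256.ComputeHash(data);
    [Benchmark]
    public byte[] Md5() => md5.ComputeHash(data);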
  }
  public class Program
  {
    public static void Main(string[] args)
    {
      var summary = BenchmarkRunner.Run<Md5VsSha256>();
    }
  }
}
This program will generate an output like this:
BenchmarkDotNet=v0.11.0, OS=Windows 10.0.16299.309 (1709/Redstone3)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3507504 Hz, Resolution=285.1030 ns, Timer=TSC
.NET Core SDK=2.1.300-preview1-008174
  [Host]     : .NET Core 2.1.0-preview1-26216-03
               (CoreCLR 4.6.26216.04, CoreFX 4.6.26216.02), 64bit RyuJIT
  Job-HKEEXO : .NET Framework 4.7.1
               (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.2633.0
  Core       : .NET Core 2.1.0-preview1-26216-03
               (CoreCLR 4.6.26216.04, CoreFX 4.6.26216.02), 64bit RyuJIT
  CoreRT     : .NET CoreRT 1.0.26414.01, 64bit AOT
  Mono       : Mono 5.10.0 (Visual Studio), 64bit
| Method | Runtime |     N |       Mean |     Error |    StdDev | Ratio |
|------- |-------- |------ |-----------:|----------:|----------:|------:|
| Sha256 |     Clr |  1000 |   8.009 us | 0.0370 us | 0.0346 us |  1.00 |
| Sha256 |    Core |  1000 |   4.447 us | 0.0117 us | 0.0110 us |  0.56 |
| Sha256 |  CoreRT |  1000 |   4.321 us | 0.0139 us | 0.0130 us |  0.54 |
| Sha256 |    Mono |  1000 |  14.924 us | 0.0574 us | 0.0479 us |  1.86 |
|        |         |       |            |           |           |       |
|    Md5 |     Clr |  1000 |   3.051 us | 0.0604 us | 0.0742 us |  1.00 |
|    Md5 |    Core |  1000 |   2.004 us | 0.0058 us | 0.0054 us |  0.66 |
|    Md5 |  CoreRT |  1000 |   1.892 us | 0.0087 us | 0.0077 us |  0.62 |
|    Md5 |    Mono |  1000 |   3.878 us | 0.0181 us | 0.0170 us |  1.27 |
|        |         |       |            |           |           |       |
| Sha256 |     Clr | 10000 |  75.780 us | 1.0445 us | 0.9771 us |  1.00 |
| Sha256 |    Core | 10000 |  41.134 us | 0.2185 us | 0.1937 us |  0.54 |
| Sha256 |  CoreRT | 10000 |  40.895 us | 0.0804 us | 0.0628 us |  0.54 |
| Sha256 |    Mono | 10000 | 141.377 us | 0.5598 us | 0.5236 us |  1.87 |
|        |         |       |            |           |           |       |
|    Md5 |     Clr | 10000 |  18.575 us | 0.0727 us | 0.0644 us |  1.00 |
|    Md5 |    Core | 10000 |  17.562 us | 0.0436 us | 0.0408 us |  0.95 |
|    Md5 |  CoreRT | 10000 |  17.447 us | 0.0293 us | 0.0244 us |  0.94 |
|    Md5 |    Mono | 10000 |  34.500 us | 0.1553 us | 0.1452 us |  1.86 |
You can find the full documentation for the latest version of BenchmarkDotNet on GitHub, so I’m not going to describe how to use all the features. Instead, I want to talk about the philosophy of tools for benchmarking. I think that a good benchmarking library should satisfy the following requirements:
  • It should do all routine tasks for you

    A typical benchmark includes a lot of boilerplate code. Users shouldn’t have to write it every time they want to measure performance. A benchmarking tool should automatically run several iterations, and each iteration should include several method invocations. It should run several warm-up iterations and exclude them from the report. It should isolate benchmarks from each other by running each benchmark in a separate process. If you want to check several different environments, it should automatically run the benchmarks in each environment and aggregate the results. It should automatically evaluate its own overhead and subtract it from the measured values. All the dirty work should be done by the benchmarking library, so that users can focus on the measured logic instead of the benchmarking infrastructure.

  • It should protect you from known pitfalls

    It shouldn’t allow you to run benchmarks in DEBUG mode (without optimizations). It should control inlining and make sure that all benchmarks use the same inlining policy. It should use the best available timestamping API. Best benchmarking practices (like warm-up and isolation) should be enabled by default.

  • It should choose the best benchmarking mode for you

    The library should implement adaptive benchmarking. Instead of asking the user for the number of iterations, it can use optional stopping. Instead of asking the user for the number of method invocations per iteration, it should find the best value during a pilot experiment. Users shouldn’t have to worry about infrastructure parameters: by default, the library should find the best possible values.

  • It should be highly configurable

    Each benchmark experiment is unique, with its own requirements. Users should be able to disable all the smart features. For example, if they want to measure the cold start, it should be possible to disable warm-up. If they know that benchmarks don’t affect each other, they may want to disable process isolation to speed up the whole experiment. It’s nice when somebody else chooses the number of iterations for you, but it should also be possible to set it manually.

  • It should have a user-friendly API

    This requirement is valid for any library. The API should be understandable and well documented. It should support different approaches: some users like to configure benchmarks via the command line, some prefer attributes, and some prefer a fluent API. The library should provide different ways to configure the benchmarking process.

  • It should know statistics for you

    The library should be able to calculate all the basic statistical characteristics like the mean and the median, the standard deviation and the confidence interval, the quartiles and the percentiles, and the skewness and the kurtosis. It should know how to detect outliers, how to perform statistical tests like Welch’s t-test or the Mann–Whitney U test, and how to check distributions for multimodality.

  • It should help you to analyze results

    If the library can calculate all possible statistical metrics, it doesn’t mean that it should print all of them each time. The library should highlight the essential features of the calculated distribution. We know that the mean and the median can differ hugely, but these values are often close to each other. If the library prints both values every time, users will learn to ignore one of them. It’s better to show only the mean by default and present the median only when it’s important. We know that it’s important to distinguish between unimodal and multimodal distributions. However, most simple performance distributions are unimodal. It doesn’t make sense to print “Everything is OK, the distribution is unimodal” each time; it’s better to print a warning when the distribution is multimodal. It should also tell you if the distribution is spoiled by outliers. The basic report should contain only important data in the most compact form. It’s great if the library can calculate the mean value with the highest possible precision, but does it really make sense to print 6.38319573993657 ms? Most users care only about the most significant digits, so printing 6.383 ms is enough. The library can perform the Mann–Whitney U test and print the p-value, but it’s better to print a conclusion based on it (many users don’t remember how to correctly interpret p-values). The library should tell you when the results are unreliable because of the initial settings (e.g., a small sample size or insufficient iteration time). The final summary table should be as small as possible but contain the most important numbers and facts. Users should be able to read it and quickly understand what’s going on with the data.

  • It should collect information about the environment

    A good performance report should include the most important information about the environment, like the OS version, processor model, runtime, JIT compiler kind, and so on.

  • It should provide basic diagnostics data

    A benchmarking library is not a profiler or a decompiler, but it can perform some basic diagnostics and provide minimal diagnostics data. For example, it can measure the amount of allocated memory, collect hardware counter values, print IL and native listings for the main methods, generate a trace file based on ETW events, check runtime optimizations like inlining or tail call optimization, and so on. It should help users understand why they get such a performance report and which additional tools they need.

  • It should generate many reports and draw plots

    Information about the performed measurements should be available in different formats like CSV, JSON, XML, HTML, Markdown, AsciiDoc, and others. Developers often share their performance results, so the library should support different dialects of Markdown that can be posted to GitHub, Stack Overflow, JIRA, or other services. The distribution should be shown with the help of different plots like histograms, timeline plots, density plots, bar plots, box plots, frequency trails, and so on. The library should know how to generate any kind of report that can be useful during performance analysis.
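To see how much routine work the first requirement covers, here is a minimal sketch of a hand-written harness in C#. The class name NaiveHarness and its parameters are hypothetical, not BenchmarkDotNet API. The sketch runs warm-up iterations and discards them, performs several invocations per iteration, and computes the mean, the median, and the standard deviation. It deliberately skips process isolation, overhead subtraction, and adaptive iteration counts, which a real library handles for you.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical, deliberately naive harness: it shows the boilerplate that a
// benchmarking library writes for you. It does NOT isolate benchmarks in
// separate processes, subtract its own overhead, or choose iteration counts.
public static class NaiveHarness
{
    // Runs warmupCount + iterationCount iterations; each iteration calls
    // `action` invocationsPerIteration times. Returns the average time of a
    // single invocation (in nanoseconds) for each measured iteration.
    public static double[] Measure(Action action, int warmupCount,
                                   int iterationCount, int invocationsPerIteration)
    {
        var results = new List<double>(iterationCount);
        for (int i = 0; i < warmupCount + iterationCount; i++)
        {
            long start = Stopwatch.GetTimestamp();
            for (int j = 0; j < invocationsPerIteration; j++)
                action();
            long elapsed = Stopwatch.GetTimestamp() - start;

            if (i < warmupCount)
                continue; // warm-up iterations are executed but not reported

            double nanoseconds = elapsed * 1e9 / Stopwatch.Frequency;
            results.Add(nanoseconds / invocationsPerIteration);
        }
        return results.ToArray();
    }

    public static double Mean(double[] values) => values.Average();

    public static double Median(double[] values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        int n = sorted.Length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    // Sample standard deviation (divides by n - 1).
    public static double StdDev(double[] values)
    {
        double mean = Mean(values);
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / (values.Length - 1));
    }
}
```

For example, NaiveHarness.Measure(() => md5.ComputeHash(data), 5, 20, 100), with md5 and data prepared as in the Md5VsSha256 example, returns 20 per-invocation timings; formatting the mean with "F3" gives the compact output recommended above.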
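Several requirements from the list above (the DEBUG-mode pitfall, environment information, and allocated memory) rely on standard .NET APIs. The following sketch uses DebuggableAttribute, RuntimeInformation, Stopwatch, Debugger, and GC.GetAllocatedBytesForCurrentThread. The BenchmarkChecks class is hypothetical and is not how BenchmarkDotNet implements these features; the sketch assumes .NET Core 2.0 or later.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical checks a harness could run automatically (assumes .NET Core 2.0+).
public static class BenchmarkChecks
{
    // True if the assembly was compiled with JIT optimizations disabled,
    // which is what a Debug build normally does.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    // Basic environment information for the report header.
    public static string DescribeEnvironment() => string.Join(Environment.NewLine,
        $"OS: {RuntimeInformation.OSDescription}",
        $"Runtime: {RuntimeInformation.FrameworkDescription}",
        $"Process architecture: {RuntimeInformation.ProcessArchitecture}",
        $"Logical cores: {Environment.ProcessorCount}",
        $"High-resolution timer: {Stopwatch.IsHighResolution} ({Stopwatch.Frequency} Hz)",
        $"Debugger attached: {Debugger.IsAttached}");

    // Bytes allocated on the current thread while `action` runs.
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

A harness could call IsJitOptimizerDisabled on the assembly that contains the benchmarks and refuse to run (or print a warning) when it returns true.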

BenchmarkDotNet has become popular because it tries to follow all these requirements. Of course, the library is not perfect; it has some bugs and missing features. However, BenchmarkDotNet gets better with each version thanks to community contributions.

You should understand that no benchmarking library (including BenchmarkDotNet) is a silver bullet. It will not write a benchmark for you. It will not analyze benchmarking reports for you. It just helps you design and execute benchmarks. Thus, you still have to know the benchmarking methodology and the possible pitfalls. You still need to know about JIT optimizations like dead code elimination (DCE), bounds check elimination (BCE), and constant folding. You still need to know about natural noise and potentially huge variance; you should check the distribution manually, and you should know how to analyze it.
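Here is a minimal illustration of the DCE and constant folding pitfalls. The SqrtBenchmarks class is hypothetical and only contrasts a risky benchmark body with a safer one. Whether the JIT actually removes the discarded call depends on the runtime and JIT version, so read the comments as what the JIT is allowed to do, not what it always does.

```csharp
using System;

// Hypothetical benchmark bodies, shown without any harness attributes.
public class SqrtBenchmarks
{
    // The input lives in a field: with a literal such as Math.Sqrt(3.0), the
    // JIT could compute the result at compile time (constant folding).
    private double x = 3.0;

    // The result is discarded, so an optimizing JIT is allowed to remove the
    // call as dead code; timing this method may effectively time an empty body.
    public void SqrtDiscarded()
    {
        Math.Sqrt(x);
    }

    // The result is returned to the caller, so it stays observable as long
    // as the caller (the harness) actually consumes it.
    public double SqrtReturned() => Math.Sqrt(x);
}
```

This is one reason why the hashing benchmarks earlier in this chapter return the computed byte[]: BenchmarkDotNet consumes values returned from [Benchmark] methods.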

There is no magic library that solves all these problems for you: they are still your responsibility. BenchmarkDotNet just allows you to skip the boilerplate part of a benchmark and focus on the target code. It’s especially useful for beginners who don’t know about the discussed problems (or for people who just don’t want to think about all of that right then). The library does not guarantee that all your benchmarks are correct. But at least you do not have to worry about common stupid benchmark bugs. It’s a handy tool for bootstrapping benchmarks, so we will discuss it several times in this book.

URL: https://github.com/dotnet/benchmarkdotnet

Open source (MIT); free; cross-platform.

Resources: https://benchmarkdotnet.org/, [Sitnik 2017a], [Sitnik 2017b], [Sitnik 2018].

Visual Studio Tools

Visual Studio is the most popular IDE for .NET development. We are not going to discuss Visual Studio as an IDE; we will talk only about a few features that can be useful during performance investigations.

URL: https://visualstudio.microsoft.com/vs/

Closed source; free/commercial; Windows-only.

EMBEDDED PROFILERS

Visual Studio has many different profiling modes:
  • CPU usage

  • Memory usage

  • Resource consumption for XAML

  • Network usage for UWP Apps

  • GPU usage for Direct3D

  • Energy usage for UWP Apps

A screenshot is presented in Figure 6-1.
Figure 6-1. Performance and memory profilers in Visual Studio

URL: https://docs.microsoft.com/en-us/visualstudio/profiling

DISASSEMBLY VIEW

Visual Studio has several tool windows for low-level debugging:
  • Disassembly: a disassembly listing of a method.

  • Registers: plain text information about all available register values. It supports different groups of registers: CPU, CPU Segments, Floating Point, MMX, 3DNow!, SSE, AVX, AVX-512, MPX, Neon, Neon Float, Neon Double, and CPU flags.

  • Memory: several tool windows that show a dump of a specified segment of memory. It can interpret memory as 1/2/4/8-byte integers or 32/64-bit floating-point numbers and display them in different formats (hexadecimal, signed numbers, unsigned numbers).

All the tool windows can be found during debugging in the Debug→Windows menu.

By default, the Visual Studio debugger suppresses some JIT optimizations to provide a better debugging experience. Unfortunately, this changes the generated native code even in Release mode. If you want to see the real native code, you should clear the “Suppress JIT optimization on module load” check box in the settings.1
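To look at the real native code of a particular method, it also helps to keep that method from being inlined into its callers. The sketch below is a hypothetical target method; it assumes a Release build with the “Suppress JIT optimization on module load” option disabled and a breakpoint set inside the method before you open Debug→Windows→Disassembly.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical method to inspect in the Disassembly window.
public static class DisassemblyTarget
{
    // NoInlining keeps this method as a separate native function,
    // so its code is not merged into the caller's listing.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumOfSquares(int x, int y)
    {
        // Set a breakpoint here in a Release build, then open
        // Debug > Windows > Disassembly while the debugger is stopped.
        return x * x + y * y;
    }
}
```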

A screenshot is presented in Figure 6-2.
Figure 6-2.

JetBrains Tools

JetBrains has a suite of tools for .NET development. In this section, we are going to discuss some profiling, decompiling, and debugging features.

DOTPEEK

dotPeek is a free .NET decompiler and assembly browser. Here are some of the useful features:
  • Decompilation to C# and IL

  • Export of decompiled code to Visual Studio projects and generation of PDB files

  • Find usages of any symbol

  • Quick navigation to a type, symbol, or anything else

A screenshot is presented in Figure 6-3.
Figure 6-3. dotPeek

URL: www.jetbrains.com/decompiler/

Closed source; free; Windows-only.

DOTTRACE AND DOTMEMORY

dotTrace and dotMemory are .NET performance and memory profilers. Here are some of the useful features of both products:
  • Support for various .NET applications

    It supports different kinds of .NET Framework applications (including desktop apps, IIS, IIS Express, Windows services, UWP, and so on) and .NET Core applications.

  • Rich visualizations

    Both profilers have a lot of visualization views, which allow you to investigate different kinds of issues. For example, dotMemory has the timeline view with real-time data collection, sunburst diagram, call tree chart, and many tree views that help to examine relations between objects in a snapshot.

  • Comparing snapshots

    When you want to evaluate the impact of a particular change, you can capture performance or memory snapshots before and after the change and compare them. It’s useful when you want to verify that the change fixes a performance problem (or that the change doesn’t introduce performance degradations).

  • Many execution options

    You can use dotTrace as a stand-alone desktop application, via command line, or via profiling API. You can attach to local or remote applications (remote profiling is especially useful when you have a problem in a web application on a server).

Here are some special features of dotTrace:
  • Different profiling modes

    dotTrace supports the following types of profiling:
    • Sampling. The idea of this approach is simple: from time to time, the profiler captures the call stacks of all threads. With this information, it can find methods that take too much time (because they often appear in the captured call stacks). This approach has the lowest overhead, but it’s not accurate: it can miss some fast methods, and it can’t calculate the number of calls for each method. It’s useful when you want to find a performance bottleneck without significant profiler overhead.

    • Tracing. In the tracing mode, the profiler receives entry and exit events for each method with the help of code instrumentation. As a result, it adds some overhead to each call, and the measured time can be distorted. It’s useful when you want to know the exact number of calls for each method.

    • Line-by-line. This approach is similar to tracing, but it works with statements instead of methods. It has a higher overhead than tracing. It’s useful when you are looking for the slowest statement in a huge method.

    • Timeline. In the timeline mode, the profiler collects temporal information about call stacks, thread state data, memory allocation, garbage collections, and I/O operations. The results are presented with the help of the Timeline Viewer, which displays recorded events on a timeline diagram. It’s useful when the chronological order of events matters; it allows detecting UI freezes, excessive GC and I/O operations, and lock contention.

  • Support for advanced cases

    dotTrace has a lot of additional features like profiling async calls, analyzing slow HTTP requests, SQL queries, and file system operations.
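The difference between the tracing and sampling modes can be sketched in a few lines of C#. This first sketch imitates tracing with manual instrumentation: the hypothetical Tracer records an entry timestamp and, on exit, updates the exact call count and the inclusive time of the method. Real profilers inject such hooks automatically; this version is single-threaded and for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical hand-written instrumentation: every traced method opens a
// scope on entry and closes it on exit. Not thread-safe; illustration only.
public static class Tracer
{
    public sealed class MethodStats
    {
        public long Calls;
        public long ElapsedTicks; // inclusive: also counts time spent in callees
    }

    private static readonly Dictionary<string, MethodStats> stats =
        new Dictionary<string, MethodStats>();

    public static IReadOnlyDictionary<string, MethodStats> Results => stats;

    public static Scope Enter(string method) => new Scope(method);

    public readonly struct Scope : IDisposable
    {
        private readonly string method;
        private readonly long start;

        public Scope(string method)
        {
            this.method = method;
            start = Stopwatch.GetTimestamp(); // "entry" event
        }

        public void Dispose() // "exit" event
        {
            long elapsed = Stopwatch.GetTimestamp() - start;
            if (!stats.TryGetValue(method, out MethodStats s))
                stats[method] = s = new MethodStats();
            s.Calls++;
            s.ElapsedTicks += elapsed;
        }
    }
}

public static class TracedCode
{
    public static int Fib(int n)
    {
        using (Tracer.Enter(nameof(Fib)))
        {
            return n < 2 ? n : Fib(n - 1) + Fib(n - 2);
        }
    }
}
```

Calling TracedCode.Fib(10) records exactly 177 calls of Fib. The timestamp and dictionary work done in every call is the kind of per-call overhead that can distort the measured time, and for recursive methods the inclusive times of nested calls overlap.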
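A second sketch shows how a sampling profiler can turn captured call stacks into hot spots. It does not capture stacks itself (that requires runtime or OS support); it only aggregates samples given as arrays of method names, outermost frame first. The SampleAggregator class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical aggregation step of a sampling profiler. Each sample is one
// captured call stack: method names from the outermost frame to the innermost.
public static class SampleAggregator
{
    // Inclusive counts: in how many samples the method appears anywhere on the
    // stack (Distinct avoids double counting recursive frames).
    public static Dictionary<string, int> Inclusive(IEnumerable<string[]> samples) =>
        samples.SelectMany(stack => stack.Distinct())
               .GroupBy(method => method)
               .ToDictionary(group => group.Key, group => group.Count());

    // Exclusive ("self") counts: in how many samples the method was the one
    // actually executing, i.e. the innermost frame.
    public static Dictionary<string, int> Exclusive(IEnumerable<string[]> samples) =>
        samples.Where(stack => stack.Length > 0)
               .GroupBy(stack => stack[stack.Length - 1])
               .ToDictionary(group => group.Key, group => group.Count());
}
```

Multiplying a count by the sampling interval gives a rough time estimate. Note that the samples carry no information about how many times a method was called, which is exactly the limitation of the sampling mode.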

Here are some special features of dotMemory:
  • Powerful automatic inspections

    dotMemory automatically detects common memory issues in your snapshots like string duplicates, sparse arrays, leaking event handlers or WPF bindings, and others.

  • Support for raw memory dumps

    You can work with raw Windows memory dumps as regular snapshots, explore them via standard view panes, and apply inspections.
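Leaking event handlers are a classic example of what such inspections look for. In the hypothetical sketch below, the event of a long-lived Publisher stores a delegate whose Target is the Subscriber, so the subscriber stays reachable until it unsubscribes. The class and member names are illustrative only.

```csharp
using System;

// A long-lived object (hypothetical), e.g. an application-wide service.
public class Publisher
{
    public event EventHandler Changed;

    public int SubscriberCount => Changed?.GetInvocationList().Length ?? 0;

    // The delegates stored in the event keep strong references to their targets.
    public bool References(object target) =>
        Changed != null && Array.Exists(Changed.GetInvocationList(), d => d.Target == target);
}

// A short-lived object that subscribes in its constructor.
public class Subscriber
{
    private readonly Publisher publisher;

    public Subscriber(Publisher publisher)
    {
        this.publisher = publisher;
        publisher.Changed += OnChanged;
    }

    // Without this call, the subscriber stays reachable as long as the publisher is alive.
    public void Detach() => publisher.Changed -= OnChanged;

    private void OnChanged(object sender, EventArgs e) { }
}
```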

dotTrace 2018.3 and dotMemory 2018.3 are Windows-only applications, but future versions should support .NET Core and Mono profiling on Linux and macOS.

Screenshots of dotTrace and dotMemory are presented in Figure 6-4 and Figure 6-5.
Figure 6-4. dotTrace

Figure 6-5. dotMemory

URL: www.jetbrains.com/profiler/, www.jetbrains.com/dotmemory/

Closed source; free/commercial; Windows-only.

RESHARPER

ReSharper is a Visual Studio extension for .NET developers. It has many useful features, but I want to highlight only one: IL Viewer. It allows viewing IL code for the current file in a separate tool window. Thus, you can check out the generated IL listing without switching from Visual Studio to another program. ReSharper and dotPeek use the same decompilation engine.

A screenshot is presented in Figure 6-6.
Figure 6-6. ReSharper IL Viewer

URL: www.jetbrains.com/resharper/

Closed source; free/commercial; Windows-only.

Resources: [Balliauw 2017a], www.jetbrains.com/help/resharper/Viewing_Intermediate_Language.html

RIDER

Rider is a fast and powerful cross-platform .NET IDE. We are not going to discuss Rider as an IDE. Instead, we will talk only about the following features:
  • Embedded decompiler

    With the help of the dotPeek engine, Rider is able to show decompiled C# code for any third-party classes even without symbols.

  • External code debug

    Even if you are working with a simple console application, you can attach to any .NET application and debug the decompiled code of any class without the original source code or symbols. You can even set breakpoints in the decompiled sources and analyze the execution of third-party assemblies. While most of the classic .NET tools are Windows-only, Rider supports external code debugging for Mono and .NET Core on Linux and macOS.

  • Embedded profiler

    Rider contains an embedded dotTrace engine, which allows profiling your application from the IDE.

URL: www.jetbrains.com/rider/

Closed source; free/commercial; cross-platform.

Resources: [Balliauw 2017b], www.jetbrains.com/help/rider/Debugging_External_Code.html

Windows Sysinternals

Windows Sysinternals is a set of advanced system utilities for Windows. This suite includes many different tools that form the following groups:
  • File and Disk Utilities: tools that can obtain detailed information about disks (e.g., resource permissions, disk usage, disk mapping, information about encrypted files) and disk manipulation tools (e.g., scheduling file operations for the next reboot, defragmentation, working with symbolic links).

  • Networking Utilities: tools that can work with Active Directory, named pipes, sockets, and remote computers. It also includes PsPing, which allows performing basic network latency and bandwidth measurements.

  • Process Utilities: tools that can monitor and control processes, their threads, and handles.

  • Security Utilities: tools that can operate with users, their sessions and permissions.

  • System Information: tools that can collect different information about the operating system, processes, memory, devices, and hardware.

  • Miscellaneous: other tools that help to work with registry, encodings, screens, and desktops.

In this section, we are going to discuss a few tools that can be especially useful during performance investigations: RAMMap, VMMap, and Process Monitor.

URL: https://docs.microsoft.com/en-us/sysinternals/

Closed source; free; Windows-only.

RAMMAP

RAMMap shows a detailed low-level view of all kinds of memory in the operating system. It allows exploring different kinds of memory (Active, Standby, Modified, and so on) for different usage types (Process Private, Mapped Files, Sharable, and so on). You can analyze the memory of each process, physical memory pages, and ranges.

You can find more information about different kinds of memory in Windows in [Russinovich 2017] and [Russinovich 2019].

A screenshot is presented in Figure 6-7.
Figure 6-7.

VMMAP

VMMap shows a detailed low-level view of memory for a process. While RAMMap helps to explore memory in the whole operating system, VMMap always works with a single process. It provides advanced data for all memory segments used by this process.

A screenshot is presented in Figure 6-8.
Figure 6-8.

PROCESS MONITOR

Process Monitor is an advanced monitoring tool for Windows that shows real-time file system, Registry, and process/thread activity. It allows viewing all kinds of low-level OS events (e.g., CreateFile/OpenFile/CloseFile, LoadImage, RegQueryKey/RegCloseKey, ThreadCreate/ThreadExit, and so on). It’s also possible to get all available metadata for each event, including full thread stack traces with integrated symbol support for each operation. Since Windows has a huge number of such events, Process Monitor allows setting different kinds of complicated filters, which helps you to catch only the events that you want to see.

A screenshot is presented in Figure 6-9.
Figure 6-9.

Other Useful Tools

In this section, we are going to discuss other useful tools from different vendors which can also simplify performance investigations.

ILDASM AND ILASM

ildasm allows getting the IL disassembly of a .NET assembly and dumping it into a text file. It’s a companion tool to ilasm, which builds a .NET assembly from IL sources. Thus, you can disassemble an assembly to IL with ildasm, make a few changes, and create a modified assembly with ilasm. Both tools are installed with Visual Studio and available from the Developer Command Prompt. Typical installation paths of these tools look like c:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.7.1 Tools\ildasm.exe and c:\Windows\Microsoft.NET\Framework\v4.0.30319\ilasm.exe.

Let’s say that we have a Program.cs file with the following content:
using System;
namespace ConsoleApp
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Hello World!");
    }
  }
}
Let’s compile it with the help of Roslyn:
csc Program.cs
Now we have the Program.exe assembly, which can be decompiled to IL:
ildasm.exe Program.exe /out:Program.il
This command creates Program.il with the full IL metadata of our assembly. In the middle of this file, we can find the following lines:
.class private auto ansi beforefieldinit ConsoleApp.Program
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello World!"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method Program::Main
  .method public hidebysig specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  ret
  } // end of method Program::.ctor
} // end of class ConsoleApp.Program
Let’s open this file in a text editor and change IL_0001: ldstr "Hello World!" to IL_0001: ldstr "Modified" and compile it back to the executable file:
ilasm.exe Program.il

Now, if we execute Program.exe, we will get “Modified” instead of “Hello World!”.
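If you repeat such edits often, the text-editing step can be scripted. The following sketch is a hypothetical helper (not part of ildasm or ilasm) that rewrites the operand of ldstr instructions in an ildasm listing. It assumes that neither string contains quotes or backslashes, because IL string escaping is not handled.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper that automates the manual IL edit shown above.
public static class IlPatcher
{
    // Replaces every `ldstr "<oldValue>"` operand with `newValue`,
    // keeping the original whitespace between ldstr and the operand.
    public static string ReplaceLdstr(string ilText, string oldValue, string newValue)
    {
        string pattern = "(ldstr\\s+)\"" + Regex.Escape(oldValue) + "\"";
        // "$$" is a literal '$' in a .NET regex replacement string.
        string replacement = "${1}\"" + newValue.Replace("$", "$$") + "\"";
        return Regex.Replace(ilText, pattern, replacement);
    }

    public static void PatchFile(string path, string oldValue, string newValue) =>
        File.WriteAllText(path, ReplaceLdstr(File.ReadAllText(path), oldValue, newValue));
}
```

For the preceding example, IlPatcher.PatchFile("Program.il", "Hello World!", "Modified") followed by ilasm.exe Program.il should produce the same modified Program.exe as the manual edit.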

This approach is especially powerful when you want to make a few changes to an assembly from the command line without rebuilding the whole project.

URL: https://docs.microsoft.com/en-us/dotnet/framework/tools/ildasm-exe-il-disassembler

Closed source; free; Windows-only

MONODIS

monodis is the Mono counterpart of ildasm. It makes the preceding IL modification example cross-platform. monodis prints the IL listing to standard output, so we can rewrite ildasm.exe Program.exe /out:Program.il like this:
monodis Program.exe > Program.il

Mono also ships its own ilasm (with the same name).

URL: www.mono-project.com/docs/tools+libraries/tools/monodis/

Open source; free; cross-platform.

ILSPY

ILSpy is a .NET assembly browser and decompiler. It’s a pretty simple decompiler, without many UI features. However, it allows using its decompilation engine via the ICSharpCode.Decompiler NuGet package. Thus, you can easily embed this decompiler into your own tools.

Originally, ILSpy was a Windows-only application, but now we have a cross-platform version based on Avalonia.2

A screenshot is presented in Figure 6-10.
Figure 6-10. ILSpy

URL: https://github.com/icsharpcode/ILSpy, https://github.com/icsharpcode/AvaloniaILSpy

Open source (MIT); free; cross-platform.

DNSPY

dnSpy is a debugger and .NET assembly editor. Here are some of its useful features:
  • Decompilation to C#, VB, and IL

  • Edit assemblies in C#/VB/IL and edit metadata

  • Debug .NET Framework, .NET Core, and Unity assemblies without source code

  • Powerful IL code hex editor

The decompilation engine is based on ILSpy and the compilation engine is based on Roslyn.

The most powerful feature of dnSpy is assembly editing: you can easily change any IL instruction in a third-party assembly even without its source code. It significantly simplifies experiments when you are trying to find a performance problem in one of your project dependencies. Even when you are working with your own assembly, dnSpy allows making minor code fixes without time-consuming solution recompilation.

A screenshot is presented in Figure 6-11.
Figure 6-11. dnSpy

URL: https://github.com/0xd4d/dnSpy

Open source (GPLv3); free; Windows-only

WINDBG

WinDbg is the most powerful low-level debugger for Windows. It allows debugging native and .NET Windows applications, and its rich command set lets you obtain almost any kind of information you need during debugging. The .loadby sos clr command loads a special WinDbg extension called SOS (Son of Strike), which provides many additional commands for .NET applications (for .NET Core, the runtime module is called coreclr, so the command becomes .loadby sos coreclr). With WinDbg, you can examine all runtime objects, threads, call stacks, locks, and heaps; you can also explore managed and unmanaged memory, registers, and disassembly listings.

The classic version of WinDbg has a poor user interface and is not easy to use. Fortunately, there is a modern version of WinDbg with a reworked UI, which is available via the Microsoft Store (see [Luhrs 2017]).
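If you want to try SOS yourself, a small target program helps. The following sketch keeps some objects alive and waits for you to attach WinDbg; the MyApp.Node type and the object count are arbitrary choices that only make the objects easy to find. The comments list a few standard SOS commands to start with.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace MyApp
{
    // Hypothetical type; a distinct name makes it easy to find on the heap.
    class Node
    {
        public int Value;
        public Node(int value) { Value = value; }
    }

    static class Program
    {
        // A static field keeps the list (and every Node in it) reachable,
        // so the GC cannot collect the objects before we inspect them.
        static List<Node> nodes;

        public static List<Node> CreateNodes(int count)
        {
            var list = new List<Node>(count);
            for (int i = 0; i < count; i++)
                list.Add(new Node(i));
            return list;
        }

        static void Main()
        {
            nodes = CreateNodes(10000);
            Console.WriteLine($"Created {nodes.Count} nodes in process {Process.GetCurrentProcess().Id}.");
            Console.WriteLine("Attach WinDbg to this process, then press Enter here to exit.");

            // Commands to try after attaching (WinDbg breaks in on attach):
            //   .loadby sos clr             load SOS (.NET Framework; use coreclr for .NET Core)
            //   !dumpheap -stat             object counts and sizes grouped by type
            //   !dumpheap -type MyApp.Node  list the Node instances
            //   ~*e !clrstack               managed call stack of every thread
            //   g                           resume the process
            Console.ReadLine();
        }
    }
}
```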

ASM-DUDE

Asm-Dude is an extension for Visual Studio 2015+ that improves disassembly support. Here are some of its useful features:
  • Enhanced disassembly tool window

    The extension applies syntax highlighting in the disassembly tool window and provides QuickInfo tooltips with detailed information about each assembly instruction and its performance characteristics.

  • ASM language support

    You also get syntax highlighting, QuickInfo tooltips, code completion, code folding, signature help, and label analysis in the editor, which significantly simplifies editing assembly programs.

URL: https://github.com/HJLebbink/asm-dude

Open source (MIT); free; Windows-only

MONO CONSOLE TOOLS

Mono has several built-in tools that can be useful during investigations.

For example, Mono allows viewing the generated native code for any method. Let’s say we have the following program:
using System;
namespace MyApp
{
  class Program
  {
    static void Main()
    {
      int x = 3, y = 4;
      double z = Math.Sqrt(x * x + y * y);
      Console.WriteLine(z);
    }
  }
}
We can ask Mono to compile this method without executing it with the help of the following command on Linux/macOS:
$ MONO_VERBOSE_METHOD=MyApp.Program:Main mono --compile MyApp.Program:Main Program.exe
Here is the Windows version:
> SET MONO_VERBOSE_METHOD=MyApp.Program:Main
> mono --compile MyApp.Program:Main Program.exe
At the end of the command output, we will find an assembly listing like this:
0000000000000000  subq       $0x8, %rsp
0000000000000004  movl       $0x19, %eax
0000000000000009  cvtsi2sdl  %eax, %xmm0
000000000000000d  movsd      %xmm0, -0x8(%rsp)
0000000000000013  fldl       -0x8(%rsp)
0000000000000017  fsqrt
0000000000000019  fstpl      -0x8(%rsp)
000000000000001d  movsd      -0x8(%rsp), %xmm0
0000000000000023  nop
0000000000000026  movabsq    $0x106f05fc8, %r11
0000000000000030  callq      *%r11
0000000000000033  addq       $0x8, %rsp
0000000000000037  retq
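In this listing, x * x + y * y has already been folded into the constant 0x19 (25), and Math.Sqrt has been replaced by the x87 fsqrt instruction.
Also, Mono allows running your program with the Mono log profiler: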
$ mono --profile=log Program.exe

As a result, you will get an output.mlpd file, which can be opened with mprof-report or the Xamarin Profiler.3 The Mono profiler has many options, which are described in the official documentation.

URL: https://github.com/mono/mono/

Open source (MIT/BSD); free; cross-platform

Resources: www.mono-project.com/docs/, www.mono-project.com/docs/debug+profile/profile/profiler/

PERFVIEW

PerfView is a free performance analysis tool. It collects ETW (Event Tracing for Windows) events and lets you explore the collected data. ETW is a built-in Windows mechanism with extremely low overhead (and with special support for .NET applications), which makes PerfView very useful for monitoring production systems.

A screenshot is presented in Figure 6-13.
Figure 6-13. PerfView

URL: http://aka.ms/perfview

Open source (MIT); free; Windows-only

Resources: [Goldshtein 2016a]
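Besides the runtime's own events, PerfView can also collect events from your own EventSource-based providers, which is handy for marking the boundaries of the code you care about. Here is a minimal sketch based on System.Diagnostics.Tracing.EventSource; the provider name MyCompany-MyApp, the event names, and the workload are hypothetical. To collect these events, you can, for example, add *MyCompany-MyApp to the Additional Providers field of PerfView's Collect dialog; the leading * should make PerfView derive the provider GUID from the name, the same way EventSource does.

```csharp
using System;
using System.Diagnostics.Tracing;

namespace MyApp
{
    // Hypothetical provider; the name determines the provider GUID.
    [EventSource(Name = "MyCompany-MyApp")]
    sealed class MyAppEventSource : EventSource
    {
        public static readonly MyAppEventSource Log = new MyAppEventSource();

        // Event IDs must match the first argument passed to WriteEvent.
        [Event(1, Level = EventLevel.Informational)]
        public void IterationBegin(int iteration) => WriteEvent(1, iteration);

        [Event(2, Level = EventLevel.Informational)]
        public void IterationEnd(int iteration) => WriteEvent(2, iteration);
    }

    static class Program
    {
        static void Main()
        {
            double sum = 0;
            for (int i = 0; i < 5; i++)
            {
                MyAppEventSource.Log.IterationBegin(i);
                for (int j = 1; j <= 1000000; j++)
                    sum += Math.Sqrt(j); // some work to measure between the two events
                MyAppEventSource.Log.IterationEnd(i);
            }
            Console.WriteLine(sum);
        }
    }
}
```

When no trace session has enabled the provider, WriteEvent returns almost immediately, so such instrumentation can usually stay in production code.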

PERFCOLLECT

perfcollect is a bash script that automates performance measurements for .NET Core applications on Linux. The collected traces can be viewed using PerfView on Windows.

URL: http://aka.ms/perfcollect

Open source (MIT); free; Linux-only

Resources: [Kokosa 2017], [Goldshtein 2017], https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md

PROCESS HACKER

Process Hacker is a free, powerful, multipurpose tool that helps you monitor system resources, debug software, and detect malware. It’s an “advanced” version of the default Windows task manager. There is also a similar tool from the Sysinternals suite called Process Explorer.4

Process Hacker has a detailed view for each process with general statistics (CPU, memory, and I/O usage), performance charts, dozens of .NET performance metrics (like GC heap sizes, the number of jitted methods, the number of thrown exceptions, and so on), loaded .NET assemblies, information about threads (with stack traces), environment variables, tokens, modules, handles, and memory segments.

A screenshot is presented in Figure 6-14.
Figure 6-14. Process Hacker

URL: https://github.com/processhacker/processhacker

Open source (GPLv3); free; Windows-only

INTEL VTUNE AMPLIFIER

Intel VTune Amplifier is an advanced general-purpose profiler. It knows about hundreds of hardware counters supported by Intel CPUs. In especially complicated performance investigations, it's almost impossible to draw any conclusions without these counters.

VTune offers many profiling modes for different use cases, organized into four groups: “Hotspots,” “Microarchitecture,” “Parallelism,” and “Platform Analysis.” Each mode is highly configurable: you can customize a profiling session so that it collects only the metrics you really need. One of my favorite modes is “Microarchitecture Exploration”: it collects many hardware counters that are not available in other profilers.

It supports many languages, including C, C++, C#, Fortran, Java, Python, Go, and assembly. VTune 2019 and later versions also provide advanced support for .NET Core applications.

A screenshot is presented in Figure 6-15.
Figure 6-15. Intel VTune Amplifier

URL: https://software.intel.com/en-us/vtune

Closed source; commercial; cross-platform

Resources: [Lander 2018]

Summary

In this chapter, we briefly discussed different diagnostic tools that can be useful during performance investigations:
  • Benchmarking harness: BenchmarkDotNet

  • Performance profiler: Visual Studio embedded profiler, Rider embedded profiler, dotTrace, Intel VTune Amplifier, Mono Console Tools, perfcollect with PerfView

  • Memory profiler: Visual Studio embedded profiler, dotMemory, Intel VTune Amplifier, VMMap, Mono Console Tools

  • C#/VB decompiler: ILSpy, dnSpy, dotPeek, Rider, ReSharper

  • IL decompiler: ildasm, monodis, ILSpy, dnSpy, dotPeek, ReSharper (via IL Viewer), Intel VTune Amplifier, BenchmarkDotNet (via DisassemblyDiagnoser)

  • ASM Decompiler: Visual Studio disassembly view (which is more powerful with Asm-Dude), WinDbg, BenchmarkDotNet (via DisassemblyDiagnoser), Mono Console Tools

  • Debuggers: Visual Studio embedded debugger, Rider embedded debugger, WinDbg

  • System monitoring tool: Process Hacker, RAMMap, Process Monitor

A good benchmark answers questions like “How long does this method take?”, but it doesn’t answer questions like “Why does this method take so long?”. A full performance investigation often involves additional tools that help to diagnose applications and draw meaningful conclusions.
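As an illustration of this division of labor, here is a minimal BenchmarkDotNet sketch for the Math.Sqrt example from the Mono section; the class and method names are hypothetical. The benchmark itself answers “how long,” [MemoryDiagnoser] adds “how much is allocated,” and [DisassemblyDiagnoser] reports the generated native code, which is only a first step toward answering “why.”

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark class; requires the BenchmarkDotNet NuGet package.
[MemoryDiagnoser]        // reports allocated bytes per operation
[DisassemblyDiagnoser]   // reports the native code of the benchmark method
public class HypotenuseBenchmarks
{
    // Instance fields rather than constants, so that the JIT
    // cannot fold the whole expression into a constant.
    public int X = 3, Y = 4;

    [Benchmark]
    public double Hypotenuse() => Math.Sqrt(X * X + Y * Y);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<HypotenuseBenchmarks>();
}
```

Run it in the Release configuration (for example, dotnet run -c Release); BenchmarkDotNet typically refuses to run benchmarks from a non-optimized build.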

Of course, this is not a complete list of available tools; you can easily find more of them on the Internet. I described only those tools that I typically use. You are free to choose any tools you like.

In this chapter, the following tool versions were used: BenchmarkDotNet v0.11.3, Visual Studio 2017 (15.9), dotPeek/dotTrace/dotMemory/ReSharper/Rider 2018.3, RAMMap 1.51, VMMap 3.25, Process Monitor 3.50, ildasm 4.0.30319.0, ILSpy 4.0 Beta 2, dnSpy 5.0.10, WinDbg Preview 1.0.1812.12001, PerfView 2.0.26, Asm-Dude 1.9.5.3, Mono 5.16, Process Hacker 3.0.1563, Intel VTune Amplifier 2019 Update 2. Updated versions of these tools may differ in feature set and license policy.

References

[Balliauw 2017a] Balliauw, Maarten. 2017. “Exploring Intermediate Language (IL) with ReSharper and dotPeek.” January 19. https://blog.jetbrains.com/dotnet/2017/01/19/exploring-intermediate-language-il-with-resharper-and-dotpeek/ .

[Balliauw 2017b] Balliauw, Maarten. 2017. “Debugging Third-Party Code with Rider.” December 20. https://blog.jetbrains.com/dotnet/2017/12/20/debugging-third-party-code-rider/ .

[Goldshtein 2016a] Goldshtein, Sasha. 2016. “PerfView: Measure and Improve Your App’s Performance for Free.” Presented at DotNext Piter 2016, June 3. www.youtube.com/watch?v=eX644hod65s .

[Goldshtein 2016b] Goldshtein, Sasha. 2016. “WinDbg Superpowers for .NET Developers.” Presented at DotNext Moscow 2016, December 9. www.youtube.com/watch?v=8t1aTbnZ2CE .

[Goldshtein 2017] Goldshtein, Sasha. 2017. “Profiling a .NET Core Application on Linux.” February 27. http://blogs.microsoft.co.il/sasha/2017/02/27/profiling-a-net-core-application-on-linux/ .

[Kokosa 2017] Kokosa, Konrad. 2017. “Analyzing Runtime CoreCLR Events from Linux – Trace Compass.” August 7. http://tooslowexception.com/analyzing-runtime-coreclr-events-from-linux-trace-compass/ .

[Lander 2018] Lander, Rich. 2018. “.NET Core Source Code Analysis with Intel® VTune™ Amplifier.” Microsoft .NET Blog. October 23. https://blogs.msdn.microsoft.com/dotnet/2018/10/23/net-core-source-code-analysis-with-intel-vtune-amplifier/ .

[Luhrs 2017] Luhrs, Andy. 2017. “New WinDbg Available in Preview!” August 28. https://blogs.msdn.microsoft.com/windbg/2017/08/28/new-windbg-available-in-preview/ .

[Russinovich 2017] Yosifovich, Pavel, Mark E. Russinovich, David A. Solomon, and Alex Ionescu. 2017. Windows Internals, Part 1. 7th ed. Microsoft Press.

[Russinovich 2019] Russinovich, Mark E., David A. Solomon, Alex Ionescu, and Andrea Allievi. 2019. Windows Internals, Part 2. 7th ed. Microsoft Press.

[Sitnik 2017a] Sitnik, Adam. 2017. “Collecting Hardware Performance Counters with BenchmarkDotNet.” April 4. https://adamsitnik.com/Hardware-Counters-Diagnoser/ .

[Sitnik 2017b] Sitnik, Adam. 2017. “Disassembling .NET Code with BenchmarkDotNet.” August 16. https://adamsitnik.com/Disassembly-Diagnoser/ .

[Sitnik 2018] Sitnik, Adam. 2018. “Profiling .NET Code with BenchmarkDotNet.” September 28. https://adamsitnik.com/ETW-Profiler/ .