Getting started with Roslyn code analysis

It was going to happen eventually – our research on C# dynamic features eventually ended up with an attempt to parse bits of source code. There are quite a few solutions on the market, with NRefactory being our preferred tool over the years. There are however a few limitations: it does not support .NET core and C# 6.

It is a big deal

It might seem, that support for newer language spec is not critical. But in fact, it gets problematic very quickly even in more established projects. Luckily for us, Microsoft has chosen to open source Roslyn – the very engine that powers their compiler services. Their official documentation covers the platform pretty well and goes in great detail of writing Visual Studio code analysers. We however often have to deal with writing MSBuild tasks that load the whole solution and run analysis on class hierarchies (for example, to detect whether a single `SQL SELECT` statement is being called inside a foreach loop – we would fail the build and suggest to replace it with bulk select)

Installing

Roslyn is available via NuGet as a number of Microsoft.CodeAnalysis.* packages. We normally include these four:

Install-Package Microsoft.CodeAnalysis.Workspaces.MSBuild
Install-Package Microsoft.CodeAnalysis
Install-Package Microsoft.CodeAnalysis.CSharp
Install-Package Microsoft.Build # these classes are needed to support MSBuild workspace when it starts to load solution
Install-Package Microsoft.Build.Utilities.Core # these classes are needed to support MSBuild workspace when it starts to load solution
Install-Package Microsoft.Build.Locator # this is a helper to locate correct MSBuild toolchain (in case the machine has more than one installed)

Sometimes the environment gets confused as to what version MSBuild to use, and this is why starting a project with something like this is pretty much a must since VS2015:

// put this somewhere early in the program
if (!MSBuildLocator.IsRegistered) //MSBuildLocator.RegisterDefaults(); // ensures correct version is loaded up
{
    var vs2022 = MSBuildLocator.QueryVisualStudioInstances().Where(x => x.Name == "Visual Studio Community 2022").First(); // find the correct VS setup. There are namy ways to organise logic here, we'll just assume we want VS2022
    MSBuildLocator.RegisterInstance(vs2022); // register the selected instance
    var _ = typeof(Microsoft.CodeAnalysis.CSharp.Formatting.CSharpFormattingOptions); // this ensures library is referenced so the compiler would not try to optimise it away (if dynamically loading assemblies or doing other voodoo that can throw the compiler off) - probably less important than the above but we prefer to follow cargo cult here and leave it be
}

After initial steps, simplistic solution traversal would look something along these lines:

async Task AnalyseSolution()
{
	using (var w = MSBuildWorkspace.Create())
	{
		var solution = await w.OpenSolutionAsync(@"MySolution.sln");		
		foreach (var project in solution.Projects)
		{			
			var docs = project.Documents; // allows for file-level document filtering
			var compilation = await project.GetCompilationAsync(); // allows for assembly-level analysis as well as SemanticModel 
			foreach (var doc in docs)
			{
				var walker = new CSharpSyntaxWalker(); // CSharpSyntaxWalker is an abstract class - we will need to define our own implementation for this to actually work
				walker.Visit(await doc.GetSyntaxRootAsync()); // traverse the syntax tree
			}
		}
	}
}

Syntax Tree Visitor

As with pretty much every single mainstream syntax analyser, the easiest way to traverse syntax trees is by using a Visitor Pattern. It allows to decouple tree nodes and processing logic. Which will allow room for expansion on either side (easy to add new logic, easy to add new tree node types). Roslyn has stub CSharpSyntaxWalker that allows us to only override required nodes for processing. It then takes care of everything else.

With basics out of the way, let’s look into classes that make up our platform here. Top of the hierarchy is MSBuild Workspace followed by Solution, Project and Document. Roslyn makes a distinction between parsing code and compiling it. Meaning some analytics will only be available in Compilation class that is available for project as well as for individual documents down the track.

Traversing the tree

Just loading the solution is kind of pointless though. We’d need to come up with processing logic – and the best place to do it would be a CSharpSyntaxWalker subclass. Suppose, we’d like to determine whether class constructor contains if statements that are driven by parameters. This might mean we’ve got overly complex classes and could benefit from refactoring these out:

public class ConstructorSyntaxWalker : CSharpSyntaxWalker
{
    public List<IParameterSymbol> Parameters { get; set; }
    public int IfConditions { get; set; }
    
    bool processingConstructor = false;

    SemanticModel sm;

    public ConstructorSyntaxWalker(SemanticModel sm)
    {
        this.sm = sm;
        Parameters = new List<IParameterSymbol>();
    }

    public override void VisitConstructorDeclaration(ConstructorDeclarationSyntax node)
    {
        processingConstructor = true;
        base.VisitConstructorDeclaration(node);
        processingConstructor = false;
    }

    public override void VisitIfStatement(IfStatementSyntax node)
    {
        if (!processingConstructor) return; // we only want to keep traversing if we know we're inside constructor body
        Parameters.AddRange(sm.AnalyzeDataFlow(node).DataFlowsIn.Cast<IParameterSymbol>()); // .AnalyzeDataFlow() is one of the most commonly used parts of the platform: it requires a compilation to work off and allows tracking dependencies. We could then check if these parameters are supplied to constructor and make a call whether this is allowed 
        IfConditions++; // just count for now, nothing fancy
        base.VisitIfStatement(node);
    }
}

Then, somewhere in our solution (or any other solution, really!) We have a class definition like so:

public class TestClass
{
    public TestClass(int a, string o) 
    {
        if (a == 1) DoThis() else DoSomethingElse();
        if (o == "a") Foo() else Bar();
    }
}

If we wanted to throw an exception and halt the build we could invoke out SyntaxWalker:

public static async Task Main()
{
    await AnalyseSolution();
}
...
async static Task AnalyseSolution()
{

    using (var w = MSBuildWorkspace.Create())
    {
        var solution = await w.OpenSolutionAsync(@"..\..\..\TestRoslyn.sln"); // let's analyse our own solution. But can be any file on disk
        foreach (var project in solution.Projects)
        {
            var docs = project.Documents; // allows for file-level document filtering
            var compilation = await project.GetCompilationAsync(); // allows for assembly-level analysis as well as SemanticModel 
            foreach (var doc in docs)
            {
                var walker = new ConstructorSyntaxWalker(await doc.GetSemanticModelAsync());
                walker.Visit(await doc.GetSyntaxRootAsync()); // traverse the syntax tree
                if (walker.IfConditions > 0 && walker.Parameters.Any()) throw new Exception("We do not allow branching in constructors.");
            }
        }
    }
}

And there we have it. This is a very simplistic example, but possibilities are endless!

One thought on “Getting started with Roslyn code analysis”

Comments are closed.