It was going to happen sooner or later – our research on C# dynamic features eventually ended up with an attepmt to parse bits of source code. There are quite a few solutions on the market, with NRefactory being our preferred tool over the years. There are however a few limitations: it does not support .net core and C# 6.

It is a big deal

It might seem, that support for newer language spec is not critical. But in fact, it gets problematic very quickly even in more established projects. Luckily for us, Microsoft has chosen to open source Roslyn – the very engine that powers their compiler services. Their official documentation covers the platform pretty well, and goes in great detail of writing Visual Studio code analysers. We however often have to deal with writing MSBuild tasks that load the whole solution and run analysis on class hierarchies (for example, to detect whether a single select statement is being called inside a foreach loop – we would fail the build and suggest to replace it with bulk select)

Installing

Roslyn is available via NuGet as a number of Microsoft.CodeAnalysis.* packages. We normally include these four:

Install-Package Microsoft.CodeAnalysis.Workspaces.MSBuild
Install-Package Microsoft.CodeAnalysis
Install-Package Microsoft.CodeAnalysis.CSharp
Install-Package Microsoft.Build.Locator # this is a helper to locate correct MSBuild toolchain (in case the machine has more than one installed)

Sometimes the environment gets confused as to what version MSBuild to use, and this is why starting a project with something like this is pretty much a must since VS2015:

if (!MSBuildLocator.IsRegistered) MSBuildLocator.RegisterDefaults(); // ensures correct version is loaded up
var _ = typeof(Microsoft.CodeAnalysis.CSharp.Formatting.CSharpFormattingOptions); // this ensures library is referenced so the compiler would not try to optimise it away (if dynamically loading assemblies or doing other voodoo that can throw the compiler off) - probably less important than the above but we prefer to follow cargo cult here and leave it be for 

After initial steps, simplistic solution traversal would look something along these lines:

async Task AnalyseSolution()
{
	using (var w = MSBuildWorkspace.Create())
	{
		var solution = await w.OpenSolutionAsync(@"MySolution.sln");		
		foreach (var project in solution.Projects)
		{			
			var docs = project.Documents; // allows for file-level document filtering
			var compilation = await project.GetCompilationAsync(); // allows for assembly-level analysis as well as SemanticModel 
			foreach (var doc in docs)
			{
				var walker = new CSharpSyntaxWalker(); // CSharpSyntaxWalker is an abstract class - we will need to define our own implementation for this to actually work
				walker.Visit(await doc.GetSyntaxRootAsync()); // traverse the syntax tree
			}
		}
	}
}

Syntax Tree Visitor

As with pretty much every single mainstream syntax analyser, the easiest way to traverse syntax trees is by using a Visitor Pattern. It allows to decouple tree nodes and processing logic. Which will allow room for expansion on either sides (easy to add new logic, easy to add new tree node types). Roslyn has stub CSharpSyntaxWalker that allows us to only override required nodes for processing. Everything else is then taken care of.

With basics out of the way, let’s look into classes that make up our platform here. Top of the hierarchy is MSBuild Workspace followed by Solution, Project and Document. Roslyn makes a distinction between parsing code and compiling it. Meaning some analytics will only be available in Compilation class that is available for project as well as for individual documents down the track.

Traversing the tree

Just loading the solution is kinda pointless though. We’d need to come up with processing logic – and the best place to do it would be a CSharpSyntaxWalker subclass. Suppose, we’d like to determine whether class constructor contains if statements that are driven by parameters. This might mean we’ve got overly complicated classes and could benefit from refactoring these out:

public class ConstructorSyntaxWalker : CSharpSyntaxWalker
{
	public List<ISymbol> Parameters { get; set; }
	public int IfConditions { get; set; }

	SemanticModel sm;

	public ConstructorSyntaxWalker(SemanticModel sm)
	{
		this.sm = sm;
		Parameters = new List<ISymbol>();
	}

	public override void VisitIfStatement(IfStatementSyntax node)
	{
		Parameters.AddRange(sm.AnalyzeDataFlow(node).DataFlowsIn); // .AnalyzeDataFlow() is one of the most commonly used parts of the platform: it requires a compilation to work off and allows tracking dependencies. We could then check if these parameters are supplied to constructor and make a call whether this is allowed 
		IfConditions++; // just count for now, nothing fancy
		base.VisitIfStatement(node);
	}
}

One reply on “Walking code with Roslyn”

Comments are closed.