T-SQL syntax analysis

It is nice to be able to analyze the code for either code review automation or rule enforcement. While Roslyn is an extremely useful platform for C# code analysis, more often than not, .net applications come backed by SQL Server. And this means one thing…

We need Roslyn for SQL!

We probably are not getting full blown SQL dependency tracking system just yet. But apparently Microsoft does offer a SqlParser library that is very capable of handling most of static code analysis on SQL. Let us run through a reasonably common scenario. Developers rely on feature flags for all new functionality. Product Owner explicitly enables features are they become ready. States are kept in the database and managed via simple scripts. Since we release code multiple times a day, unfinished features can ship to production as long as respective flags are safely turned off. Now our task is to ensure that change scripts only create feature flag definitions but never turn them on by default.

A bit of theory

Parsing language (usually) happens in stages:

  1. Lexical analysis (convert character stream to basic building blocks – tokens)
  2. Syntactic analysis (group tokens into constructs according to selected grammar – parse tree)
  3. Semantic analysis (convert parse tree into abstract syntax tree)

Once we’ve got the Abstract Syntax Tree (AST), we can interact with it and perform certain operations depending on node type and context. Probably the easiest way to traverse such tree would be to implement a Visitor. Both Roslyn and SQL Parser offer stubs to aid implementing this pattern.

Sample scripts

This particular example revolves around two workflows:

-- case 1 - definŠµ new feature flag
INSERT INTO dbo.FeatureFlag ( [Key], Value ) VALUES ( @featureFlagId, N'true')
-- case 2 - update existing flag
UPDATE dbo.FeatureFlag SET Value = 'true' WHERE [Key] = @featureFlagId -- enable/disable flag state

Let’s imagine we’ve got a directory where developers put new scripts. We need to scan it and figure out whether any scripts attempt to update the Value in our table:

using Microsoft.SqlServer.Management.SqlParser.Parser;
......	
	foreach (var file in Directory.EnumerateFiles(@"C:\ChangeScriptsFolder"))
	{
		var result = Parser.Parse(File.ReadAllText(file)); // parse SQL. That's all code we need to get an Abstract Syntax Tree
		var visitor = new FeatureFlagStateVisitor(file); // prep our visitor
		result.Script.Accept(visitor); // run
		if (!visitor.Result.IsSuccess)
		{
			string errorList = visitor.Result.Errors.Aggregate(new StringBuilder(), (sb, error) => sb.AppendLine(error)).ToString();
			throw new Exception($"FeatureFlag state must be disabled on build. Sql text analysis indicates following issues: {errorList}");
		}
	}
.......

And the visitor itself can look something like this:

    public class SqlAnalysisResult
    {
        public bool IsSuccess { get; set; }
        public IList<string> Errors { get; set; }
    }
    public class FeatureFlagStateVisitorContext
    {
        public bool IsFtUpdating { get; set; }
        public bool IsFtInserting { get; set; }
        public string StatementLocation { get; set; }
        public int InsertColumnRef { get; set; }
    }
    public class FeatureFlagStateVisitor: SqlCodeObjectRecursiveVisitor
    {
        public SqlAnalysisResult Result { get; set; }
        private readonly FeatureFlagStateVisitorContext _context;
        private readonly string _fileName;

        public FeatureFlagStateVisitor(string fileName)
        {
            Result = new SqlAnalysisResult
            {
                Errors = new List<string>(),
                IsSuccess = true
            };
            _context = new FeatureFlagStateVisitorContext();
            _fileName = fileName;
        }

        public override void Visit(SqlInsertSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.InsertColumnRef = codeObject.TargetColumns.IndexOf(codeObject.TargetColumns.FirstOrDefault(c => c.ColumnName.Sql.Contains("Value")));
                if (_context.InsertColumnRef >= 0)
                {
                    _context.IsFtInserting = true;
                    _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
                }
            }
            base.Visit(codeObject);
            _context.IsFtInserting = false;
        }

        public override void Visit(SqlRowConstructorExpression codeObject)
        {
            if (_context.IsFtInserting && codeObject.Values[_context.InsertColumnRef].Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"INSERT {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }

        public override void Visit(SqlUpdateSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.IsFtUpdating = true;
                _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
            }
            base.Visit(codeObject);
            _context.IsFtUpdating = false;
        }

        public override void Visit(SqlColumnAssignment codeObject)
        {
            if (_context.IsFtUpdating && codeObject.Column.Sql.CaseInsensitiveContains("Value") && codeObject.Value.Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"UPDATE {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }
    }

The idea is pretty simple – we keep track where we are at the moment and refer back to this context when making final decision on SqlColumnAssignment/SqlRowConstructorExpression level.