Parsing OData queries

OData (Open Data Protocol) is an ISO approved standard that defines a set of best practices for building and consuming RESTful APIs. It allows us write business logic and not worry too much about request and response headers, status codes, HTTP methods, and other variables.

We won’t go into too much detail on how to write OData queries and how to use it – there’s plenty resources out there. We’ll rather have a look at a bit esoteric scenario where we consider defining our own parser and then walking the AST to get desired values.

Problem statement

Suppose we’ve got a filter string that we received from the client:

"?$filter =((Name eq 'John' or Name eq 'Peter') and (Department eq 'Professional Services'))"

And we’d like to apply custom validation to the filter. Ideally we’d like to get a structured list of properties and values so we can run our checks:

Filter 1:
    Key: Name
    Operator: eq
    Value: John
Operator: or

Filter 2:
    Key: Name
    Operator: eq
    Value: Peter

Operator: and

Filter 3:
    Key: Department
    Operator: eq
    Value: Professional Services

Some options are:

  • ODataUriParser – but it seems to have some issues with .net Core support just yet
  • Regular Expression – not very flexible
  • ODataQueryOptions – produces raw text but cannot broken down any further

What else?

One other way to approach this would be parsing. And there are plenty tools to do that (see flex or bison for example). In .net world, however, Irony might be a viable option: it’s available in .net standard 2.0 which we had no issues plugging into a .net core 3.1 console test project.

Grammar

To start off, we normally need to define a grammar. But luckily, Microsoft have been kind enough to supply us with EBNF reference so all we have to do is to adapt it to Irony. I ended up implementing a subset of the grammar above that seems to cater for example statement (and a bit above and beyond, feel free to cut it down).

using Irony.Parsing;

namespace irony_playground
{
    [Language("OData", "1.0", "OData Filter")]
    public class OData: Grammar
    {
        public OData()
        {
            // first we define some terms
            var identifier = new RegexBasedTerminal("identifier", "[a-zA-Z_][a-zA-Z_0-9]*");
            var string_literal = new StringLiteral("string_literal", "'");
            var integer_literal = new NumberLiteral("integer_literal", NumberOptions.IntOnly);
            var float_literal = new NumberLiteral("float_literal", NumberOptions.AllowSign|NumberOptions.AllowSign) 
                                        | new RegexBasedTerminal("float_literal", "(NaN)|-?(INF)");
            var boolean_literal = new RegexBasedTerminal("boolean_literal", "(true)|(false)");

            var filter_expression = new NonTerminal("filter_expression");
            var boolean_expression = new NonTerminal("boolean_expression");
            var collection_filter_expression = new NonTerminal("collection_filter_expression");
            var logical_expression = new NonTerminal("logical_expression");
            var comparison_expression = new NonTerminal("comparison_expression");
            var variable = new NonTerminal("variable");
            var field_path = new NonTerminal("field_path");
            var lambda_expression = new NonTerminal("lambda_expression");
            var comparison_operator = new NonTerminal("comparison_operator");
            var constant = new NonTerminal("constant");

            Root = filter_expression; // this is where our entry point will be. 

            // and from here on we expand on all terms and their relationships
            filter_expression.Rule = boolean_expression;

            boolean_expression.Rule = collection_filter_expression
                                      | logical_expression
                                      | comparison_expression
                                      | boolean_literal
                                      | "(" + boolean_expression + ")"
                                      | variable;
            variable.Rule = identifier | field_path;

            field_path.Rule = MakeStarRule(field_path, ToTerm("/"), identifier);

            collection_filter_expression.Rule =
                field_path + "/all(" + lambda_expression + ")"
                | field_path + "/any(" + lambda_expression + ")"
                | field_path + "/any()";

            lambda_expression.Rule = identifier + ":" + boolean_expression;

            logical_expression.Rule =
                boolean_expression + (ToTerm("and", "and") | ToTerm("or", "or")) + boolean_expression
                | ToTerm("not", "not") + boolean_expression;

            comparison_expression.Rule =
                variable + comparison_operator + constant |
                constant + comparison_operator + variable;

            constant.Rule =
                string_literal
                | integer_literal
                | float_literal
                | boolean_literal
                | ToTerm("null");

            comparison_operator.Rule = ToTerm("gt") | "lt" | "ge" | "le" | "eq" | "ne";

            RegisterBracePair("(", ")");
        }
    }
}

NB: Irony comes with Grammar Explorer tool that allows us to load grammar dlls and debug them with free text input.

enter image description here

after we’re happy with the grammar, we need to reference it from our project and parse the input string:

class Program
{
    static void Main(string[] args)
    {
        var g = new OData();
        var l = new LanguageData(g);
        var r = new Parser(l);
        var p = r.Parse("((Name eq 'John' or Name eq 'Grace Paul') and (Department eq 'Finance and Accounting'))"); // here's your tree
        // this is where you walk it and extract whatever data you desire 
    }
}

Then, all we’ve got to do is walk the resulting tree and apply any custom logic based on syntax node type. One example how to do that can be found in this StackOverflow answer.

Entity Framework Core 3.1 – dynamic WHERE clause

Every now and then we get tasked with building a backend for filtering arbitrary queries. Usually clients would like to have a method of sending over an array of fields, values, and comparisons operations in order to retrieve their data. For simplicity we’ll assume that all conditions are joining each other with an AND operator.

public class QueryableFilter {
    public string Name { get; set; }
    public string Value { get; set; }
    public QueryableFilterCompareEnum? Compare { get; set; }
}

With a twist

There’s however one slight complication to this problem – filters must apply to fields on dependent entities (possible multiple levels of nesting as well). This can become a problem not only because we’d have to traverse model hierarchy (we’ll touch on that later), but also because of ambiguity this requirement introduces. Sometimes we’re lucky to only have unique column names across the hierarchy. However more often than not this needs to be resolved one way or another. We can, for example, require filter fields to use dot notation so we know which entity each field relates to. For example, Name -eq "ACME Ltd" AND Name -eq "Cloud Solutions" becomes company.Name -eq "ACME Ltd" AND team.Name -eq "Cloud Solutions"

Building an expression

It is pretty common that clients already have some sort of data querying service with EF Core doing the actual database comms. And since EF relies on LINQ Expressions a lot – we can build required filters dynamically.

public static IQueryable<T> BuildExpression<T>(this IQueryable<T> source, DbContext context, string columnName, string value, QueryableFilterCompareEnum? compare = QueryableFilterCompareEnum.Equal)
{
	var param = Expression.Parameter(typeof(T));

	// Get the field/column from the Entity that matches the supplied columnName value
	// If the field/column does not exists on the Entity, throw an exception; There is nothing more that can be done
	MemberExpression dataField;
	
	var model = context.Model.FindEntityType(typeof(T)); // start with our own entity
	var props = model.GetPropertyAccessors(param); // get all available field names including navigations
	var reference = props.First(p => RelationalPropertyExtensions.GetColumnName(p.Item1) == columnName); // find the filtered column - you might need to handle cases where column does not exist

	dataField = reference.Item2 as MemberExpression; // we happen to already have correct property accessors in our Tuples	

	ConstantExpression constant = !string.IsNullOrWhiteSpace(value)
		? Expression.Constant(value.Trim(), typeof(string))
		: Expression.Constant(value, typeof(string));

	BinaryExpression binary = GetBinaryExpression(dataField, constant, compare);
	Expression<Func<T, bool>> lambda = (Expression<Func<T, bool>>)Expression.Lambda(binary, param);
	return source.Where(lambda);
}

Most of the code above is pretty standard for building property accessor lambdas, but GetPropertyAccessors is the key:

private static IEnumerable<Tuple<IProperty, Expression>> GetPropertyAccessors(this IEntityType model, Expression param)
{
	var result = new List<Tuple<IProperty, Expression>>();

	result.AddRange(model.GetProperties()
								.Where(p => !p.IsShadowProperty()) // this is your chance to ensure property is actually declared on the type before you attempt building Expression
								.Select(p => new Tuple<IProperty, Expression>(p, Expression.Property(param, p.Name)))); // Tuple is a bit clunky but hopefully conveys the idea

	foreach (var nav in model.GetNavigations().Where(p => p is Navigation))
	{
		var parentAccessor = Expression.Property(param, nav.Name); // define a starting point so following properties would hang off there
		result.AddRange(GetPropertyAccessors(nav.ForeignKey.PrincipalEntityType, parentAccessor)); //recursively call ourselves to travel up the navigation hierarchy
	}

	return result;
}

this is where we interrogate EF as-built data model, traverse navigation properties and recursively build a list of all properties we can ever filter on!

Testing it out

Talk is cheap, let’s run a complete example here:

public class Entity
{
	public int Id { get; set; }
}
class Company : Entity
{
	public string CompanyName { get; set; }
}

class Team : Entity
{
	public string TeamName { get; set; }
	public Company Company { get; set; }
}

class Employee : Entity
{
	public string EmployeeName { get; set; }
	public Team Team { get; set; }
}

class DynamicFilters<T> where T : Entity
{
	private readonly DbContext _context;

	public DynamicFilters(DbContext context)
	{
		_context = context;
	}

	public IEnumerable<T> Filter(IEnumerable<QueryableFilter> queryableFilters = null)
	{
		IQueryable<T> mainQuery = _context.Set<T>().AsQueryable().AsNoTracking();
		// Loop through the supplied queryable filters (if any) to construct a dynamic LINQ-to-SQL queryable
		foreach (var filter in queryableFilters ?? new List<QueryableFilter>())
		{
			mainQuery = mainQuery.BuildExpression(_context, filter.Name, filter.Value, filter.Compare);
		}

		mainQuery = mainQuery.OrderBy(x => x.Id);

		return mainQuery.ToList();
	}
}
// --- DbContext
class MyDbContext : DbContext
{
	public DbSet<Company> Companies { get; set; }
	public DbSet<Team> Teams { get; set; }
	public DbSet<Employee> Employees { get; set; }

	protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
	{
		optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true");
		base.OnConfiguring(optionsBuilder);
	}
}
// ---
static void Main(string[] args)
{
	var context = new MyDbContext();
	var someTableData = new DynamicFilters<Employee>(context).Filter(new
	List<QueryableFilter> { new QueryableFilter { Name = "CompanyName", Value = "ACME Ltd" }, new QueryableFilter { Name = "TeamName", Value = "Cloud Solutions" } });
}

The above block should produce following SQL:

SELECT [e].[Id], [e].[EmployeeName], [e].[TeamId]
FROM [Employees] AS [e]
LEFT JOIN [Teams] AS [t] ON [e].[TeamId] = [t].[Id]
LEFT JOIN [Companies] AS [c] ON [t].[CompanyId] = [c].[Id]
WHERE [c].[CompanyName] = N'ACME Ltd'
 AND [t].[TeamName] = N'Cloud Solutions'
ORDER BY [e].[Id]

Entity Framework Core 3.1 – Peeking Into Generated SQL

Writing LINQ that produces optimal SQL can be even harder as developers often don’t have visibility into the process. It becomes even more confusing when the application is designed to run against different databases.

We often find ourselves questioning whether this particular query will fall in line with our expectations. And until not so long ago our tool of choice was a SQL Profiler, that ships with SQL Server. It’s plenty powerful but has one flaw – it pretty much requires the SQL Server installation. This might be a deal breaker for some clients using other DBs, like Postgres or MySQL (which are all supported by the way).

EF to the resque

Instead of firing off the profiler and fishing out the batches, we could have Entity Framework itself pass us the result. After all, it needs to build SQL before sending it off the the database, so all we have to do it to ask nicely. Stack Overflow is quite helpful here:

public static class IQueryableExtensions // this is the EF Core 3.1 version.
    {
        public static string ToSql<TEntity>(this IQueryable<TEntity> query) where TEntity : class
        {
            var enumerator = query.Provider.Execute<IEnumerable<TEntity>>(query.Expression).GetEnumerator();
            var relationalCommandCache = enumerator.Private("_relationalCommandCache");
            var selectExpression = relationalCommandCache.Private<SelectExpression>("_selectExpression");
            var factory = relationalCommandCache.Private<IQuerySqlGeneratorFactory>("_querySqlGeneratorFactory");

            var sqlGenerator = factory.Create();
            var command = sqlGenerator.GetCommand(selectExpression);

            string sql = command.CommandText;
            return sql;
        }

        private static object Private(this object obj, string privateField) => obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
        private static T Private<T>(this object obj, string privateField) => (T)obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
    }

The usage is simple

Suppose we’ve got the following inputs: One simple table, that we’d like to group by one field and total by another. Database Context is also pretty much boilerplate. One thing to note here is a couple of database providers we are going to try the query against.

public class SomeTable
{
    public int Id { get; set; }
    public int Foobar { get; set; }
    public int Quantity { get; set; }
}

class MyDbContext : DbContext
{
    public DbSet<SomeTable> SomeTables { get; set; }
    public static readonly LoggerFactory DbCommandConsoleLoggerFactory
        = new LoggerFactory(new[] {
        new ConsoleLoggerProvider ((category, level) =>
            category == DbLoggerCategory.Database.Command.Name &&
            level == LogLevel.Trace, true)
        });
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //run with SQL Server provider to get T-SQL
        optionsBuilder.UseNpgsql("Server=localhost;Port=5432;Database=test;User Id=;Password=;")
        //alternatively use other supported provider
        //optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true")
        ;
        base.OnConfiguring(optionsBuilder);
    }
}

The test bench would look something like so

class Program
{
    static void Main(string[] args)
    {

        var context = new MyDbContext();
        var someTableData = context.SomeTables
                .GroupBy(x => x.Foobar)
                .Select(x => new { Foobar = x.Key, Quantity = x.Sum(y => y.Quantity) })
                .OrderByDescending(x => x.Quantity)
                .Take(10) // we've built our query as per normal
                .ToSql(); // this is the magic
        Console.Write(someTableData);
        Console.ReadKey();
    }
}

And depending on our choice of provider the output would show ef core generated sql for SQL Server and Postgres

        -- MSSQL
        SELECT TOP(@__p_0) [s].[Foobar], SUM([s].[Quantity]) AS [Quantity]
        FROM [SomeTables] AS [s]
        GROUP BY [s].[Foobar]
        ORDER BY SUM([s].[Quantity]) DESC

        -- PG SQL
         SELECT s."Foobar", SUM(s."Quantity")::INT AS "Quantity"
        FROM "SomeTables" AS s
        GROUP BY s."Foobar"
        ORDER BY SUM(s."Quantity")::INT DESC
        LIMIT @__p_0

Running https with Docker

It’s not a secret we love Docker. And with recent changes to how Chrome treats SameSite cookies it’s become a bit of a pain to develop any sort of oAuth solutions with containers: these have to go over SSL so the browser takes it.

Tools like dotnet dev-certs do provide some relief by generating self-signed certs and adding those to trusted store on host machine. In short – most of the time, host-to-container development is not an issue.

What if we need more than one domain?

Sometimes there will be cases where we’d like to access the same service by two domain names. It might be useful if Host header is required:

we can opt for what’s known a SAN certificate. It’s an extension to x.509 that allows us to reuse the same cert for multiple domain names. We can then trust certificate on our dev machine and make Docker use the same cert for HTTPS:

#create a SAN cert for both server.docker.local and localhost
$cert = New-SelfSignedCertificate -DnsName "server.docker.local", "localhost" -CertStoreLocation cert:\localmachine\my

#export it for docker container to pick up later
$password = ConvertTo-SecureString -String "123123" -Force -AsPlainText
Export-PfxCertificate -Cert $cert -FilePath C:\https\aspnetapp.pfx -Password $password

# trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store "TrustedPublisher","LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

More containers?

Sometimes we want one container to talk to another while retaining the ability to check up on things from localhost. Consider the following docker-compose:

version: '3'
services:
  client: # client process that needs to talk to server
    depends_on:
      - server
  server: # server that we'd also like to access from the outside
    image:     
    ports:
      - "8443:443"

This would roughtly translate to the following network layout:

Problems start

When one container needs to talk to another container it’s a slightly different story: dev tools don’t have control over containers and cannot magically trust certificates inside there. We can try opt for properly signed certificates (from Let’s Encrypt for example), but that’s a whole different story and is likely not worth it for development machines.

The above powershell script is also going to fall short as it’s only adding the cert onto developmant machine – containers will keep failing to validate the cert. The solution would build on the same principles: generate a self-signed cert and trust it everywhere. Since most Docker containers run Linux we might have better luck going the opposite direction and generating certs in PEM format using a well known tool OpenSSL. Then we’d use Dockerfiles to inject this cert into all our containers and finally we’d convert it into format suitable for our host Windows machine (PKCS#12).

$certPass = "password_here"
$certSubj = "host.docker.internal"
$certAltNames = "DNS:localhost,DNS:host.docker.internal,DNS:identity_server" # we can also add individual IP addresses here like so: IP:127.0.0.1
$opensslPath="path\to\openssl\binaries" #aOpenSSL needs to be present on the host, no installation is necessary though
$workDir="path\to\your\project"
$dockerDir=Join-Path $workDir "ProjectApi"

#generate a self-signed cert with multiple domains
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ",
                                          (Join-Path $workDir aspnetapp.key),
                                          "-out", (Join-Path $dockerDir aspnetapp.crt),
                                          "-subj `"/CN=$certSubj`" -addext `"subjectAltName=$certAltNames`""

# this time round we convert PEM format into PKCS#12 (aka PFX) so .net core app picks it up
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "pkcs12 -export -in ", 
                                           (Join-Path $dockerDir aspnetapp.crt),
                                           "-inkey ", (Join-Path $workDir aspnetapp.key),
                                           "-out ", (Join-Path $workDir aspnetapp.pfx),
                                           "-passout pass:$certPass"

$password = ConvertTo-SecureString -String $certPass -Force -AsPlainText
$cert = Get-PfxCertificate -FilePath (Join-Path $workDir "aspnetapp.pfx") -Password $password

# and still, trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store [System.Security.Cryptography.X509Certificates.StoreName]::Root,"LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

Example: Running Identity Server

Now we have our certs (for example, located in %USERPROFILE%.aspnet\https). Here’s a quick howto tell asp.net core -base containers to pick them up:

docker pull your_docker_image
docker run --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=\https\aspnetapp.pfx -v %USERPROFILE%\.aspnet\https:C:\https\ your_docker_image

docker run <your image> --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx

Or in docker-compose format:

version: '3'
services:
  identity_server:
    image: mcr.microsoft.com/dotnet/core/samples:aspnetapp    
    environment: 
      - ASPNETCORE_URLS=https://+:443;http://+:80
      - ASPNETCORE_Kestrel__Certificates__Default__Password=password_here
      - ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx
    volumes:
      - ~/.aspnet/https/:/https/:ro 
    container_name: identity_server
    ports:
      - "8443:443"
      - "8080:80"

Data visualisation with Vega

We love nice dashboards. And if you see a chart somewhere on a webpage – chances are it runs D3.js. D3.js is a JavaScript library for manipulating documents based on data. It allows you to do a great deal of visualisation but comes with a bit of a learning curve. Even though the data can come in any shape and form, plotting and transformations are JavaScript.

Declarative approach

This is where Vega comes forward. Everything is now a JSON therefore we can literally build and ship visualisations without touching JavaScript at all!

Step by step

Suppose we’ve got hierarchical animal data represented by following JSON:

"values": [
        {"id": "1", "parent": null, "title": "Animal"},
        {"id": "2", "parent": "1", "title": "Duck"},
        {"id": "3", "parent": "1", "title": "Fish"},
        {"id": "4", "parent": "1", "title": "Zebra"}
      ]

What we can then do is to lay the nodes out in a tree-like shape (stratify does the job):

"transform": [
        {
          "type": "stratify",
          "key": "id",
          "parentKey": "parent"
        },
        {
          "type": "tree",
          "method": "tidy",
          "separation": true,
          "size": [{"signal": "width"}, {"signal": "height"}]
        }
      ]

having laid out the nodes, we need to generate connecting lines, treelinks + linkpath combo does exactly that:

{
      "name": "links",
      "source": "tree", // take datasource "tree" as input
      "transform": [
        { "type": "treelinks" }, // apply transform 1
        { "type": "linkpath", // follow up with next transform
          "shape": "diagonal"
          }
      ]
    }

now that we’ve got our data sources, we want to draw actual objects. In Vega these are called marks. For simplicity I’m only drawing one rectangle with a title for each data point and some basic lines to connect:

"marks": [
    {
      "type": "path",
      "from": {"data": "links"}, // dataset we defined above
      "encode": {
        "enter": {
          "path": {"field": "path"} // linkpath generated a dataset with "path" field in it - we just grab it here
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "text",
      "from": {"data": "tree"}, // use data set we defined earlier
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "text": {"field": "title"}, // we can use data fields to display actual values
          "x": {"field": "x"}, // use data fields to draw values from
          "y": {"field": "y"},
          "dx": {"value":50}, // offset the mark to appear in rectangle center
          "dy": {"value":13},
          "align": {"value": "center"}
        }
      }
    }
  ]

All in all we arrived at a very basic hierarchical chart. It looks kinda plain and can definitely be improved: the rectangles there should probably be replaced with groups and connection paths will need some work too.

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 800,
  "height": 300,
  "padding": 5,

  "data": [
    {
      "name": "tree",
      "values": [
        {"id": "1", "parent": null, "title": "Animal"},
        {"id": "2", "parent": "1", "title": "Duck"},
        {"id": "3", "parent": "1", "title": "Fish"},
        {"id": "4", "parent": "1", "title": "Zebra"}
      ],
      "transform": [
        {
          "type": "stratify",
          "key": "id",
          "parentKey": "parent"
        },
        {
          "type": "tree",
          "method": "tidy",
          "separation": true,
          "size": [{"signal": "width"}, {"signal": "height"}]
        }
      ]      
    },
    {
      "name": "links",
      "source": "tree",
      "transform": [
        { "type": "treelinks" },
        { "type": "linkpath",
          "shape": "diagonal"
          }
      ]
    }, 
    {
      "name": "tree-boxes",
      "source": "tree",
      "transform": [
          { 
            "type": "filter",
            "expr": "datum.parent == null"
          }
        ]
    },
    {
      "name": "tree-circles",
      "source": "tree",
      "transform": [
        {
          "type": "filter",
          "expr": "datum.parent != null"
        }
      ]
    }
  ],
  "marks": [
    {
      "type": "path",
      "from": {"data": "links"},
      "encode": {
        "enter": {
          "path": {"field": "path"}
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree-boxes"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "symbol",
      "from": {"data": "tree-circles"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "text",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "text": {"field": "title"},
          "x": {"field": "x"},
          "y": {"field": "y"},
          "dx": {"value":50},
          "dy": {"value":13},
          "align": {"value": "center"}
        }
      }
    }
  ]
}

Getting a bit fancier

Suppose, we would like to render different shapes for root and leaf nodes of our chart. One way to achieve this will be to add two filter transformations based on your tree dataset and filter them accordingly:

    {
      "name": "tree-boxes",
      "source": "tree", // grab the existing data
      "transform": [
          { 
            "type": "filter",
            "expr": "datum.parent == null" // run it through a filter defined by expression
          }
        ]
    },
    {
      "name": "tree-circles",
      "source": "tree",
      "transform": [
        {
          "type": "filter",
          "expr": "datum.parent != null"
        }
      ]
    }

then instead of rendering all marks as rect we’d want two different shapes for respective transformed datasets:

{
      "type": "rect",
      "from": {"data": "tree-boxes"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "symbol",
      "from": {"data": "tree-circles"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    }

Demo time

Play with Vega in live editor here.

T-SQL syntax analysis

It is nice to be able to analyze the code for either code review automation or rule enforcement. While Roslyn is an extremely useful platform for C# code analysis, more often than not, .net applications come backed by SQL Server. And this means one thing…

We need Roslyn for SQL!

We probably are not getting full blown SQL dependency tracking system just yet. But apparently Microsoft does offer a SqlParser library that is very capable of handling most of static code analysis on SQL. Let us run through a reasonably common scenario. Developers rely on feature flags for all new functionality. Product Owner explicitly enables features are they become ready. States are kept in the database and managed via simple scripts. Since we release code multiple times a day, unfinished features can ship to production as long as respective flags are safely turned off. Now our task is to ensure that change scripts only create feature flag definitions but never turn them on by default.

A bit of theory

Parsing language (usually) happens in stages:

  1. Lexical analysis (convert character stream to basic building blocks – tokens)
  2. Syntactic analysis (group tokens into constructs according to selected grammar – parse tree)
  3. Semantic analysis (convert parse tree into abstract syntax tree)

Once we’ve got the Abstract Syntax Tree (AST), we can interact with it and perform certain operations depending on node type and context. Probably the easiest way to traverse such tree would be to implement a Visitor. Both Roslyn and SQL Parser offer stubs to aid implementing this pattern.

Sample scripts

This particular example revolves around two workflows:

-- case 1 - definе new feature flag
INSERT INTO dbo.FeatureFlag ( [Key], Value ) VALUES ( @featureFlagId, N'true')
-- case 2 - update existing flag
UPDATE dbo.FeatureFlag SET Value = 'true' WHERE [Key] = @featureFlagId -- enable/disable flag state

Let’s imagine we’ve got a directory where developers put new scripts. We need to scan it and figure out whether any scripts attempt to update the Value in our table:

using Microsoft.SqlServer.Management.SqlParser.Parser;
......	
	foreach (var file in Directory.EnumerateFiles(@"C:\ChangeScriptsFolder"))
	{
		var result = Parser.Parse(File.ReadAllText(file)); // parse SQL. That's all code we need to get an Abstract Syntax Tree
		var visitor = new FeatureFlagStateVisitor(file); // prep our visitor
		result.Script.Accept(visitor); // run
		if (!visitor.Result.IsSuccess)
		{
			string errorList = visitor.Result.Errors.Aggregate(new StringBuilder(), (sb, error) => sb.AppendLine(error)).ToString();
			throw new Exception($"FeatureFlag state must be disabled on build. Sql text analysis indicates following issues: {errorList}");
		}
	}
.......

And the visitor itself can look something like this:

    public class SqlAnalysisResult
    {
        public bool IsSuccess { get; set; }
        public IList<string> Errors { get; set; }
    }
    public class FeatureFlagStateVisitorContext
    {
        public bool IsFtUpdating { get; set; }
        public bool IsFtInserting { get; set; }
        public string StatementLocation { get; set; }
        public int InsertColumnRef { get; set; }
    }
    public class FeatureFlagStateVisitor: SqlCodeObjectRecursiveVisitor
    {
        public SqlAnalysisResult Result { get; set; }
        private readonly FeatureFlagStateVisitorContext _context;
        private readonly string _fileName;

        public FeatureFlagStateVisitor(string fileName)
        {
            Result = new SqlAnalysisResult
            {
                Errors = new List<string>(),
                IsSuccess = true
            };
            _context = new FeatureFlagStateVisitorContext();
            _fileName = fileName;
        }

        public override void Visit(SqlInsertSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.InsertColumnRef = codeObject.TargetColumns.IndexOf(codeObject.TargetColumns.FirstOrDefault(c => c.ColumnName.Sql.Contains("Value")));
                if (_context.InsertColumnRef >= 0)
                {
                    _context.IsFtInserting = true;
                    _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
                }
            }
            base.Visit(codeObject);
            _context.IsFtInserting = false;
        }

        public override void Visit(SqlRowConstructorExpression codeObject)
        {
            if (_context.IsFtInserting && codeObject.Values[_context.InsertColumnRef].Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"INSERT {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }

        public override void Visit(SqlUpdateSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.IsFtUpdating = true;
                _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
            }
            base.Visit(codeObject);
            _context.IsFtUpdating = false;
        }

        public override void Visit(SqlColumnAssignment codeObject)
        {
            if (_context.IsFtUpdating && codeObject.Column.Sql.CaseInsensitiveContains("Value") && codeObject.Value.Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"UPDATE {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }
    }

The idea is pretty simple – we keep track where we are at the moment and refer back to this context when making final decision on SqlColumnAssignment/SqlRowConstructorExpression level.

Walking code with Roslyn

It was going to happen sooner or later – our research on C# dynamic features eventually ended up with an attepmt to parse bits of source code. There are quite a few solutions on the market, with NRefactory being our preferred tool over the years. There are however a few limitations: it does not support .net core and C# 6.

It is a big deal

It might seem, that support for newer language spec is not critical. But in fact, it gets problematic very quickly even in more established projects. Luckily for us, Microsoft has chosen to open source Roslyn – the very engine that powers their compiler services. Their official documentation covers the platform pretty well, and goes in great detail of writing Visual Studio code analysers. We however often have to deal with writing MSBuild tasks that load the whole solution and run analysis on class hierarchies (for example, to detect whether a single select statement is being called inside a foreach loop – we would fail the build and suggest to replace it with bulk select)

Installing

Roslyn is available via NuGet as a number of Microsoft.CodeAnalysis.* packages. We normally include these four:

Install-Package Microsoft.CodeAnalysis.Workspaces.MSBuild
Install-Package Microsoft.CodeAnalysis
Install-Package Microsoft.CodeAnalysis.CSharp
Install-Package Microsoft.Build.Locator # this is a helper to locate correct MSBuild toolchain (in case the machine has more than one installed)

Sometimes the environment gets confused as to what version MSBuild to use, and this is why starting a project with something like this is pretty much a must since VS2015:

if (!MSBuildLocator.IsRegistered) MSBuildLocator.RegisterDefaults(); // ensures correct version is loaded up
var _ = typeof(Microsoft.CodeAnalysis.CSharp.Formatting.CSharpFormattingOptions); // this ensures library is referenced so the compiler would not try to optimise it away (if dynamically loading assemblies or doing other voodoo that can throw the compiler off) - probably less important than the above but we prefer to follow cargo cult here and leave it be for 

After initial steps, simplistic solution traversal would look something along these lines:

async Task AnalyseSolution()
{
	using (var w = MSBuildWorkspace.Create())
	{
		var solution = await w.OpenSolutionAsync(@"MySolution.sln");		
		foreach (var project in solution.Projects)
		{			
			var docs = project.Documents; // allows for file-level document filtering
			var compilation = await project.GetCompilationAsync(); // allows for assembly-level analysis as well as SemanticModel 
			foreach (var doc in docs)
			{
				var walker = new CSharpSyntaxWalker(); // CSharpSyntaxWalker is an abstract class - we will need to define our own implementation for this to actually work
				walker.Visit(await doc.GetSyntaxRootAsync()); // traverse the syntax tree
			}
		}
	}
}

Syntax Tree Visitor

As with pretty much every single mainstream syntax analyser, the easiest way to traverse syntax trees is by using a Visitor Pattern. It allows to decouple tree nodes and processing logic. Which will allow room for expansion on either sides (easy to add new logic, easy to add new tree node types). Roslyn has stub CSharpSyntaxWalker that allows us to only override required nodes for processing. Everything else is then taken care of.

With basics out of the way, let’s look into classes that make up our platform here. Top of the hierarchy is MSBuild Workspace followed by Solution, Project and Document. Roslyn makes a distinction between parsing code and compiling it. Meaning some analytics will only be available in Compilation class that is available for project as well as for individual documents down the track.

Traversing the tree

Just loading the solution is kinda pointless though. We’d need to come up with processing logic – and the best place to do it would be a CSharpSyntaxWalker subclass. Suppose, we’d like to determine whether class constructor contains if statements that are driven by parameters. This might mean we’ve got overly complicated classes and could benefit from refactoring these out:

public class ConstructorSyntaxWalker : CSharpSyntaxWalker
{
	public List<ISymbol> Parameters { get; set; }
	public int IfConditions { get; set; }

	SemanticModel sm;

	public ConstructorSyntaxWalker(SemanticModel sm)
	{
		this.sm = sm;
		Parameters = new List<ISymbol>();
	}

	public override void VisitIfStatement(IfStatementSyntax node)
	{
		Parameters.AddRange(sm.AnalyzeDataFlow(node).DataFlowsIn); // .AnalyzeDataFlow() is one of the most commonly used parts of the platform: it requires a compilation to work off and allows tracking dependencies. We could then check if these parameters are supplied to constructor and make a call whether this is allowed 
		IfConditions++; // just count for now, nothing fancy
		base.VisitIfStatement(node);
	}
}

Monitoring Mikrotik with InfluxDb

With unlimited fibre plans it is less important to keep track on who is the biggest content consumer in the office, but it still is interesting to find out.

Mikrotik

We’ve got a Mikrotik box serving as our gateway. It might be a total overkill, but it is rock solid and offers a ton of feaures with easy to follow configuration UI (they call it WinBox). One feature we’ll be particularly interested in is ip accounting. Despite intimidating wiki page, the set up is actually pretty simple:

  1. tick a couple checkboxes
  2. enable web access
  3. and optionally supply ip of host that will be allowed to collect data from the router:

With data source out of the way it’s time to look the receiving side.

InfluxDB

The endgame goal was to feed the data into nice Grafana dashboards (we touched on it here). Luckily we found this dashboard that promised to be pretty much a turnkey solution. Unfortunately the exporter didn’t want to play nicely. The issue was with the exporter assuming that local network would be 192.168.0.0/16 (as well as hardcoded influxdb address). This appears to be correct, but we still couldn’t get it to parse our network of 192.168.1.0/24 (lack of Go knowledge might have been a contributing factor here too). What can we do then?

Build one!

Ladies and gentlemen, I give you mikrotik-acct! This collector is written with .net core 3.1 so should run on Docker with no real issues. It offers more knobs to play with through config file (which is hopefully all self-explanatory). One thing of interest here is actually a hepler method that tells us whether address is in our subnet or not: it appears, .net core doesn’t offer this functrionality out of the box, so we had to resort to the mighty Stack Overflow:

public static bool IsInSubnet(this IPAddress address, string subnetMask)
        {
            var slashIdx = subnetMask.IndexOf("/");
            if (slashIdx == -1)
            {
                // We only handle netmasks in format "IP/PrefixLength".
                throw new NotSupportedException("Only SubNetMasks with a given prefix length are supported.");
            }

            // First parse the address of the netmask before the prefix length.
            var maskAddress = IPAddress.Parse(subnetMask.Substring(0, slashIdx));

            if (maskAddress.AddressFamily != address.AddressFamily)
            {
                // We got something like an IPV4-Address for an IPv6-Mask. This is not valid.
                return false;
            }

            // Now find out how long the prefix is.
            int maskLength = int.Parse(subnetMask.Substring(slashIdx + 1));

            if (maskAddress.AddressFamily == AddressFamily.InterNetwork)
            {
                // Convert the mask address to an unsigned integer.
                var maskAddressBits = BitConverter.ToUInt32(maskAddress.GetAddressBytes().Reverse().ToArray(), 0);

                // And convert the IpAddress to an unsigned integer.
                var ipAddressBits = BitConverter.ToUInt32(address.GetAddressBytes().Reverse().ToArray(), 0);

                // Get the mask/network address as unsigned integer.
                uint mask = uint.MaxValue << (32 - maskLength);

                // https://stackoverflow.com/a/1499284/3085985
                // Bitwise AND mask and MaskAddress, this should be the same as mask and IpAddress
                // as the end of the mask is 0000 which leads to both addresses to end with 0000
                // and to start with the prefix.
                return (maskAddressBits & mask) == (ipAddressBits & mask);
            }

            if (maskAddress.AddressFamily == AddressFamily.InterNetworkV6)
            {
                // Convert the mask address to a BitArray.
                var maskAddressBits = new BitArray(maskAddress.GetAddressBytes());

                // And convert the IpAddress to a BitArray.
                var ipAddressBits = new BitArray(address.GetAddressBytes());

                if (maskAddressBits.Length != ipAddressBits.Length)
                {
                    throw new ArgumentException("Length of IP Address and Subnet Mask do not match.");
                }

                // Compare the prefix bits.
                for (int i = 0; i < maskLength; i++)
                {
                    if (ipAddressBits[i] != maskAddressBits[i])
                    {
                        return false;
                    }
                }

                return true;
            }

            throw new NotSupportedException("Only InterNetworkV6 or InterNetwork address families are supported.");
        }

Grafana

After having gone through all the trouble we’ve finally got ourselves nice dashboards:

SQL Server computed columns: not just mathematics

Computed columns in SQL server are very handy when we’ve got a formula that we can rely on and some CPU cycles to spare. There’s heaps of articles online about what these are, so we’d not repeat it here. Let us rather look at one peculiar case…

What if I’ve got two bit columns?

A very plausible scenario indeed. Let’s create a table like so:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (Check1 = 1 OR Check2 = 1) -- Incorrect syntax near '='.
)

that didn’t seem to work and the error message is a bit unhelpful: we know it’s wrong but why?

Reading the documentation once again

computed_column_expression
Is an expression that  defines the value  of a computed column. A computed column is a virtual column that is not physically stored in the table but is computed from an expression that uses other columns in the same table. For example, a computed column could have the definition: cost AS price * qty. The  expression can be a noncomputed column name, constant, function, variable , and any combination of these connected by one or more operators. The expression cannot be a subquery or include an alias data type.

Okay, this kinda hints at the fact we need to return a value. So a simple search condition does not cut it. Let us try one more time:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (CASE WHEN Check1 = 1 OR Check2 = 1 THEN 1 ELSE 0 END) -- this time is better! we can roll with it
)

Take it up another notch

The above example works but looks a bit too verbose and explicit. Is there a better way? If we deal with bit columns, we might be in luck and opt for t-sql bitwise operators:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (Check1|Check2) 
)

Splitting strings with MySQL

Suppose we’ve got a string that we’d like to parse it into a table and run DISTINCT over.

A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613;

MySQL would not automatically convert a string into a table and there’s no library function to do the job for us. But we can leverage a classic SUBSTRING_INDEX(SUBSTRING_INDEX(newString, ‘;’, N), ‘;’, -1) expression to extract Nth item from the string: 

NSUBSTRING_INDEX(SUBSTRING_INDEX(newString, ‘;’, N), ‘;’, -1)
1A613
2A613
3A095

Generating numeric sequence

For the SUBSTRING_INDEX trick to work we need to generate ourselves a long enough sequence of integers so we can cross join it onto our source string. Check out the folowing sequence generator:

select (th*1000+h*100+t*10+u+1) x from
(select 0 th union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) A,
(select 0 h union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) B,
(select 0 t union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) C,
(select 0 u union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) D
order by x;

yes, this particular example is limited to 10k items but it can be expanded upon. It will come down to knowing your data and ensuring you’ve got enough of these to feed to the splitter. Or use alternatives.

Stopping enumeration

After we’ve got the sequence going we need to find a number of meaningful elements in the source string. We basically need to count the delimiters in our source string. Problem is, MySQL doesn’t have a built in function for that. Mathematics to the rescue though:

SELECT  (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") ) 
        ) / CHAR_LENGTH(";")        
   AS count    
FROM "source string"

Putting it together

SELECT distinct SUBSTRING_INDEX(SUBSTRING_INDEX(val, '; ', x), '; ', -1) FROM
(
  select (t*10+u+1) x, val from
  (select 0 t union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) A,
  (select 0 u union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) B
	CROSS JOIN (SELECT 'A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613' as val
               ) as seq
	ORDER BY x
) as raw_data
WHERE x <= (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") ) 
        ) / CHAR_LENGTH(";")