EF Core 6 – Handling JSON data in SQL Server

We have seen a few projects where the client insists on using SQL Server but the workload generates a lot of semi-structured JSON data that we need to store and query efficiently. One realistic case study where this situation pops up is building a browser extension that helps capture and organise wish lists from various platforms such as Amazon or eBay.

Let’s build a wish list

Our journey would likely start with a web browser. We may have a browser extension of some sort that would send information about an item we’re currently watching to an API. That API will have to make sense of all the different attributes each website provides and try to store them in a way that makes querying as easy and efficient as possible.

The catch is, we don’t know the full list of attributes. And, probably, we should not care – a perfect data model for frequently changing e-commerce websites would be hard to deduce and likely very painful to maintain. We could just lump it all together and chuck it into a schema-less data store like Cosmos DB or Mongo, but for the sake of argument we’ll imagine the client was adamant they needed SQL Server because it’s a business decision they made a long time ago.

Not a huge drama

SQL Server has functions to work with JSON data – all we have to do is put it into a sufficiently big column, such as NVARCHAR(MAX). We can save heaps of time by creating one simple data model and calling it a day.
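If it helps to visualise, the backing table can be as simple as the sketch below (column names mirror the EF model we define shortly; the ISJSON check constraint is optional but cheap insurance against garbage):

CREATE TABLE WishlistEntries (
    Id         INT IDENTITY PRIMARY KEY,
    CustomerId INT NOT NULL,
    ItemSource NVARCHAR(300) NULL,
    -- NVARCHAR(MAX) is the recommended storage for JSON documents on SQL Server
    ItemData   NVARCHAR(MAX) NULL CHECK (ItemData IS NULL OR ISJSON(ItemData) = 1)
);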

Moving on to coding, let us focus on data retrieval and skip the API part for now – it’s a whole different can of worms and would distract us from the point. From here on we’ll be exploring ways to query the data.

Test Bench

We’ve created a small EF Core project where the DbContext looks something like this:

class SHDbContext : DbContext
{
	public DbSet<WishlistEntry> WishlistEntries { get; set; }

	protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
	{
		base.OnConfiguring(optionsBuilder);
		optionsBuilder.UseSqlServer(Environment.GetEnvironmentVariable("DB_CONNSTR")).LogTo(Console.WriteLine);
	}
}

where WishlistEntry is defined like so:

class WishlistEntry
{
	public int Id { get; set; }
	public int CustomerId { get; set; }
	[MaxLength(300)]
	public string ItemSource { get; set; }
	public string ItemData { get; set; }
}

class ItemData
{
	public string Sku { get; set; }
	public string Name { get; set; }
	public List<Category> Category { get; set; }
}

class Category
{
	public string Id { get; set; }
	public string Name { get; set; }
}

With the setup out of the way, let us define a sample entry and write a query to pick all entries that contain the category “Connected Home & Housewares”:

{
  "sku": 185267,
  "name": "Duracell - D Batteries (4-Pack)",
  "type": "HardGood",
  "price": 9.99,
  "upc": "041333430010",
  "category": [
    {
      "id": "pcmcat312300050015",
      "name": "Connected Home & Housewares"
    },
    {
      "id": "pcmcat248700050021",
      "name": "Housewares"
    },
    {
      "id": "pcmcat303600050001",
      "name": "Household Batteries"
    },
    {
      "id": "abcat0208002",
      "name": "Alkaline Batteries"
    }
  ],
  "shipping": 5.99,
  "description": "Compatible with select electronic devices; D size; DURALOCK Power Preserve technology; 4-pack",
  "manufacturer": "Duracell",
  "model": "MN1300R4Z",
  "url": "http://www.bestbuy.com/site/duracell-d-batteries-4-pack/185267.p?id=1051384046551&skuId=185267&cmp=RMXCC",
  "image": "http://img.bbystatic.com/BestBuy_US/images/products/1852/185267_sa.jpg"
}

Option one – JsonConvert.DeserializeObject

The naive approach to the desired outcome would be to simply try to deserialise the field and pray Entity Framework works its magic:

var context = new SHDbContext();
//var sql = context.WishlistEntries.Select(wle => JsonConvert.DeserializeObject<ItemData>(wle.ItemData)
//                                                .Category.Any(c => c.Name == "Connected Home & Housewares")); //The LINQ expression 'c => c.Name == "Connected Home & Housewares"' could not be translated. 

Unfortunately, this will not even run – EF has no idea what to do with JsonConvert and throws the translation error above as soon as the query is enumerated. We’ll end up having to materialise the query and handle filtering on the client side:

var data = context.WishlistEntries.Select(wle => JsonConvert.DeserializeObject<ItemData>(wle.ItemData))
                                .AsEnumerable()
                                .Where(d => d.Category.Any(c => c.Name == "Connected Home & Housewares"))
                                .ToList();

This will work as expected, and some developers would probably stop here, but this is in fact a time bomb. When our wish lists get big enough, these queries will put a serious strain on SQL Server and the network by fetching the whole table in its entirety. Not nice.

Option two – write a Value Converter

Technically this is a variation of the previous option, but it leverages the EF Core Value Conversion capability for a nicer-looking query. To enable it, we change the ItemData property type from string to ItemData, override OnModelCreating and add a few more lines to the DbContext:

class SHDbContext : DbContext
{
	public DbSet<WishlistEntry> WishlistEntries { get; set; }

	protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
	{
		base.OnConfiguring(optionsBuilder);
		optionsBuilder.UseSqlServer(Environment.GetEnvironmentVariable("DB_CONNSTR")).LogTo(Console.WriteLine);
	}

	protected override void OnModelCreating(ModelBuilder modelBuilder)
	{
		base.OnModelCreating(modelBuilder);
		modelBuilder.Entity<WishlistEntry>()
		            .Property(e => e.ItemData)
		            .HasConversion((itemData) => JsonConvert.SerializeObject(itemData), str => JsonConvert.DeserializeObject<ItemData>(str));
	}
}
// ...
// finally, use the new converter like so:
var data = context.WishlistEntries.AsEnumerable().Where(d => d.ItemData.Category.Any(c => c.Name == "Connected Home & Housewares")).ToList();
// also not ideal as we still have to filter on the client side, effectively scanning the whole table

This simply shifts the JSON conversion code up the food chain so developers won’t see it as often. Unfortunately, this “out of sight – out of mind” approach won’t improve our query plan.

Option three – JSON_VALUE function

It is pretty clear at this stage that we need to be able to filter by JSON fields on the SQL Server side. Luckily, Microsoft provides a few JSON-handling capabilities, one of them being the JSON_VALUE function, which extracts a single value out of the blob via a given path selector. This eliminates the need to materialise the query on the client and should hugely improve performance. Entity Framework does not support this function by default, but it is very easy to add:

// in the DbContext class: a static stub that EF will translate to the built-in JSON_VALUE function
[DbFunction("JSON_VALUE", IsBuiltIn = true, IsNullable = false)]
public static string JsonValue(string expression, string path) => throw new NotImplementedException();

Our query then becomes something like this:

var data = context.WishlistEntries.Where(d => SHDbContext.JsonValue(d.ItemData, "$.category[0].name").Contains("Connected Home & Housewares")).ToList();

And this is the correct answer. However, we can further improve performance by adding an index. Indexing the whole JSON blob will not help, but we can create a computed column with the value we care about and index that instead:

-- add computed column
ALTER TABLE WishlistEntries ADD ItemCategory AS JSON_VALUE(ItemData, '$.category[0].name')
-- index the value
CREATE INDEX IX_ItemCategory ON WishlistEntries(ItemCategory) INCLUDE(CustomerId, ItemSource, ItemData);
-- finally, test that the query picks up the index by running the following and checking the Actual Execution Plan:
SELECT * FROM [WishlistEntries] WHERE JSON_VALUE(ItemData, '$.category[0].name') = 'Connected Home & Housewares'

This enhancement also happens to simplify our code a bit, as we can now rely on the computed column when writing queries in EF:

// add computed column to WishlistEntry
class WishlistEntry
{

	public int Id { get; set; }
	public int CustomerId { get; set; }
	[MaxLength(300)]
	public string ItemSource { get; set; }
	public string ItemData { get; set; } 

	[DatabaseGenerated(DatabaseGeneratedOption.Computed)]
	public string ItemCategory { get; set; }
}

// finally the query will be like:
var data = context.WishlistEntries.Where(wle => wle.ItemCategory == "Connected Home & Housewares").ToList();

Conclusion and area for improvement

Keen-eyed readers may have noticed that we hardcoded the query to the first array item. This was a deliberate simplification we opted for to illustrate the point. It should be possible to query the whole array using OPENJSON … CROSS APPLY like so, but we haven’t explored this avenue far enough to see if we’d ultimately succeed:

var data = context.WishlistEntries.FromSqlRaw(@"SELECT TOP 1 wishlist.*
      FROM [WishlistEntries] as wishlist
CROSS APPLY OPENJSON(ItemData, '$.category') WITH (CatName VARCHAR(200) '$.name') as categories
WHERE categories.CatName = {0}", "Connected Home & Housewares").ToList();

Even though our application is far from complete we are in a good place. Equipped with these techniques, we can handle almost anything users throw at us and are still able to query data fast enough. The code for this exercise is available on GitHub.

Azure Functions – OpenAPI + EF Core = 💥

Creating Swagger-enabled Azure Functions is not that hard to do – Visual Studio literally comes with a template for that.

Inspecting the newly created project, we see that it comes down to one NuGet package. It magically hooks into IWebJobsStartup and registers additional routes for the Swagger UI and the OpenAPI document. When run, it reflects upon suitable entry points in the assembly and builds the required responses on the fly. Elegant indeed.

Installing Entity Framework

Now suppose we need to talk to Azure SQL, so we’d like to add EF Core to the mix. As much as we love to go for the latest and greatest, it’s unfortunately a bit messy at the moment. Instead, let’s get a bit more conservative and stick to EF Core 3.1.

We did not expect that, did we?

The error message is pretty clear: the Microsoft.Extensions.Logging.Abstractions assembly somehow did not get copied to the output location. And indeed, the file was missing.

Apparently, when VS builds the function, it makes a second copy of the libraries it thinks are required, and in our case it decided the dependency wasn’t needed. Adding <_FunctionsSkipCleanOutput>true</_FunctionsSkipCleanOutput> to the project file fixes that:
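For reference, the switch goes into a PropertyGroup of the Functions project file (the surrounding properties here are illustrative – your values may differ):

<PropertyGroup>
  <TargetFramework>netcoreapp3.1</TargetFramework>
  <AzureFunctionsVersion>v3</AzureFunctionsVersion>
  <!-- stop the Functions build from trimming dependencies out of the output -->
  <_FunctionsSkipCleanOutput>true</_FunctionsSkipCleanOutput>
</PropertyGroup>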

Are we there yet?

Probably, but there’s a catch: our deployment package just got bigger. Alternatively, we could downgrade EF Core to 3.1.13, which happens to use the same version of Microsoft.Extensions.Logging.Abstractions. This way we’d avoid having to hack project files, at the expense of limiting ourselves to an older version of EF Core. Ultimately, we hope the OpenAPI extension picks up the slack and goes GA soon. For now, it looks like we’ll have to stick with the workaround.

EF Core 6 – does our IMethodCallTranslator still work?

A year and a half ago we posted an article on how we were able to plug into the EF Core pipeline and inject our own IMethodCallTranslator. That let us leverage SQL-native encryption functionality with EF Core 3.1 and was ultimately a win. A lot has changed in the ecosystem: we’ve got .NET 5, and .NET 6 is coming up soon. So, we could not help but wonder…

Will it work with EF Core 6?

Apparently, EF Core 6 is mostly an evolutionary step over EF Core 5. That said, we totally missed the previous version, so it is unclear to what extent the EF team has reworked their internal APIs. Most of the extensibility points we used were internal and clearly marked as “not for public consumption”. With that in mind, our concerns seemed valid.

Turns out, the code needs changes…

The first issue we needed to rectify was implementing ShouldUseSameServiceProvider: from what I can tell, it’s needed to cache internal service providers more efficiently, but in our case returning a default value seems to make sense.
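For illustration, here is a minimal sketch of that override on the options extension’s nested Info class – everything apart from the inherited members is our naming, not code from the original article:

private sealed class ExtensionInfo : DbContextOptionsExtensionInfo
{
	public ExtensionInfo(IDbContextOptionsExtension extension) : base(extension) { }

	public override bool IsDatabaseProvider => false;
	public override string LogFragment => "using custom method call translator";
	public override int GetServiceProviderHashCode() => 0;
	public override void PopulateDebugInfo(IDictionary<string, string> debugInfo) { }

	// new in EF Core 6: tells EF whether a cached internal service provider
	// can be reused; treating any instance of our extension as equivalent
	// seemed like a sensible default
	public override bool ShouldUseSameServiceProvider(DbContextOptionsExtensionInfo other) => other is ExtensionInfo;
}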

But that’s where things really went sideways.

Apparently, adding our custom IDbContextOptionsExtension resets the cache, and by the time EF arrives at model initialisation the instance of the DI container has been wiped, leaving us with a bunch of null references.

One line fix

I am still unsure why EF gets so upset when we add a new extension. Stepping through the code would likely provide the answer, but I feel it’s not worth the effort. Playing around with service scopes, however, I noticed that many built-in services get registered using a different extension method with a Scoped lifecycle. This prompted me to try changing my registration method signature, and voilà:
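A rough sketch of the change, assuming our IDbContextOptionsExtension registers a custom IMethodCallTranslatorPlugin (TranslatorPlugin here is a stand-in name):

public void ApplyServices(IServiceCollection services)
{
	// was: services.AddSingleton<IMethodCallTranslatorPlugin, TranslatorPlugin>();
	// registering with a Scoped lifetime - the way many built-in services are
	// added - made the null references go away
	services.TryAdd(new ServiceDescriptor(typeof(IMethodCallTranslatorPlugin), typeof(TranslatorPlugin), ServiceLifetime.Scoped));
}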

And as usual, fully functioning code sits on GitHub.

EF Core 3.1: dynamic GroupBy clause

Expanding on the many ways we can build dynamic clauses for EF to execute, let us look at another example. Suppose we’d like to give our users the ability to group items based on a property of their choice.

As usual, our first choice would be to build a LINQ expression, as we did with the WHERE case. There’s however a problem: GroupBy needs an object to use as a grouping key. Usually we’d just make an anonymous type, but that is a compile-time luxury we don’t get when building expressions at runtime:

dbSet.GroupBy(s => new {s.Col1, s.Col2}); // not going to fly :/

IL Emit it is then

So, it seems we are left with no other choice but to go through the TypeBuilder ordeal (which isn’t too bad, really). One thing to point out here – we want to create real properties that EF will later use for the grouping key. This is where the ability to interrogate EF’s as-built model comes in very handy. Another important point – creating a property in fact means creating a private backing field and two special accessor methods for each. We certainly started to appreciate how much the language and runtime do for us:

private void CreateProperty(TypeBuilder typeBuilder, string propertyName, Type propertyType)
{
 // really, just generating "public PropertyType propertyName {get;set;}"
	FieldBuilder fieldBuilder = typeBuilder.DefineField("_" + propertyName, propertyType, FieldAttributes.Private);

	PropertyBuilder propertyBuilder = typeBuilder.DefineProperty(propertyName, PropertyAttributes.HasDefault, propertyType, null);
	MethodBuilder getPropMthdBldr = typeBuilder.DefineMethod("get_" + propertyName, MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.HideBySig, propertyType, Type.EmptyTypes);
	ILGenerator getIl = getPropMthdBldr.GetILGenerator();

	getIl.Emit(OpCodes.Ldarg_0);
	getIl.Emit(OpCodes.Ldfld, fieldBuilder);
	getIl.Emit(OpCodes.Ret);

	MethodBuilder setPropMthdBldr = typeBuilder.DefineMethod("set_" + propertyName,
		  MethodAttributes.Public |
		  MethodAttributes.SpecialName |
		  MethodAttributes.HideBySig,
		  null, new[] { propertyType });

	ILGenerator setIl = setPropMthdBldr.GetILGenerator();
	Label modifyProperty = setIl.DefineLabel();
	Label exitSet = setIl.DefineLabel();

	setIl.MarkLabel(modifyProperty);
	setIl.Emit(OpCodes.Ldarg_0);
	setIl.Emit(OpCodes.Ldarg_1);
	setIl.Emit(OpCodes.Stfld, fieldBuilder);

	setIl.Emit(OpCodes.Nop);
	setIl.MarkLabel(exitSet);
	setIl.Emit(OpCodes.Ret);

	propertyBuilder.SetGetMethod(getPropMthdBldr);
	propertyBuilder.SetSetMethod(setPropMthdBldr);
}

After we’ve sorted this out, it’s pretty much the same approach as with any other dynamic LINQ expression: we need to build something like DbSet.GroupBy(s => new dynamicType {col1 = s.q1, col2 = s.q2}). There’s a slight issue with this lambda, however – it returns IGrouping<dynamicType, TElement>, and since outside code has no idea of the dynamic type, there’s no easy way to work with it (unless we want to keep reflecting). I thought it might be easier to build a Select as well and return a Count against each instance of the dynamic type. Luckily, we only needed a count, but other aggregations work in a similar fashion.

Finally

I arrived at the following code to generate the required expressions:

public static IQueryable<Tuple<object, int>> BuildExpression<TElement>(this IQueryable<TElement> source, DbContext context, List<string> columnNames)
{
	var entityParameter = Expression.Parameter(typeof(TElement));
	var sourceParameter = Expression.Parameter(typeof(IQueryable<TElement>));

	var model = context.Model.FindEntityType(typeof(TElement)); // start with our own entity
	var props = model.GetPropertyAccessors(entityParameter); // get all available field names including navigations			

	var objectProps = new List<Tuple<string, Type>>();
	var accessorProps = new List<Tuple<string, Expression>>();
	var groupKeyDictionary = new Dictionary<object, string>();
	foreach (var prop in props.Where(p => columnNames.Contains(p.Item3)))
	{
		var propName = prop.Item3.Replace(".", "_"); // we need some form of cross-reference, this seems to be good enough
		objectProps.Add(new Tuple<string, Type>(propName, (prop.Item2 as MemberExpression).Type));
		accessorProps.Add(new Tuple<string, Expression>(propName, prop.Item2));
	}

	var groupingType = BuildGroupingType(objectProps); // build new type we'll use for grouping. think `new Test() { A=, B=, C= }`

	// finally, we're ready to build our expressions
	var groupbyCall = BuildGroupBy<TElement>(sourceParameter, entityParameter, accessorProps, groupingType); // source.GroupBy(s => new Test(A = s.Field1, B = s.Field2 ... ))
	var selectCall = groupbyCall.BuildSelect<TElement>(groupingType); // .Select(g => new Tuple<object, int> (g.Key, g.Count()))
	
	var lambda = Expression.Lambda<Func<IQueryable<TElement>, IQueryable<Tuple<object, int>>>>(selectCall, sourceParameter);
	return lambda.Compile()(source);
}

private static MethodCallExpression BuildSelect<TElement>(this MethodCallExpression groupbyCall, Type groupingAnonType) 
{	
	var groupingType = typeof(IGrouping<,>).MakeGenericType(groupingAnonType, typeof(TElement));
	var selectMethod = QueryableMethods.Select.MakeGenericMethod(groupingType, typeof(Tuple<object, int>));
	var resultParameter = Expression.Parameter(groupingType);

	var countCall = BuildCount<TElement>(resultParameter);
	var resultSelector = Expression.New(typeof(Tuple<object, int>).GetConstructors().First(), Expression.PropertyOrField(resultParameter, "Key"), countCall);

	return Expression.Call(selectMethod, groupbyCall, Expression.Lambda(resultSelector, resultParameter));
}

private static MethodCallExpression BuildGroupBy<TElement>(ParameterExpression sourceParameter, ParameterExpression entityParameter, List<Tuple<string, Expression>> accessorProps, Type groupingAnonType) 
{
	var groupByMethod = QueryableMethods.GroupByWithKeySelector.MakeGenericMethod(typeof(TElement), groupingAnonType);
	var groupBySelector = Expression.Lambda(Expression.MemberInit(Expression.New(groupingAnonType.GetConstructors().First()),
			accessorProps.Select(op => Expression.Bind(groupingAnonType.GetMember(op.Item1)[0], op.Item2))
		), entityParameter);

	return Expression.Call(groupByMethod, sourceParameter, groupBySelector);
}

private static MethodCallExpression BuildCount<TElement>(ParameterExpression resultParameter)
{
	var asQueryableMethod = QueryableMethods.AsQueryable.MakeGenericMethod(typeof(TElement));
	var countMethod = QueryableMethods.CountWithoutPredicate.MakeGenericMethod(typeof(TElement));

	return Expression.Call(countMethod, Expression.Call(asQueryableMethod, resultParameter));
}
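One helper referenced above but not shown is BuildGroupingType. A minimal sketch of what it could look like, assuming it reuses the CreateProperty routine from earlier (made static):

private static Type BuildGroupingType(List<Tuple<string, Type>> objectProps)
{
	var moduleBuilder = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("DynamicGroupingTypes"), AssemblyBuilderAccess.Run)
	                                   .DefineDynamicModule("MainModule");

	// a public class with a parameterless constructor - the MemberInit in BuildGroupBy relies on it
	var typeBuilder = moduleBuilder.DefineType("GroupingKey" + Guid.NewGuid().ToString("N"), TypeAttributes.Public | TypeAttributes.Class, typeof(object));

	foreach (var prop in objectProps)
		CreateProperty(typeBuilder, prop.Item1, prop.Item2); // one property per grouping column

	return typeBuilder.CreateTypeInfo();
}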

And the full working version is on my GitHub.

Entity Framework Core 3.1 – dynamic WHERE clause

Every now and then we get tasked with building a backend for filtering arbitrary queries. Usually clients would like a way of sending over an array of fields, values, and comparison operations in order to retrieve their data. For simplicity we’ll assume that all conditions are joined with an AND operator.

public class QueryableFilter {
    public string Name { get; set; }
    public string Value { get; set; }
    public QueryableFilterCompareEnum? Compare { get; set; }
}

With a twist

There’s however one slight complication to this problem – filters must apply to fields on dependent entities (possibly multiple levels of nesting as well). This can become a problem not only because we’d have to traverse the model hierarchy (we’ll touch on that later), but also because of the ambiguity this requirement introduces. Sometimes we’re lucky and only have unique column names across the hierarchy; more often than not, though, this needs to be resolved one way or another. We can, for example, require filter fields to use dot notation so we know which entity each field relates to. For example, Name -eq "ACME Ltd" AND Name -eq "Cloud Solutions" becomes company.Name -eq "ACME Ltd" AND team.Name -eq "Cloud Solutions"

Building an expression

It is pretty common for clients to already have some sort of data querying service with EF Core doing the actual database comms. And since EF relies on LINQ Expressions a lot, we can build the required filters dynamically:

public static IQueryable<T> BuildExpression<T>(this IQueryable<T> source, DbContext context, string columnName, string value, QueryableFilterCompareEnum? compare = QueryableFilterCompareEnum.Equal)
{
	var param = Expression.Parameter(typeof(T));

	// Get the field/column from the Entity that matches the supplied columnName value
	// If the field/column does not exists on the Entity, throw an exception; There is nothing more that can be done
	MemberExpression dataField;
	
	var model = context.Model.FindEntityType(typeof(T)); // start with our own entity
	var props = model.GetPropertyAccessors(param); // get all available field names including navigations
	var reference = props.First(p => RelationalPropertyExtensions.GetColumnName(p.Item1) == columnName); // find the filtered column - you might need to handle cases where column does not exist

	dataField = reference.Item2 as MemberExpression; // we happen to already have correct property accessors in our Tuples	

	ConstantExpression constant = !string.IsNullOrWhiteSpace(value)
		? Expression.Constant(value.Trim(), typeof(string))
		: Expression.Constant(value, typeof(string));

	BinaryExpression binary = GetBinaryExpression(dataField, constant, compare);
	Expression<Func<T, bool>> lambda = (Expression<Func<T, bool>>)Expression.Lambda(binary, param);
	return source.Where(lambda);
}
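GetBinaryExpression is not shown above – it simply maps the compare enum onto the corresponding Expression factory methods. A minimal sketch (enum members other than Equal are our guesses):

private static BinaryExpression GetBinaryExpression(MemberExpression dataField, ConstantExpression constant, QueryableFilterCompareEnum? compare)
{
	switch (compare ?? QueryableFilterCompareEnum.Equal)
	{
		case QueryableFilterCompareEnum.NotEqual:
			return Expression.NotEqual(dataField, constant);
		// operations like GreaterThan would need type-aware constants first,
		// since this snippet builds all constants as strings
		case QueryableFilterCompareEnum.Equal:
		default:
			return Expression.Equal(dataField, constant);
	}
}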

Most of the code above is pretty standard for building property accessor lambdas, but GetPropertyAccessors is the key:

private static IEnumerable<Tuple<IProperty, Expression>> GetPropertyAccessors(this IEntityType model, Expression param)
{
	var result = new List<Tuple<IProperty, Expression>>();

	result.AddRange(model.GetProperties()
								.Where(p => !p.IsShadowProperty()) // this is your chance to ensure property is actually declared on the type before you attempt building Expression
								.Select(p => new Tuple<IProperty, Expression>(p, Expression.Property(param, p.Name)))); // Tuple is a bit clunky but hopefully conveys the idea

	foreach (var nav in model.GetNavigations().Where(p => p is Navigation))
	{
		var parentAccessor = Expression.Property(param, nav.Name); // define a starting point so following properties would hang off there
		result.AddRange(GetPropertyAccessors(nav.ForeignKey.PrincipalEntityType, parentAccessor)); //recursively call ourselves to travel up the navigation hierarchy
	}

	return result;
}

This is where we interrogate EF’s as-built data model, traverse navigation properties and recursively build a list of all the properties we can ever filter on!

Testing it out

Talk is cheap, let’s run a complete example here:

public class Entity
{
	public int Id { get; set; }
}
class Company : Entity
{
	public string CompanyName { get; set; }
}

class Team : Entity
{
	public string TeamName { get; set; }
	public Company Company { get; set; }
}

class Employee : Entity
{
	public string EmployeeName { get; set; }
	public Team Team { get; set; }
}

class DynamicFilters<T> where T : Entity
{
	private readonly DbContext _context;

	public DynamicFilters(DbContext context)
	{
		_context = context;
	}

	public IEnumerable<T> Filter(IEnumerable<QueryableFilter> queryableFilters = null)
	{
		IQueryable<T> mainQuery = _context.Set<T>().AsQueryable().AsNoTracking();
		// Loop through the supplied queryable filters (if any) to construct a dynamic LINQ-to-SQL queryable
		foreach (var filter in queryableFilters ?? new List<QueryableFilter>())
		{
			mainQuery = mainQuery.BuildExpression(_context, filter.Name, filter.Value, filter.Compare);
		}

		mainQuery = mainQuery.OrderBy(x => x.Id);

		return mainQuery.ToList();
	}
}
// --- DbContext
class MyDbContext : DbContext
{
	public DbSet<Company> Companies { get; set; }
	public DbSet<Team> Teams { get; set; }
	public DbSet<Employee> Employees { get; set; }

	protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
	{
		optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true");
		base.OnConfiguring(optionsBuilder);
	}
}
// ---
static void Main(string[] args)
{
	var context = new MyDbContext();
	var someTableData = new DynamicFilters<Employee>(context).Filter(new List<QueryableFilter>
	{
		new QueryableFilter { Name = "CompanyName", Value = "ACME Ltd" },
		new QueryableFilter { Name = "TeamName", Value = "Cloud Solutions" }
	});
}

The above block should produce the following SQL:

SELECT [e].[Id], [e].[EmployeeName], [e].[TeamId]
FROM [Employees] AS [e]
LEFT JOIN [Teams] AS [t] ON [e].[TeamId] = [t].[Id]
LEFT JOIN [Companies] AS [c] ON [t].[CompanyId] = [c].[Id]
WHERE [c].[CompanyName] = N'ACME Ltd'
 AND [t].[TeamName] = N'Cloud Solutions'
ORDER BY [e].[Id]

Entity Framework Core 3.1 – Peeking Into Generated SQL

Writing LINQ that produces optimal SQL can be hard, as developers often don’t have visibility into the translation process. It becomes even more confusing when the application is designed to run against different databases.

We often find ourselves questioning whether a particular query will fall in line with our expectations. Until not so long ago our tool of choice was the SQL Profiler that ships with SQL Server. It’s plenty powerful but has one flaw – it pretty much requires a SQL Server installation. This might be a deal breaker for clients using other databases, like Postgres or MySQL (which EF Core supports as well, by the way).

EF to the rescue

Instead of firing off the profiler and fishing out the batches, we could have Entity Framework itself hand us the result. After all, it needs to build the SQL before sending it off to the database, so all we have to do is ask nicely. Stack Overflow is quite helpful here:

public static class IQueryableExtensions // this is the EF Core 3.1 version.
    {
        public static string ToSql<TEntity>(this IQueryable<TEntity> query) where TEntity : class
        {
            var enumerator = query.Provider.Execute<IEnumerable<TEntity>>(query.Expression).GetEnumerator();
            var relationalCommandCache = enumerator.Private("_relationalCommandCache");
            var selectExpression = relationalCommandCache.Private<SelectExpression>("_selectExpression");
            var factory = relationalCommandCache.Private<IQuerySqlGeneratorFactory>("_querySqlGeneratorFactory");

            var sqlGenerator = factory.Create();
            var command = sqlGenerator.GetCommand(selectExpression);

            string sql = command.CommandText;
            return sql;
        }

        private static object Private(this object obj, string privateField) => obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
        private static T Private<T>(this object obj, string privateField) => (T)obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
    }

The usage is simple

Suppose we’ve got the following inputs: one simple table that we’d like to group by one field and total by another. The database context is also pretty much boilerplate; one thing to note here is the couple of database providers we are going to try the query against.

public class SomeTable
{
    public int Id { get; set; }
    public int Foobar { get; set; }
    public int Quantity { get; set; }
}

class MyDbContext : DbContext
{
    public DbSet<SomeTable> SomeTables { get; set; }
    public static readonly LoggerFactory DbCommandConsoleLoggerFactory
        = new LoggerFactory(new[] {
        new ConsoleLoggerProvider ((category, level) =>
            category == DbLoggerCategory.Database.Command.Name &&
            level == LogLevel.Trace, true)
        });
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //run with SQL Server provider to get T-SQL
        optionsBuilder.UseNpgsql("Server=localhost;Port=5432;Database=test;User Id=;Password=;")
        //alternatively use other supported provider
        //optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true")
        ;
        base.OnConfiguring(optionsBuilder);
    }
}

The test bench would look something like so:

class Program
{
    static void Main(string[] args)
    {

        var context = new MyDbContext();
        var someTableData = context.SomeTables
                .GroupBy(x => x.Foobar)
                .Select(x => new { Foobar = x.Key, Quantity = x.Sum(y => y.Quantity) })
                .OrderByDescending(x => x.Quantity)
                .Take(10) // we've built our query as per normal
                .ToSql(); // this is the magic
        Console.Write(someTableData);
        Console.ReadKey();
    }
}

And depending on our choice of provider, the output shows the EF Core-generated SQL for SQL Server or Postgres:

-- MSSQL
SELECT TOP(@__p_0) [s].[Foobar], SUM([s].[Quantity]) AS [Quantity]
FROM [SomeTables] AS [s]
GROUP BY [s].[Foobar]
ORDER BY SUM([s].[Quantity]) DESC

-- PG SQL
SELECT s."Foobar", SUM(s."Quantity")::INT AS "Quantity"
FROM "SomeTables" AS s
GROUP BY s."Foobar"
ORDER BY SUM(s."Quantity")::INT DESC
LIMIT @__p_0

T-SQL syntax analysis

It is nice to be able to analyse code, be it for code review automation or rule enforcement. While Roslyn is an extremely useful platform for C# code analysis, more often than not .NET applications come backed by SQL Server. And this means one thing…

We need Roslyn for SQL!

We probably are not getting a full-blown SQL dependency tracking system just yet, but apparently Microsoft does offer a SqlParser library that is very capable of handling most static code analysis on SQL. Let us run through a reasonably common scenario. Developers rely on feature flags for all new functionality, and the Product Owner explicitly enables features as they become ready. States are kept in the database and managed via simple scripts. Since we release code multiple times a day, unfinished features can ship to production as long as the respective flags are safely turned off. Now our task is to ensure that change scripts only create feature flag definitions but never turn them on by default.

A bit of theory

Parsing a language (usually) happens in stages:

  1. Lexical analysis (convert character stream to basic building blocks – tokens)
  2. Syntactic analysis (group tokens into constructs according to selected grammar – parse tree)
  3. Semantic analysis (convert parse tree into abstract syntax tree)

Once we’ve got the Abstract Syntax Tree (AST), we can interact with it and perform certain operations depending on node type and context. Probably the easiest way to traverse such a tree is to implement a Visitor. Both Roslyn and SQL Parser offer stubs to aid implementing this pattern.

Sample scripts

This particular example revolves around two workflows:

-- case 1 - define new feature flag
INSERT INTO dbo.FeatureFlag ( [Key], Value ) VALUES ( @featureFlagId, N'true')
-- case 2 - update existing flag
UPDATE dbo.FeatureFlag SET Value = 'true' WHERE [Key] = @featureFlagId -- enable/disable flag state

Let’s imagine we’ve got a directory where developers put new scripts. We need to scan it and figure out whether any script attempts to set the Value in our table to true:

using Microsoft.SqlServer.Management.SqlParser.Parser;
......	
	foreach (var file in Directory.EnumerateFiles(@"C:\ChangeScriptsFolder"))
	{
		var result = Parser.Parse(File.ReadAllText(file)); // parse SQL. That's all code we need to get an Abstract Syntax Tree
		var visitor = new FeatureFlagStateVisitor(file); // prep our visitor
		result.Script.Accept(visitor); // run
		if (!visitor.Result.IsSuccess)
		{
			string errorList = visitor.Result.Errors.Aggregate(new StringBuilder(), (sb, error) => sb.AppendLine(error)).ToString();
			throw new Exception($"FeatureFlag state must be disabled on build. Sql text analysis indicates following issues: {errorList}");
		}
	}
.......

And the visitor itself can look something like this:

    public class SqlAnalysisResult
    {
        public bool IsSuccess { get; set; }
        public IList<string> Errors { get; set; }
    }
    public class FeatureFlagStateVisitorContext
    {
        public bool IsFtUpdating { get; set; }
        public bool IsFtInserting { get; set; }
        public string StatementLocation { get; set; }
        public int InsertColumnRef { get; set; }
    }
    public class FeatureFlagStateVisitor: SqlCodeObjectRecursiveVisitor
    {
        public SqlAnalysisResult Result { get; set; }
        private readonly FeatureFlagStateVisitorContext _context;
        private readonly string _fileName;

        public FeatureFlagStateVisitor(string fileName)
        {
            Result = new SqlAnalysisResult
            {
                Errors = new List<string>(),
                IsSuccess = true
            };
            _context = new FeatureFlagStateVisitorContext();
            _fileName = fileName;
        }

        public override void Visit(SqlInsertSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.InsertColumnRef = codeObject.TargetColumns.IndexOf(codeObject.TargetColumns.FirstOrDefault(c => c.ColumnName.Sql.Contains("Value")));
                if (_context.InsertColumnRef >= 0)
                {
                    _context.IsFtInserting = true;
                    _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
                }
            }
            base.Visit(codeObject);
            _context.IsFtInserting = false;
        }

        public override void Visit(SqlRowConstructorExpression codeObject)
        {
            if (_context.IsFtInserting && codeObject.Values[_context.InsertColumnRef].Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"INSERT {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }

        public override void Visit(SqlUpdateSpecification codeObject)
        {
            if (codeObject.Target.Sql.CaseInsensitiveContains("FeatureFlag"))
            {
                _context.IsFtUpdating = true;
                _context.StatementLocation = $"L{codeObject.StartLocation.LineNumber}:{codeObject.StartLocation.Offset} - L{codeObject.EndLocation.LineNumber}:{codeObject.EndLocation.Offset}";
            }
            base.Visit(codeObject);
            _context.IsFtUpdating = false;
        }

        public override void Visit(SqlColumnAssignment codeObject)
        {
            if (_context.IsFtUpdating && codeObject.Column.Sql.CaseInsensitiveContains("Value") && codeObject.Value.Sql.CaseInsensitiveContains("true"))
            {
                Result.IsSuccess = false;
                Result.Errors.Add($"UPDATE {_fileName} - {_context.StatementLocation}");
            }
            base.Visit(codeObject);
        }
    }
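A side note: CaseInsensitiveContains is not part of the SqlParser library – it’s a small home-grown extension, presumably along these lines:

public static class StringExtensions
{
	public static bool CaseInsensitiveContains(this string source, string value) =>
		source?.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
}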

The idea is pretty simple – we keep track of where we are at the moment and refer back to this context when making the final decision at the SqlColumnAssignment/SqlRowConstructorExpression level.

SQL Server computed columns: not just mathematics

Computed columns in SQL Server are very handy when we’ve got a formula that we can rely on and some CPU cycles to spare. There are heaps of articles online about what these are, so we won’t repeat them here. Let us rather look at one peculiar case…

What if I’ve got two bit columns?

A very plausible scenario indeed. Let’s create a table like so:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (Check1 = 1 OR Check2 = 1) -- Incorrect syntax near '='.
)

That didn’t seem to work, and the error message is a bit unhelpful: we know it’s wrong, but why?

Reading the documentation once again

computed_column_expression
Is an expression that defines the value of a computed column. A computed column is a virtual column that is not physically stored in the table but is computed from an expression that uses other columns in the same table. For example, a computed column could have the definition: cost AS price * qty. The expression can be a noncomputed column name, constant, function, variable, and any combination of these connected by one or more operators. The expression cannot be a subquery or include an alias data type.

Okay, this kinda hints at the fact that we need to return a value – a search condition is a predicate, not a value, in T-SQL, so it does not cut it. Let us try one more time:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (CASE WHEN Check1 = 1 OR Check2 = 1 THEN 1 ELSE 0 END) -- this time is better! we can roll with it
)

Take it up another notch

The above example works but looks a bit too verbose and explicit. Is there a better way? Since we deal with bit columns, we might be in luck and opt for T-SQL bitwise operators:

CREATE TABLE Test (
Id INT PRIMARY KEY,
Check1 BIT NOT NULL,
Check2 BIT NOT NULL,
CheckAll AS (Check1|Check2) 
)

Splitting strings with MySQL

Suppose we’ve got a string that we’d like to parse into a table and run DISTINCT over.

A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613;

MySQL will not automatically convert a string into a table, and there’s no library function to do the job for us. But we can leverage the classic SUBSTRING_INDEX(SUBSTRING_INDEX(newString, ';', N), ';', -1) expression to extract the Nth item from the string:

N	SUBSTRING_INDEX(SUBSTRING_INDEX(newString, ';', N), ';', -1)
1	A613
2	A613
3	A095

Generating a numeric sequence

For the SUBSTRING_INDEX trick to work we need to generate a long enough sequence of integers to cross join onto our source string. Check out the following sequence generator:

select (th*1000+h*100+t*10+u+1) x from
(select 0 th union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) A,
(select 0 h union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) B,
(select 0 t union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) C,
(select 0 u union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) D
order by x;

Yes, this particular example is limited to 10k items, but it can be expanded upon. It will come down to knowing your data and ensuring you’ve got enough numbers to feed to the splitter. Or use alternatives, like the one below.
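One such alternative, if we’re on MySQL 8 or later, is a recursive CTE (mind cte_max_recursion_depth, which defaults to 1000):

WITH RECURSIVE seq (x) AS (
  SELECT 1
  UNION ALL
  SELECT x + 1 FROM seq WHERE x < 1000
)
SELECT x FROM seq;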

Stopping enumeration

After we’ve got the sequence going, we need to find the number of meaningful elements in the source string, which basically means counting the delimiters. Problem is, MySQL doesn’t have a built-in function for that. Mathematics to the rescue though:

SELECT  (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") )
        ) / CHAR_LENGTH(";")
   AS count
FROM (SELECT 'source string' AS val) AS src -- wrap the source string in a derived table

Putting it together

SELECT distinct SUBSTRING_INDEX(SUBSTRING_INDEX(val, '; ', x), '; ', -1) FROM
(
  select (t*10+u+1) x, val from
  (select 0 t union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) A,
  (select 0 u union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) B
	CROSS JOIN (SELECT 'A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613' as val
               ) as seq
	ORDER BY x
) as raw_data
WHERE x <= (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") ) 
        ) / CHAR_LENGTH(";")        
    

Entity Framework Core 3 – Custom Functions (Using IMethodCallTranslator)

Every now and then Stack Overflow provides fantastic opportunities to learn something new. One user asked whether SQL Server’s DECRYPTBYPASSPHRASE can be implemented with Entity Framework Core 2.2 so they can fetch encrypted strings in SQL.
