Cloud face-off: hosting static website

With the explosive growth of online services we’ve seen over 2020, it’s pretty clear the Public Cloud is going to pervade our lives more and more. The Internet is full of articles listing differences between platforms. But when we look closer, it all seems to pretty much fall into same groups: compute, storage and networking. Yes, naming is is different, but fundamentals are pretty much identical between all major providers.

Or are they?

Today we’re going to try and compare two providers by attempting to achieve the same basic goal – hosting a static web page.

the web page is going to be extremely simple:

<html>
     <body ng-app="myApp" ng-controller="myCtrl">
     <h2>Serverless API endpoint</h2>
     <input type="text" ng-model="apiUrl" /><button ng-click="fetch(apiUrl)" >Fetch</button>
         <h2>Function output below</h2>
         <pre>
             {{ eventBody| json }}
         </pre>
         <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular.js/1.8.2/angular.min.js" />
     </body>
 </html>

S3

With AWS, static website hosting is a feature of S3. All we need to do is create a bucket, upload our files and enable “Static website hosting” in Properties:

these warnings seems to hint at something…

One may think that this is all we have to do. But that huge information message on the page seems to be hinting at the fact that we might need to enable public access to the bucket. Let us test our theory:

indeed, there’d be no public access to our static assets by default

okay, this is fair enough – making a bucket secure by default is a good idea. One point to note here, is that enabling public access on this only bucket will not be enough. We will also need to disable “Block Public Access” on account level, which seems a bit too extreme at first. But hey, you cannot be too secure, right. AWS goes to great lengths to make it very obvious when customers do something potentially dangerous. Anyway, we’d go and enable the public access on account and bucket as instructed:

The first thing that jumps out here is a “Not Secure” warning. Indeed, S3 exposes the website via good old HTTP. And if we wanted secure transport – we’d have to opt for CloudFront.

Storage account

With Azure, the very first thing that we’d get to deal with would be the naming convention: Storage Account names can only contain lowercase letters and numbers. So right out of the gate, we’ll have to strip all dashes from our name of choice. Going through the wizard it’s relatively easy to miss the “Networking” section, but it’s exactly here we get to chose if our account will have public access or not. And by default it will be! So if you want it – you’ll have to tweak the security. Anyway, moving on to “Advanced” tab, we’re presented with another key difference: Storage Account endpoints use HTTPS by default.

Having created the account we should first enable the “Static Website hosting” option.

Reason for this order is that Azure creates a special container called $web, that we’ll then upload files to. But after we have done that – it’s pretty much done:

CORS

Both AWS and Azure allow configuring CORS for static websites. AWS is pretty upfront about it in their docs. Azure however makes a specific callout that CORS is not supported on static websites. My testing, however, indicates that it seems to work as intended. Consider the following scenario. We’d use the same website, but add a web font. And load it from across the other cloud: so Azure copy of the site will attempt to source the font from AWS and vice versa:

<link rel="stylesheet" href="css/stylesheet-aws.css" type="text/css" charset="utf-8" />
<style type="text/css">
body {
font-family: "potta_oneregular"
}
</style>

and the stylesheet would look like so (note the URL pointing to Azure):

@font-face {
     font-family: 'potta_oneregular';
     src: url('https://cloudfaceoffworkload.z26.web.core.windows.net/css/pottaone-regular-webfont.woff2') format('woff2'),
          url('https://cloudfaceoffworkload.z26.web.core.windows.net/css/pottaone-regular-webfont.woff') format('woff');
     font-weight: normal;
     font-style: normal;
 }

initially, this set up indeed results in a CORS error:

but the fix is very easy, just set up CORS in Azure:

and we get a fully working cross-origin resource consumption:

Doing it the other way around is a bit more complicated. Even though AWS allows us to configure policies, lack of HTTPS endpoint means browsers will likely refuse to load the fonts and we’d be forced onto CloudFront (which has its own benefits, but that would be a completely different story).

Summary

AspectAWS S3Azure Storage Account
Public by defaultnoyes
HTTPS with own domain namenoyes
Allows for human-readable namesyesnot really
CORS supportconfigurablenot supported by static websites (despite what the docs claim)

LINQ: Dynamic Join

Suppose we’ve got two lists that we would like to join based on, say, common Id property. With LINQ the code would look something along these lines:

var list1 = new List<MyItem> {};
var list2 = new List<MyItem> {};
var joined = list1.Join(list2, i => i.Id, j => j.Id, (k,l) => new {List1Item=k, List2Item=l});

resulting list of anonymous objects would have a property for each source object in the join. This is nothing new and has been documented pretty well.

But what if

We don’t know how many lists we’d have to join? Now we’ve got a list of lists of our entities (List Inception?!): List<List<MyItem>>. It becomes pretty obvious that we’d need to generate this code dynamically. We’ll run with LINQ Expression Trees – surely there’s a way. Generally speaking, we’ll have to build an object (anonymous type would be ideal) with fields like so:

{
  i0: items[0] // added on first run - we need to have at least two lists to join so it's safe to assume we'd
  i1: items[1] // added on first run - you need to have at least two lists in your join array 
  ... 
  iN: items[N] // added on each pass an joined with items[0] 
}

It is safe to assume that we need at least two lists for join to make sense, so we’d build the object above in two stages – first join two MyItem instances and get the structure going, Each subsequent join should append more MyItem instances to the resulting object until we’d get our result.

Picking types for the result

Now the problem is how we best define this object. The way anonymous types are declared, requires a type initialiser and a new keyword. We don’t have either of these at design time, so this method unfortunately will not work for us.

ExpandoObject

Another way to achieve decent developer experience with named object properties would be to use dynamic keyword – this is less than ideal as it effectively disables compiler static type checks. But we can keep going – so it’s an option here. To allow us to add properties at run time, we will use ExpandoObject:

static List<ExpandoObject> Join<TSource, TDest>(List<List<TSource>> items, Expression<Func<TSource, int>> srcAccessor, Expression<Func<ExpandoObject, int>> intermediaryAccessor, Expression<Func<TSource, TSource, ExpandoObject>> outerResultSelector)
{
	var joinLambdaType = typeof(ExpandoObject);            
	Expression<Func<ExpandoObject, TSource, ExpandoObject>> innerResultSelector = (expando, item) => expando.AddValue(item);
	
	var joinMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "Join").First().MakeGenericMethod(typeof(TSource), typeof(TSource), typeof(int), joinLambdaType);
	var toListMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "ToList").First().MakeGenericMethod(typeof(TDest));

	var joinCall = Expression.Call(joinMethod,
							Expression.Constant(items[0]),
							Expression.Constant(items[1]),
							srcAccessor,
							srcAccessor,
							outerResultSelector);
	joinMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "Join").First().MakeGenericMethod(typeof(TDest), typeof(TSource), typeof(int), joinLambdaType); // from now on we'll be joining ExpandoObject with MyEntity
	for (int i = 2; i < items.Count; i++) // skip the first two
	{
		joinCall =
			Expression.Call(joinMethod,
							joinCall,
							Expression.Constant(items[i]),
							intermediaryAccessor,
							srcAccessor,
							innerResultSelector);
	}

	var lambda = Expression.Lambda<Func<List<ExpandoObject>>>(Expression.Call(toListMethod, joinCall));
	return lambda.Compile()();
}

The above block references two extension methods so that we can easier manupulate the ExpandoObjects:

public static class Extensions 
 {
     public static ExpandoObject AddValue(this ExpandoObject expando, object value)
     {
         var dict = (IDictionary)expando;
         var key = $"i{dict.Count}"; // that was the easiest way to keep track of what's already in. You would probably find a way to do it better
         dict.Add(key, value);
         return expando;
     }
     public static ExpandoObject NewObject<T>(this ExpandoObject expando, T value1, T value2) 
     {
          var dict = (IDictionary<string, object>)expando;
          dict.Add("i0", value1);
          dict.Add("i1", value2);
          return expando; 
     }
 }

And with that, we should have no issue running a simple test like so:

class Program
{
    class MyEntity
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public MyEntity(int id, string name)
        {
            Id = id; Name = name;
        }
    }

    static void Main()
    {
        List<List<MyEntity>> items = new List<List<MyEntity>> {
            new List<MyEntity> {new MyEntity(1,"test1_1"), new MyEntity(2,"test1_2")},
            new List<MyEntity> {new MyEntity(1,"test2_1"), new MyEntity(2,"test2_2")},
            new List<MyEntity> {new MyEntity(1,"test3_1"), new MyEntity(2,"test3_2")},
            new List<MyEntity> {new MyEntity(1,"test4_1"), new MyEntity(2,"test4_2")}
        };

        Expression<Func<MyEntity, MyEntity, ExpandoObject>> outerResultSelector = (i, j) => new ExpandoObject().NewObject(i, j); // we create a new ExpandoObject and populate it with first two items we join
        Expression<Func<ExpandoObject, int>> intermediaryAccessor = (expando) => ((MyEntity)((IDictionary<string, object>)expando)["i0"]).Id; // you could probably get rid of hardcoding this by, say, examining the first key in the dictionary
        
        dynamic cc = Join<MyEntity, ExpandoObject>(items, i => i.Id, intermediaryAccessor, outerResultSelector);

        var test1_1 = cc[0].i1;
        var test1_2 = cc[0].i2;

        var test2_1 = cc[1].i1;
        var test2_2 = cc[1].i2;
    }
}

ASP.Net Core – Resolving types from dynamic assemblies

It is not a secret that ASP.NET core comes with dependency injection support out of the box. And we don’t remember ever feeling it lacks features. All we have to do is register a type in Startup.cs and it’s ready to be consumed in our controllers:

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        ...
        services.AddScoped<IDBLogger, IdbLogger>();
    }
}
//
public class HomeController : Controller
{
    private readonly IDBLogger _idbLogger;
    public HomeController(IDBLogger idbLogger)     
    {
        _idbLogger = idbLogger; // all good here!
    }
...
}

What if it’s a plugin?

Now imagine we’ve got a Order type that we for whatever strange reason load at runtime dynamically.

public class Order
{
     private readonly IDBLogger _logger; // suppose we've got the reference from common assembly that both our main application and this plugin are allowed to reference
     public Order(IDBLogger logger)
     {
         _logger = logger; // will it resolve?
     }
     public void GetOrderDetail()
     {
        _logger.Log("Inside GetOrderDetail"); // do we get a NRE here?
     }
}

Load it in the controller

External assembly being external kind of implies that we want to load it at the very last moment – right in our controller where we presumably need it. If we try explore this avenue, we immediately see the issue:

public HomeController(IDBLogger idbLogger)
 {
     _idbLogger = idbLogger;
     var assembly = Assembly.LoadFrom(Path.Combine("..\Controllers\bin\Debug\netcoreapp3.1", "Orders.dll"));
     var orderType = assembly.ExportedTypes.First(t => t.Name == "Order");
     var order = Activator.CreateInstance(orderType); //throws System.MissingMethodException: 'No parameterless constructor defined for type 'Orders.Order'.'
     orderType.InvokeMember("GetOrderDetail", BindingFlags.Public | BindingFlags.Instance|BindingFlags.InvokeMethod, null, order, new object[] { });
 }

The exception makes perfect sense – we need to inject dependencies! Making it so:

public HomeController(IDBLogger idbLogger)
 {
     _idbLogger = idbLogger;
     var assembly = Assembly.LoadFrom(Path.Combine("..\Controllers\bin\Debug\netcoreapp3.1", "Orders.dll"));
     var orderType = assembly.ExportedTypes.First(t => t.Name == "Order");
     var order = Activator.CreateInstance(orderType, new object[] { _idbLogger }); // we happen to know what the constructor is expecting
     orderType.InvokeMember("GetOrderDetail", BindingFlags.Public | BindingFlags.Instance|BindingFlags.InvokeMethod, null, order, new object[] { });
 }

Victory! or is it?

The above exercise is nothing new of exceptional – the point we are making here is – dependency injection frameworks were invented so we don’t have to do this manually. In this case it was pretty easy but more compex constructors can many dependencies. What’s worse – we may not be able to guarantee we even know all dependencies we need. If only there was a way to register dynamic types with system DI container…

Yes we can

The most naive solution would be to load our assembly on Startup.cs and register needed types along with our own:

public void ConfigureServices(IServiceCollection services)
 {
     services.AddControllersWithViews();
     services.AddScoped();
     // load assembly and register with DI  
     var assembly = Assembly.LoadFrom(Path.Combine("..\\Controllers\\bin\\Debug\\netcoreapp3.1", "Orders.dll")); 
    var orderType = assembly.ExportedTypes.First(t => t.Name == "Order");
    services.AddScoped(orderType); // this is where we would make our type known to the DI container  
    var loadedTypesCache = new LoadedTypesCache(); // this step is optional - i chose to leverage the same DI mechanism to avoid having to load assembly in my controller for type definition.  
    loadedTypesCache.LoadedTypes.Add("order", orderType);
    services.AddSingleton(loadedTypesCache); // singleton seems like a good fit here            
 }

And that’s it – literally no difference where the type is coming from! In controller, we’d inject IServiceProvider and ask to hand us an instance of type we cached earlier:

public HomeController(IServiceProvider serviceProvider, LoadedTypesCache cache)
{
     var order = serviceProvider.GetService(cache.LoadedTypes["order"]); // leveraging that same loaded type cache to avoid having to load assembly again
 // following two lines are just to call the method 
    var m = cache.LoadedTypes["order"].GetMethod("GetOrderDetail", BindingFlags.Public | BindingFlags.Instance); 
    m.Invoke(order, new object[] { }); // Victory!
}

Attaching debugger to dynamically loaded assembly with Reflection.Emit

Imagine a situation where you’d like to attach a debugger to an assembly that you have loaded dynamically? To make it a bit more plausible let us consider a scenario. Our client has a solution where they maintain extensive plugin ecosystem. Each plugin is a class library built with .net 4.5. Each plugin implements a common interface that main application is aware of. At runtime the application scans a folder and loads all assemblies into separate AppDomains. Under certain circumstances users/developers would like to be able to debug plugins in Visual Studio.
Given how seldom we would opt for this technique, documenting our solution might be an exercise in vain. But myself being a huge fan of weird and wonderful – I couldn’t resist going through with this case study.

Inventorying moving parts

First of all we’d need a way to inject code into the assembly. Apparently we can not directly replace methods we loaded from disk – SwapMethodBody() needs a DynamicModule. So we opted to define a subclass wrapper. Next, we need to actually stop execution and offer developers to start debugging. Using Debugger.Launch() is the easiest way to achieve that. Finally, we’d look at different ways to load assemblies into separate AppDomains to maintain existing convention.

Injecting Debugger.Launch()

The main attraction here – and Reflection.Emit is a perfect candidate for the job. Theory is fairly simple: we create a new dynamic assembly, module, type and a method. Then we generate code inside of the method and return wrapper instance:

public static object CreateWrapper(Type ServiceType, MethodInfo baseMethod)
{
    var asmBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(new AssemblyName($"newAssembly_{Guid.NewGuid()}"), AssemblyBuilderAccess.Run);
    var module = asmBuilder.DefineDynamicModule($"DynamicAssembly_{Guid.NewGuid()}");
    var typeBuilder = module.DefineType($"DynamicType_{Guid.NewGuid()}", TypeAttributes.Public, ServiceType);
    var methodBuilder = typeBuilder.DefineMethod("Run", MethodAttributes.Public | MethodAttributes.NewSlot);

    var ilGenerator = methodBuilder.GetILGenerator();

    ilGenerator.EmitCall(OpCodes.Call, typeof(Debugger).GetMethod("Launch", BindingFlags.Static | BindingFlags.Public), null);
    ilGenerator.Emit(OpCodes.Pop);

    ilGenerator.Emit(OpCodes.Ldarg_0);
    ilGenerator.EmitCall(OpCodes.Call, baseMethod, null);
    ilGenerator.Emit(OpCodes.Ret);

    /*
     * the generated method would be roughly equivalent to:
     * new void Run()
     * {
     *   Debugger.Launch();
     *   base.Run();
     * }
     */

    var wrapperType = typeBuilder.CreateType();
    return Activator.CreateInstance(wrapperType);
}

Triggering the method

After we’ve generated a wrapper – we should be in position to invoke the desired method. In this example I’m using all-reflection approach:

public void Run()
{
    var wrappedInstance = DebuggerWrapperGenerator.CreateWrapper(ServiceType, ServiceType.GetMethod("Run"));
    wrappedInstance.GetType().GetMethod("Run")?.Invoke(wrappedInstance, null);
 // nothing special here
}

The task becomes even easier if we know the interface to cast to.

Adding AppDomain into the mix

The above parts don’t depend much on where the code will run. However, trying to satisfy the layout requirement, we experimented with a few different configurations. In the end it appears that I’m able to confidently place the code in correct AppDomain by either leveraging .DoCallBack() or making sure that Launcher helper is created with .CreateInstanceAndUnwrap():

static void Main(string[] args)
{
    var appDomain = AppDomain.CreateDomain("AppDomainInMain", AppDomain.CurrentDomain.Evidence,
        new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.SetupInformation.ApplicationBase });

    appDomain.DoCallBack(() =>
    {
        var launcher = new Launcher(PathToDll);
        launcher.Run();
    });    
}
static void Main(string[] args)
{
    Launcher.RunInNewAppDomain(PathToDll);
}
public class Launcher : MarshalByRefObject
{
    private Type ServiceType { get; }

    public Launcher(string pathToDll)
    {
        var assembly = Assembly.LoadFrom(pathToDll);
        ServiceType = assembly.GetTypes().SingleOrDefault(t => t.Name == "Class1");
    }

    public void Run()
    {
        var wrappedInstance = DebuggerWrapperGenerator.CreateWrapper(ServiceType, ServiceType.GetMethod("Run"));
        wrappedInstance.GetType().GetMethod("Run")?.Invoke(wrappedInstance, null);
    }

    public static void RunInNewAppDomain(string pathToDll)
    {
        var appDomain = AppDomain.CreateDomain("AppDomainInLauncher", AppDomain.CurrentDomain.Evidence, AppDomain.CurrentDomain.SetupInformation);

        var launcher = appDomain.CreateInstanceAndUnwrap(typeof(Launcher).Assembly.FullName, typeof(Launcher).FullName, false, BindingFlags.Public|BindingFlags.Instance,
            null, new object[] { pathToDll }, CultureInfo.CurrentCulture, null);
        (launcher as Launcher)?.Run();
    }
}

Testing it

In the end we’ve got the following prompt:

after letting it run through, we’d get something looking like this:

As usual, full code for this example sits in my GitHub if you want to take it for a spin.

ARR: Setting up

Not so long ago a client asked us to spec up their CI/CD pipeline. They are going through devops transformation and as part of their “speed up delivery” objective they wanted to minimize downtime when they deploy new versions of their software or run maintenance.

🟦🟩-deployments

First thing we wanted to try was to introduce blue-green approach. The application runs on IIS and luckily for us, Microsoft offers a solution there: ARR. There’s heaps documentation online, and most examples seem to point at scaling out by routing traffic to application round robin. In our case the application was not ready for that just yet so we decided to use it for directing all traffic to one backend server only while we deploy the inactive one:

typical blue-green diagram

Farming Web Farms

ARR introduces a concept of Web Farms. This basically is a logical grouping of content servers that ARR treats as one site. Each farm comes with settings on how caching should work, or what actual content servers are like. It’s pretty easy to set up when we’ve got one or two of there. But in our case we were looking at approximately 100 farms. Yikes! Overall the process is pretty simple: create farm, add content servers, create URL rewrite rule. Nothing fancy and documentation is plentiful. What we wanted to do however was to automate everything into one script that could later run remotely when triggered by CI/CD pipelines.

PowerShell to the rescue

Our requirements were pretty standard until we realized that there’s no easy way to insert URL rewrite rules into arbitrary positions in the list. So we implemented a set of dummy rules that the script uses as anchors to locate a place where to inject new rule. We also needed node health check to cut off inactive servers, the easiest was a plain text file in website root with words “UP” or “DOWN” so that we can swap ARR slots by simply updating a file. ARR supports a few ways to programmatically manage, but since we’re on Windows we picked PowerShell as our tool of choice and ended up with something like this:

function CheckIfExists($xpath, $name, $remove = $true) {      
   $existing = Get-WebConfigurationProperty -pspath $psPath "$xpath[@name='$name']" -Name .
   if($null -ne $existing) {      
      if($remove) {
         Clear-WebConfiguration -pspath $psPath -Filter "$xpath[@name='$name']"
      }
      return  $true
   }
function IndexOfNode($collection, $name) {
   $i=0
   for ($i=0; $i -lt $existing.Collection.Count; $i++)
   {
      if ($collection[$i].name -eq $name) 
      { 
         return $i
      }
   }
   return $i-1 #found nothing - return position at the end of collection
}

function CreateRule($name, $matchUrl = "*", $atAnchor = "") {
   $matchingPatternSyntax = if ($matchUrl -eq "*") {"Wildcard"} else { "ECMAScript" };

   $existing = Get-WebConfiguration -pspath $psPath "system.webServer/rewrite/globalRules"
   $index = IndexOfNode $existing.Collection $atAnchor

   Add-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules" -AtIndex $index -name "." -value @{name=$name;patternSyntax=$matchingPatternSyntax;stopProcessing='True';enabled='True'}
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/match" -name "url" -value $matchUrl
}

function CreateRuleCondition($name, $in = "{HTTP_HOST}", $pattern, $negate = $false) 
{
   $value = @{
      input=$in;
      pattern=$pattern; 
   }

   if($negate -eq $true) {
      $value.Add("negate", "True")
   }

   Add-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/conditions" -name "." -value $value
}

function CreateRewriteAction($name, $url) {   
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/action" -name "type" -value "Rewrite"
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$name']/action" -name "url" -value "$url/{R:0}"
}

function CreateRedirectRule(
   $ruleName,      
   $matchUrl,
   $conditionHost,
   $farmName,
   $recreate = $true,
   $atName
) 
{
   if(CheckIfExists "system.webServer/rewrite/globalRules/rule" $ruleName $recreate) {
      if($recreate) {
         Write-Host "Removed existing $ruleName before proceeding"
      } else {
         Write-Host "Skipped existing $ruleName"
         return
      }
   }
   
   # Create a new rule
   CreateRule "$ruleName" -matchUrl $matchUrl -atAnchor $atName
   Set-WebConfigurationProperty -pspath $psPath  -filter "system.webServer/rewrite/globalRules/rule[@name='$farmName']/conditions" -name "logicalGrouping" -value "MatchAny"
   
   $conditionHost | ForEach-Object {
      CreateRuleCondition $ruleName -pattern $_
   }
   
   CreateRewriteAction $farmName "https://$farmName"
}

function CreateWebFarm(
      $farmName,
      $healthCheckUrl,
      $blueIpAddress,
      $greenIpAddress,
      $parentSiteHostName,
      $Recreate = $true
   )
{
   return; #debugging
   if(CheckIfExists "webFarms/webFarm" $farmName $Recreate) {
      if($Recreate) {
         Write-Host "Removed existing $farmName before proceeding"
      } else {
         Write-Host "Skipped existing $farmName"
         return
      }
   }
   
   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms" -name "." -value @{name=$farmName}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/affinity" -name "useCookie" -value "True"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/protocol/cache" -name "enabled" -value "False"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "url" -value $healthCheckUrl
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "interval" -value "00:00:10"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "timeout" -value "00:00:5"
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/applicationRequestRouting/healthCheck" -name "responseMatch" -value "UP"

   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']" -name "." -value @{address=$blueIpAddress}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/server[@address='$blueIpAddress']/applicationRequestRouting" -name "hostName" -value $parentSiteHostName

   Add-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']" -name "." -value @{address=$greenIpAddress}
   Set-WebConfigurationProperty -pspath $psPath  -filter "webFarms/webFarm[@name='$farmName']/server[@address='$greenIpAddress']/applicationRequestRouting" -name "hostName" -value $parentSiteHostName
}

$host
 = "newservice.example.com"
$farm = "newservice-farm"

CreateWebFarm $farm "https://$farm/healthcheck.htm" $greenIp $blueIp $host
CreateRedirectRule -ruleName $farm -matchUrl "*" -conditionHost @($host) -farmName $farm -atName "--Inserting rules above this point--"

Parsing OData queries

OData (Open Data Protocol) is an ISO approved standard that defines a set of best practices for building and consuming RESTful APIs. It allows us write business logic and not worry too much about request and response headers, status codes, HTTP methods, and other variables.

We won’t go into too much detail on how to write OData queries and how to use it – there’s plenty resources out there. We’ll rather have a look at a bit esoteric scenario where we consider defining our own parser and then walking the AST to get desired values.

Problem statement

Suppose we’ve got a filter string that we received from the client:

"?$filter =((Name eq 'John' or Name eq 'Peter') and (Department eq 'Professional Services'))"

And we’d like to apply custom validation to the filter. Ideally we’d like to get a structured list of properties and values so we can run our checks:

Filter 1:
    Key: Name
    Operator: eq
    Value: John
Operator: or

Filter 2:
    Key: Name
    Operator: eq
    Value: Peter

Operator: and

Filter 3:
    Key: Department
    Operator: eq
    Value: Professional Services

Some options are:

  • ODataUriParser – but it seems to have some issues with .net Core support just yet
  • Regular Expression – not very flexible
  • ODataQueryOptions – produces raw text but cannot broken down any further

What else?

One other way to approach this would be parsing. And there are plenty tools to do that (see flex or bison for example). In .net world, however, Irony might be a viable option: it’s available in .net standard 2.0 which we had no issues plugging into a .net core 3.1 console test project.

Grammar

To start off, we normally need to define a grammar. But luckily, Microsoft have been kind enough to supply us with EBNF reference so all we have to do is to adapt it to Irony. I ended up implementing a subset of the grammar above that seems to cater for example statement (and a bit above and beyond, feel free to cut it down).

using Irony.Parsing;

namespace irony_playground
{
    [Language("OData", "1.0", "OData Filter")]
    public class OData: Grammar
    {
        public OData()
        {
            // first we define some terms
            var identifier = new RegexBasedTerminal("identifier", "[a-zA-Z_][a-zA-Z_0-9]*");
            var string_literal = new StringLiteral("string_literal", "'");
            var integer_literal = new NumberLiteral("integer_literal", NumberOptions.IntOnly);
            var float_literal = new NumberLiteral("float_literal", NumberOptions.AllowSign|NumberOptions.AllowSign) 
                                        | new RegexBasedTerminal("float_literal", "(NaN)|-?(INF)");
            var boolean_literal = new RegexBasedTerminal("boolean_literal", "(true)|(false)");

            var filter_expression = new NonTerminal("filter_expression");
            var boolean_expression = new NonTerminal("boolean_expression");
            var collection_filter_expression = new NonTerminal("collection_filter_expression");
            var logical_expression = new NonTerminal("logical_expression");
            var comparison_expression = new NonTerminal("comparison_expression");
            var variable = new NonTerminal("variable");
            var field_path = new NonTerminal("field_path");
            var lambda_expression = new NonTerminal("lambda_expression");
            var comparison_operator = new NonTerminal("comparison_operator");
            var constant = new NonTerminal("constant");

            Root = filter_expression; // this is where our entry point will be. 

            // and from here on we expand on all terms and their relationships
            filter_expression.Rule = boolean_expression;

            boolean_expression.Rule = collection_filter_expression
                                      | logical_expression
                                      | comparison_expression
                                      | boolean_literal
                                      | "(" + boolean_expression + ")"
                                      | variable;
            variable.Rule = identifier | field_path;

            field_path.Rule = MakeStarRule(field_path, ToTerm("/"), identifier);

            collection_filter_expression.Rule =
                field_path + "/all(" + lambda_expression + ")"
                | field_path + "/any(" + lambda_expression + ")"
                | field_path + "/any()";

            lambda_expression.Rule = identifier + ":" + boolean_expression;

            logical_expression.Rule =
                boolean_expression + (ToTerm("and", "and") | ToTerm("or", "or")) + boolean_expression
                | ToTerm("not", "not") + boolean_expression;

            comparison_expression.Rule =
                variable + comparison_operator + constant |
                constant + comparison_operator + variable;

            constant.Rule =
                string_literal
                | integer_literal
                | float_literal
                | boolean_literal
                | ToTerm("null");

            comparison_operator.Rule = ToTerm("gt") | "lt" | "ge" | "le" | "eq" | "ne";

            RegisterBracePair("(", ")");
        }
    }
}

NB: Irony comes with Grammar Explorer tool that allows us to load grammar dlls and debug them with free text input.

enter image description here

after we’re happy with the grammar, we need to reference it from our project and parse the input string:

class Program
{
    static void Main(string[] args)
    {
        var g = new OData();
        var l = new LanguageData(g);
        var r = new Parser(l);
        var p = r.Parse("((Name eq 'John' or Name eq 'Grace Paul') and (Department eq 'Finance and Accounting'))"); // here's your tree
        // this is where you walk it and extract whatever data you desire 
    }
}

Then, all we’ve got to do is walk the resulting tree and apply any custom logic based on syntax node type. One example how to do that can be found in this StackOverflow answer.

Entity Framework Core 3.1 – dynamic WHERE clause

Every now and then we get tasked with building a backend for filtering arbitrary queries. Usually clients would like to have a method of sending over an array of fields, values, and comparisons operations in order to retrieve their data. For simplicity we’ll assume that all conditions are joining each other with an AND operator.

public class QueryableFilter {
    public string Name { get; set; }
    public string Value { get; set; }
    public QueryableFilterCompareEnum? Compare { get; set; }
}

With a twist

There’s however one slight complication to this problem – filters must apply to fields on dependent entities (possible multiple levels of nesting as well). This can become a problem not only because we’d have to traverse model hierarchy (we’ll touch on that later), but also because of ambiguity this requirement introduces. Sometimes we’re lucky to only have unique column names across the hierarchy. However more often than not this needs to be resolved one way or another. We can, for example, require filter fields to use dot notation so we know which entity each field relates to. For example, Name -eq "ACME Ltd" AND Name -eq "Cloud Solutions" becomes company.Name -eq "ACME Ltd" AND team.Name -eq "Cloud Solutions"

Building an expression

It is pretty common that clients already have some sort of data querying service with EF Core doing the actual database comms. And since EF relies on LINQ Expressions a lot – we can build required filters dynamically.

public static IQueryable<T> BuildExpression<T>(this IQueryable<T> source, DbContext context, string columnName, string value, QueryableFilterCompareEnum? compare = QueryableFilterCompareEnum.Equal)
{
	var param = Expression.Parameter(typeof(T));

	// Get the field/column from the Entity that matches the supplied columnName value
	// If the field/column does not exists on the Entity, throw an exception; There is nothing more that can be done
	MemberExpression dataField;
	
	var model = context.Model.FindEntityType(typeof(T)); // start with our own entity
	var props = model.GetPropertyAccessors(param); // get all available field names including navigations
	var reference = props.First(p => RelationalPropertyExtensions.GetColumnName(p.Item1) == columnName); // find the filtered column - you might need to handle cases where column does not exist

	dataField = reference.Item2 as MemberExpression; // we happen to already have correct property accessors in our Tuples	

	ConstantExpression constant = !string.IsNullOrWhiteSpace(value)
		? Expression.Constant(value.Trim(), typeof(string))
		: Expression.Constant(value, typeof(string));

	BinaryExpression binary = GetBinaryExpression(dataField, constant, compare);
	Expression<Func<T, bool>> lambda = (Expression<Func<T, bool>>)Expression.Lambda(binary, param);
	return source.Where(lambda);
}

Most of the code above is pretty standard for building property accessor lambdas, but GetPropertyAccessors is the key:

private static IEnumerable<Tuple<IProperty, Expression>> GetPropertyAccessors(this IEntityType model, Expression param)
{
	var result = new List<Tuple<IProperty, Expression>>();

	result.AddRange(model.GetProperties()
								.Where(p => !p.IsShadowProperty()) // this is your chance to ensure property is actually declared on the type before you attempt building Expression
								.Select(p => new Tuple<IProperty, Expression>(p, Expression.Property(param, p.Name)))); // Tuple is a bit clunky but hopefully conveys the idea

	foreach (var nav in model.GetNavigations().Where(p => p is Navigation))
	{
		var parentAccessor = Expression.Property(param, nav.Name); // define a starting point so following properties would hang off there
		result.AddRange(GetPropertyAccessors(nav.ForeignKey.PrincipalEntityType, parentAccessor)); //recursively call ourselves to travel up the navigation hierarchy
	}

	return result;
}

this is where we interrogate EF as-built data model, traverse navigation properties and recursively build a list of all properties we can ever filter on!

Testing it out

Talk is cheap, let’s run a complete example here:

public class Entity
{
	public int Id { get; set; }
}
class Company : Entity
{
	public string CompanyName { get; set; }
}

class Team : Entity
{
	public string TeamName { get; set; }
	public Company Company { get; set; }
}

class Employee : Entity
{
	public string EmployeeName { get; set; }
	public Team Team { get; set; }
}

class DynamicFilters<T> where T : Entity
{
	private readonly DbContext _context;

	public DynamicFilters(DbContext context)
	{
		_context = context;
	}

	public IEnumerable<T> Filter(IEnumerable<QueryableFilter> queryableFilters = null)
	{
		IQueryable<T> mainQuery = _context.Set<T>().AsQueryable().AsNoTracking();
		// Loop through the supplied queryable filters (if any) to construct a dynamic LINQ-to-SQL queryable
		foreach (var filter in queryableFilters ?? new List<QueryableFilter>())
		{
			mainQuery = mainQuery.BuildExpression(_context, filter.Name, filter.Value, filter.Compare);
		}

		mainQuery = mainQuery.OrderBy(x => x.Id);

		return mainQuery.ToList();
	}
}
// --- DbContext
class MyDbContext : DbContext
{
	public DbSet<Company> Companies { get; set; }
	public DbSet<Team> Teams { get; set; }
	public DbSet<Employee> Employees { get; set; }

	protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
	{
		optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true");
		base.OnConfiguring(optionsBuilder);
	}
}
// ---
static void Main(string[] args)
{
	var context = new MyDbContext();
	var someTableData = new DynamicFilters<Employee>(context).Filter(new
	List<QueryableFilter> { new QueryableFilter { Name = "CompanyName", Value = "ACME Ltd" }, new QueryableFilter { Name = "TeamName", Value = "Cloud Solutions" } });
}

The above block should produce following SQL:

SELECT [e].[Id], [e].[EmployeeName], [e].[TeamId]
FROM [Employees] AS [e]
LEFT JOIN [Teams] AS [t] ON [e].[TeamId] = [t].[Id]
LEFT JOIN [Companies] AS [c] ON [t].[CompanyId] = [c].[Id]
WHERE [c].[CompanyName] = N'ACME Ltd'
 AND [t].[TeamName] = N'Cloud Solutions'
ORDER BY [e].[Id]

Entity Framework Core 3.1 – Peeking Into Generated SQL

Writing LINQ that produces optimal SQL can be even harder as developers often don’t have visibility into the process. It becomes even more confusing when the application is designed to run against different databases.

We often find ourselves questioning whether this particular query will fall in line with our expectations. And until not so long ago our tool of choice was a SQL Profiler, that ships with SQL Server. It’s plenty powerful but has one flaw – it pretty much requires the SQL Server installation. This might be a deal breaker for some clients using other DBs, like Postgres or MySQL (which are all supported by the way).

EF to the resque

Instead of firing off the profiler and fishing out the batches, we could have Entity Framework itself pass us the result. After all, it needs to build SQL before sending it off the the database, so all we have to do it to ask nicely. Stack Overflow is quite helpful here:

public static class IQueryableExtensions // this is the EF Core 3.1 version.
    {
        public static string ToSql<TEntity>(this IQueryable<TEntity> query) where TEntity : class
        {
            var enumerator = query.Provider.Execute<IEnumerable<TEntity>>(query.Expression).GetEnumerator();
            var relationalCommandCache = enumerator.Private("_relationalCommandCache");
            var selectExpression = relationalCommandCache.Private<SelectExpression>("_selectExpression");
            var factory = relationalCommandCache.Private<IQuerySqlGeneratorFactory>("_querySqlGeneratorFactory");

            var sqlGenerator = factory.Create();
            var command = sqlGenerator.GetCommand(selectExpression);

            string sql = command.CommandText;
            return sql;
        }

        private static object Private(this object obj, string privateField) => obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
        private static T Private<T>(this object obj, string privateField) => (T)obj?.GetType().GetField(privateField, BindingFlags.Instance | BindingFlags.NonPublic)?.GetValue(obj);
    }

The usage is simple

Suppose we’ve got the following inputs: One simple table, that we’d like to group by one field and total by another. Database Context is also pretty much boilerplate. One thing to note here is a couple of database providers we are going to try the query against.

public class SomeTable
{
    public int Id { get; set; }
    public int Foobar { get; set; }
    public int Quantity { get; set; }
}

class MyDbContext : DbContext
{
    public DbSet<SomeTable> SomeTables { get; set; }
    public static readonly LoggerFactory DbCommandConsoleLoggerFactory
        = new LoggerFactory(new[] {
        new ConsoleLoggerProvider ((category, level) =>
            category == DbLoggerCategory.Database.Command.Name &&
            level == LogLevel.Trace, true)
        });
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //run with SQL Server provider to get T-SQL
        optionsBuilder.UseNpgsql("Server=localhost;Port=5432;Database=test;User Id=;Password=;")
        //alternatively use other supported provider
        //optionsBuilder.UseSqlServer("Server=.\\SQLEXPRESS;Database=test;Trusted_Connection=true")
        ;
        base.OnConfiguring(optionsBuilder);
    }
}

The test bench would look something like so

class Program
{
    static void Main(string[] args)
    {

        var context = new MyDbContext();
        var someTableData = context.SomeTables
                .GroupBy(x => x.Foobar)
                .Select(x => new { Foobar = x.Key, Quantity = x.Sum(y => y.Quantity) })
                .OrderByDescending(x => x.Quantity)
                .Take(10) // we've built our query as per normal
                .ToSql(); // this is the magic
        Console.Write(someTableData);
        Console.ReadKey();
    }
}

And depending on our choice of provider the output would show ef core generated sql for SQL Server and Postgres

        -- MSSQL
        SELECT TOP(@__p_0) [s].[Foobar], SUM([s].[Quantity]) AS [Quantity]
        FROM [SomeTables] AS [s]
        GROUP BY [s].[Foobar]
        ORDER BY SUM([s].[Quantity]) DESC

        -- PG SQL
         SELECT s."Foobar", SUM(s."Quantity")::INT AS "Quantity"
        FROM "SomeTables" AS s
        GROUP BY s."Foobar"
        ORDER BY SUM(s."Quantity")::INT DESC
        LIMIT @__p_0

Running https with Docker

It’s not a secret we love Docker. And with recent changes to how Chrome treats SameSite cookies it’s become a bit of a pain to develop any sort of oAuth solutions with containers: these have to go over SSL so the browser takes it.

Tools like dotnet dev-certs do provide some relief by generating self-signed certs and adding those to trusted store on host machine. In short – most of the time, host-to-container development is not an issue.

What if we need more than one domain?

Sometimes there will be cases where we’d like to access the same service by two domain names. It might be useful if Host header is required:

we can opt for what’s known a SAN certificate. It’s an extension to x.509 that allows us to reuse the same cert for multiple domain names. We can then trust certificate on our dev machine and make Docker use the same cert for HTTPS:

#create a SAN cert for both server.docker.local and localhost
$cert = New-SelfSignedCertificate -DnsName "server.docker.local", "localhost" -CertStoreLocation cert:\localmachine\my

#export it for docker container to pick up later
$password = ConvertTo-SecureString -String "123123" -Force -AsPlainText
Export-PfxCertificate -Cert $cert -FilePath C:\https\aspnetapp.pfx -Password $password

# trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store "TrustedPublisher","LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

More containers?

Sometimes we want one container to talk to another while retaining the ability to check up on things from localhost. Consider the following docker-compose:

version: '3'
services:
  client: # client process that needs to talk to server
    depends_on:
      - server
  server: # server that we'd also like to access from the outside
    image:     
    ports:
      - "8443:443"

This would roughtly translate to the following network layout:

Problems start

When one container needs to talk to another container it’s a slightly different story: dev tools don’t have control over containers and cannot magically trust certificates inside there. We can try opt for properly signed certificates (from Let’s Encrypt for example), but that’s a whole different story and is likely not worth it for development machines.

The above powershell script is also going to fall short as it’s only adding the cert onto development machine – containers will keep failing to validate the cert. The solution would build on the same principles: generate a self-signed cert and trust it everywhere. Since most Docker containers run Linux we might have better luck going the opposite direction and generating certs in PEM format using a well known tool OpenSSL. Then we’d use Dockerfiles to inject this cert into all our containers and finally we’d convert it into format suitable for our host Windows machine (PKCS#12).

$certPass = "password_here"
$certSubj = "host.docker.internal"
$certAltNames = "DNS:localhost,DNS:host.docker.internal,DNS:identity_server" # we can also add individual IP addresses here like so: IP:127.0.0.1
$opensslPath="path\to\openssl\binaries" #aOpenSSL needs to be present on the host, no installation is necessary though
$workDir="path\to\your\project"
$dockerDir=Join-Path $workDir "ProjectApi"

#generate a self-signed cert with multiple domains
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ",
                                          (Join-Path $workDir aspnetapp.key),
                                          "-out", (Join-Path $dockerDir aspnetapp.crt),
                                          "-subj `"/CN=$certSubj`" -addext `"subjectAltName=$certAltNames`""

# this time round we convert PEM format into PKCS#12 (aka PFX) so .net core app picks it up
Start-Process -NoNewWindow -Wait -FilePath (Join-Path $opensslPath "openssl.exe") -ArgumentList "pkcs12 -export -in ", 
                                           (Join-Path $dockerDir aspnetapp.crt),
                                           "-inkey ", (Join-Path $workDir aspnetapp.key),
                                           "-out ", (Join-Path $workDir aspnetapp.pfx),
                                           "-passout pass:$certPass"

$password = ConvertTo-SecureString -String $certPass -Force -AsPlainText
$cert = Get-PfxCertificate -FilePath (Join-Path $workDir "aspnetapp.pfx") -Password $password

# and still, trust it on our host machine
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store [System.Security.Cryptography.X509Certificates.StoreName]::Root,"LocalMachine"
$store.Open("ReadWrite")
$store.Add($cert)
$store.Close()

Example: Running Identity Server

Now we have our certs (for example, located in %USERPROFILE%.aspnet\https). Here’s a quick how to tell asp.net core -base containers to pick them up:

docker pull your_docker_image
docker run --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=\https\aspnetapp.pfx -v %USERPROFILE%\.aspnet\https:C:\https\ your_docker_image

docker run <your image> --rm -it -p 8000:80 -p 8001:443 -e ASPNETCORE_URLS="https://+;http://+" -e ASPNETCORE_HTTPS_PORT=8001 -e ASPNETCORE_Kestrel__Certificates__Default__Password="123123" -e ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx

Or in docker-compose format:

version: '3'
services:
  identity_server:
    image: mcr.microsoft.com/dotnet/core/samples:aspnetapp    
    environment: 
      - ASPNETCORE_URLS=https://+:443;http://+:80
      - ASPNETCORE_Kestrel__Certificates__Default__Password=password_here
      - ASPNETCORE_Kestrel__Certificates__Default__Path=/https/aspnetapp.pfx
    volumes:
      - ~/.aspnet/https/:/https/:ro 
    container_name: identity_server
    ports:
      - "8443:443"
      - "8080:80"

Data visualisation with Vega

We love nice dashboards. And if you see a chart somewhere on a webpage – chances are it runs D3.js. D3.js is a JavaScript library for manipulating documents based on data. It allows you to do a great deal of visualisation but comes with a bit of a learning curve. Even though the data can come in any shape and form, plotting and transformations are JavaScript.

Declarative approach

This is where Vega comes forward. Everything is now a JSON therefore we can literally build and ship visualisations without touching JavaScript at all!

Step by step

Suppose we’ve got hierarchical animal data represented by following JSON:

"values": [
        {"id": "1", "parent": null, "title": "Animal"},
        {"id": "2", "parent": "1", "title": "Duck"},
        {"id": "3", "parent": "1", "title": "Fish"},
        {"id": "4", "parent": "1", "title": "Zebra"}
      ]

What we can then do is to lay the nodes out in a tree-like shape (stratify does the job):

"transform": [
        {
          "type": "stratify",
          "key": "id",
          "parentKey": "parent"
        },
        {
          "type": "tree",
          "method": "tidy",
          "separation": true,
          "size": [{"signal": "width"}, {"signal": "height"}]
        }
      ]

having laid out the nodes, we need to generate connecting lines, treelinks + linkpath combo does exactly that:

{
      "name": "links",
      "source": "tree", // take datasource "tree" as input
      "transform": [
        { "type": "treelinks" }, // apply transform 1
        { "type": "linkpath", // follow up with next transform
          "shape": "diagonal"
          }
      ]
    }

now that we’ve got our data sources, we want to draw actual objects. In Vega these are called marks. For simplicity I’m only drawing one rectangle with a title for each data point and some basic lines to connect:

"marks": [
    {
      "type": "path",
      "from": {"data": "links"}, // dataset we defined above
      "encode": {
        "enter": {
          "path": {"field": "path"} // linkpath generated a dataset with "path" field in it - we just grab it here
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "text",
      "from": {"data": "tree"}, // use data set we defined earlier
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "text": {"field": "title"}, // we can use data fields to display actual values
          "x": {"field": "x"}, // use data fields to draw values from
          "y": {"field": "y"},
          "dx": {"value":50}, // offset the mark to appear in rectangle center
          "dy": {"value":13},
          "align": {"value": "center"}
        }
      }
    }
  ]

All in all we arrived at a very basic hierarchical chart. It looks kinda plain and can definitely be improved: the rectangles there should probably be replaced with groups and connection paths will need some work too.

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "width": 800,
  "height": 300,
  "padding": 5,

  "data": [
    {
      "name": "tree",
      "values": [
        {"id": "1", "parent": null, "title": "Animal"},
        {"id": "2", "parent": "1", "title": "Duck"},
        {"id": "3", "parent": "1", "title": "Fish"},
        {"id": "4", "parent": "1", "title": "Zebra"}
      ],
      "transform": [
        {
          "type": "stratify",
          "key": "id",
          "parentKey": "parent"
        },
        {
          "type": "tree",
          "method": "tidy",
          "separation": true,
          "size": [{"signal": "width"}, {"signal": "height"}]
        }
      ]      
    },
    {
      "name": "links",
      "source": "tree",
      "transform": [
        { "type": "treelinks" },
        { "type": "linkpath",
          "shape": "diagonal"
          }
      ]
    }, 
    {
      "name": "tree-boxes",
      "source": "tree",
      "transform": [
          { 
            "type": "filter",
            "expr": "datum.parent == null"
          }
        ]
    },
    {
      "name": "tree-circles",
      "source": "tree",
      "transform": [
        {
          "type": "filter",
          "expr": "datum.parent != null"
        }
      ]
    }
  ],
  "marks": [
    {
      "type": "path",
      "from": {"data": "links"},
      "encode": {
        "enter": {
          "path": {"field": "path"}
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree-boxes"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "symbol",
      "from": {"data": "tree-circles"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "rect",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "text",
      "from": {"data": "tree"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "text": {"field": "title"},
          "x": {"field": "x"},
          "y": {"field": "y"},
          "dx": {"value":50},
          "dy": {"value":13},
          "align": {"value": "center"}
        }
      }
    }
  ]
}

Getting a bit fancier

Suppose, we would like to render different shapes for root and leaf nodes of our chart. One way to achieve this will be to add two filter transformations based on your tree dataset and filter them accordingly:

    {
      "name": "tree-boxes",
      "source": "tree", // grab the existing data
      "transform": [
          { 
            "type": "filter",
            "expr": "datum.parent == null" // run it through a filter defined by expression
          }
        ]
    },
    {
      "name": "tree-circles",
      "source": "tree",
      "transform": [
        {
          "type": "filter",
          "expr": "datum.parent != null"
        }
      ]
    }

then instead of rendering all marks as rect we’d want two different shapes for respective transformed datasets:

{
      "type": "rect",
      "from": {"data": "tree-boxes"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    },
    {
      "type": "symbol",
      "from": {"data": "tree-circles"},
      "encode": {
        "enter": {
          "stroke": {"value": "black"},
          "width": {"value": 100},
          "height": {"value": 20},
          "x": {"field": "x"},
          "y": {"field": "y"}
        }
      }
    }

Demo time

Play with Vega in live editor here.