LINQ: Dynamic Join

Photo by Ilya Pavlov on Unsplash

Suppose we’ve got two lists that we would like to join based on, say, common Id property. With LINQ the code would look something along these lines:

var list1 = new List<MyItem> {};
var list2 = new List<MyItem> {};
var joined = list1.Join(list2, i => i.Id, j => j.Id, (k,l) => new {List1Item=k, List2Item=l});

resulting list of anonymous objects would have a property for each source object in the join. This is nothing new and has been documented pretty well.

But what if

We don’t know how many lists we’d have to join? Now we’ve got a list of lists of our entities (List Inception?!): List<List<MyItem>>. It becomes pretty obvious that we’d need to generate this code dynamically. We’ll run with LINQ Expression Trees – surely there’s a way. Generally speaking, we’ll have to build an object (anonymous type would be ideal) with fields like so:

{
  i0: items[0] // added on first run - we need to have at least two lists to join so it's safe to assume we'd
  i1: items[1] // added on first run - you need to have at least two lists in your join array 
  ... 
  iN: items[N] // added on each pass an joined with items[0] 
}

It is safe to assume that we need at least two lists for join to make sense, so we’d build the object above in two stages – first join two MyItem instances and get the structure going, Each subsequent join should append more MyItem instances to the resulting object until we’d get our result.

Picking types for the result

Now the problem is how we best define this object. The way anonymous types are declared, requires a type initialiser and a new keyword. We don’t have either of these at design time, so this method unfortunately will not work for us.

ExpandoObject

Another way to achieve decent developer experience with named object properties would be to use dynamic keyword – this is less than ideal as it effectively disables compiler static type checks. But we can keep going – so it’s an option here. To allow us to add properties at run time, we will use ExpandoObject:

static List<ExpandoObject> Join<TSource, TDest>(List<List<TSource>> items, Expression<Func<TSource, int>> srcAccessor, Expression<Func<ExpandoObject, int>> intermediaryAccessor, Expression<Func<TSource, TSource, ExpandoObject>> outerResultSelector)
{
	var joinLambdaType = typeof(ExpandoObject);            
	Expression<Func<ExpandoObject, TSource, ExpandoObject>> innerResultSelector = (expando, item) => expando.AddValue(item);
	
	var joinMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "Join").First().MakeGenericMethod(typeof(TSource), typeof(TSource), typeof(int), joinLambdaType);
	var toListMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "ToList").First().MakeGenericMethod(typeof(TDest));

	var joinCall = Expression.Call(joinMethod,
							Expression.Constant(items[0]),
							Expression.Constant(items[1]),
							srcAccessor,
							srcAccessor,
							outerResultSelector);
	joinMethod = typeof(Enumerable).GetMethods().Where(m => m.Name == "Join").First().MakeGenericMethod(typeof(TDest), typeof(TSource), typeof(int), joinLambdaType); // from now on we'll be joining ExpandoObject with MyEntity
	for (int i = 2; i < items.Count; i++) // skip the first two
	{
		joinCall =
			Expression.Call(joinMethod,
							joinCall,
							Expression.Constant(items[i]),
							intermediaryAccessor,
							srcAccessor,
							innerResultSelector);
	}

	var lambda = Expression.Lambda<Func<List<ExpandoObject>>>(Expression.Call(toListMethod, joinCall));
	return lambda.Compile()();
}

The above block references two extension methods so that we can easier manupulate the ExpandoObjects:

public static class Extensions 
 {
     public static ExpandoObject AddValue(this ExpandoObject expando, object value)
     {
         var dict = (IDictionary)expando;
         var key = $"i{dict.Count}"; // that was the easiest way to keep track of what's already in. You would probably find a way to do it better
         dict.Add(key, value);
         return expando;
     }
     public static ExpandoObject NewObject<T>(this ExpandoObject expando, T value1, T value2) 
     {
          var dict = (IDictionary<string, object>)expando;
          dict.Add("i0", value1);
          dict.Add("i1", value2);
          return expando; 
     }
 }

And with that, we should have no issue running a simple test like so:

class Program
{
    class MyEntity
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public MyEntity(int id, string name)
        {
            Id = id; Name = name;
        }
    }

    static void Main()
    {
        List<List<MyEntity>> items = new List<List<MyEntity>> {
            new List<MyEntity> {new MyEntity(1,"test1_1"), new MyEntity(2,"test1_2")},
            new List<MyEntity> {new MyEntity(1,"test2_1"), new MyEntity(2,"test2_2")},
            new List<MyEntity> {new MyEntity(1,"test3_1"), new MyEntity(2,"test3_2")},
            new List<MyEntity> {new MyEntity(1,"test4_1"), new MyEntity(2,"test4_2")}
        };

        Expression<Func<MyEntity, MyEntity, ExpandoObject>> outerResultSelector = (i, j) => new ExpandoObject().NewObject(i, j); // we create a new ExpandoObject and populate it with first two items we join
        Expression<Func<ExpandoObject, int>> intermediaryAccessor = (expando) => ((MyEntity)((IDictionary<string, object>)expando)["i0"]).Id; // you could probably get rid of hardcoding this by, say, examining the first key in the dictionary
        
        dynamic cc = Join<MyEntity, ExpandoObject>(items, i => i.Id, intermediaryAccessor, outerResultSelector);

        var test1_1 = cc[0].i1;
        var test1_2 = cc[0].i2;

        var test2_1 = cc[1].i1;
        var test2_2 = cc[1].i2;
    }
}