Parsing OData queries - Timur and associates

OData (Open Data Protocol) is an ISO approved standard that defines a set of best practices for building and consuming RESTful APIs. It allows us write business logic and not worry too much about request and response headers, status codes, HTTP methods, and other variables.

We won’t go into too much detail on how to write OData queries and how to use it – there’s plenty resources out there. We’ll rather have a look at a bit esoteric scenario where we consider defining our own parser and then walking the AST to get desired values.

Problem statement

Suppose we’ve got a filter string that we received from the client:

"?$filter =((Name eq 'John' or Name eq 'Peter') and (Department eq 'Professional Services'))"

And we’d like to apply custom validation to the filter. Ideally we’d like to get a structured list of properties and values so we can run our checks:

Filter 1:
    Key: Name
    Operator: eq
    Value: John
Operator: or

Filter 2:
    Key: Name
    Operator: eq
    Value: Peter

Operator: and

Filter 3:
    Key: Department
    Operator: eq
    Value: Professional Services

Some options are:

ODataUriParser – but it seems to have some issues with .net Core support just yet
Regular Expression – not very flexible
ODataQueryOptions – produces raw text but cannot broken down any further

What else?

One other way to approach this would be parsing. And there are plenty tools to do that (see flex or bison for example). In .net world, however, Irony might be a viable option: it’s available in .net standard 2.0 which we had no issues plugging into a .net core 3.1 console test project.

Grammar

To start off, we normally need to define a grammar. But luckily, Microsoft have been kind enough to supply us with EBNF reference so all we have to do is to adapt it to Irony. I ended up implementing a subset of the grammar above that seems to cater for example statement (and a bit above and beyond, feel free to cut it down).

using Irony.Parsing;

namespace irony_playground
{
    [Language("OData", "1.0", "OData Filter")]
    public class OData: Grammar
    {
        public OData()
        {
            // first we define some terms
            var identifier = new RegexBasedTerminal("identifier", "[a-zA-Z_][a-zA-Z_0-9]*");
            var string_literal = new StringLiteral("string_literal", "'");
            var integer_literal = new NumberLiteral("integer_literal", NumberOptions.IntOnly);
            var float_literal = new NumberLiteral("float_literal", NumberOptions.AllowSign|NumberOptions.AllowSign) 
                                        | new RegexBasedTerminal("float_literal", "(NaN)|-?(INF)");
            var boolean_literal = new RegexBasedTerminal("boolean_literal", "(true)|(false)");

            var filter_expression = new NonTerminal("filter_expression");
            var boolean_expression = new NonTerminal("boolean_expression");
            var collection_filter_expression = new NonTerminal("collection_filter_expression");
            var logical_expression = new NonTerminal("logical_expression");
            var comparison_expression = new NonTerminal("comparison_expression");
            var variable = new NonTerminal("variable");
            var field_path = new NonTerminal("field_path");
            var lambda_expression = new NonTerminal("lambda_expression");
            var comparison_operator = new NonTerminal("comparison_operator");
            var constant = new NonTerminal("constant");

            Root = filter_expression; // this is where our entry point will be. 

            // and from here on we expand on all terms and their relationships
            filter_expression.Rule = boolean_expression;

            boolean_expression.Rule = collection_filter_expression
                                      | logical_expression
                                      | comparison_expression
                                      | boolean_literal
                                      | "(" + boolean_expression + ")"
                                      | variable;
            variable.Rule = identifier | field_path;

            field_path.Rule = MakeStarRule(field_path, ToTerm("/"), identifier);

            collection_filter_expression.Rule =
                field_path + "/all(" + lambda_expression + ")"
                | field_path + "/any(" + lambda_expression + ")"
                | field_path + "/any()";

            lambda_expression.Rule = identifier + ":" + boolean_expression;

            logical_expression.Rule =
                boolean_expression + (ToTerm("and", "and") | ToTerm("or", "or")) + boolean_expression
                | ToTerm("not", "not") + boolean_expression;

            comparison_expression.Rule =
                variable + comparison_operator + constant |
                constant + comparison_operator + variable;

            constant.Rule =
                string_literal
                | integer_literal
                | float_literal
                | boolean_literal
                | ToTerm("null");

            comparison_operator.Rule = ToTerm("gt") | "lt" | "ge" | "le" | "eq" | "ne";

            RegisterBracePair("(", ")");
        }
    }
}

NB: Irony comes with Grammar Explorer tool that allows us to load grammar dlls and debug them with free text input.

after we’re happy with the grammar, we need to reference it from our project and parse the input string:

class Program
{
    static void Main(string[] args)
    {
        var g = new OData();
        var l = new LanguageData(g);
        var r = new Parser(l);
        var p = r.Parse("((Name eq 'John' or Name eq 'Grace Paul') and (Department eq 'Finance and Accounting'))"); // here's your tree
        // this is where you walk it and extract whatever data you desire 
    }
}

Then, all we’ve got to do is walk the resulting tree and apply any custom logic based on syntax node type. One example how to do that can be found in this StackOverflow answer.