Meta-Programming and parser extensibility in C# and Roslyn

    General discussion

  • While Roslyn is a fantastic framework for parsing/analysing/modifying C# code, the lack of extensibility in the parser is really frustrating in meta-programming scenarios.

    This is of course not entirely the fault of Roslyn itself, but rather a missing feature in C#. Still, it would be really powerful to be able to extend the language syntax (at several possible - fixed - points in the AST), without having to write a full parser on our own. I know that traditional lexers/parsers are not always friendly to this kind of extension point, but I have several situations where I would like to provide custom embedded DSLs in C# (as is the case for the LINQ syntax, or async/await), and this is currently not possible without going the laborious way of rewriting a whole C# parser/analyser (or using the existing Mono compiler as a starting point).

    We really need a way to express, in C# itself, language extensions that can be consumed by the parser to generate transformed C# code that then gets compiled. Language extensions like async/await or LINQ expressions should be expressible in this meta-programming C# language.

    Let's see how this could be done. Suppose we would like to introduce a new keyword "mykeyword" which takes a block expression ({...}) and is supported at the class and struct body level. I would declare this syntax extension like this:

    using Microsoft.CSharp.LanguageExtensions;

    public static class MyLanguageExtension
    {
        /// <summary>
        /// Creates a new syntax "mykeyword + { ... }" at the class and struct body level.
        /// </summary>
        /// <returns>A syntax declaration describing the language extension</returns>
        [SyntaxExtension]
        public static SyntaxDeclaration Create()
        {
            return new SyntaxDeclaration
            {
                // Defines a BNF-like syntax
                Syntax = new SyntaxToken("mykeyword") + new SyntaxBlockExpression(),

                // Defines where this syntax is accepted
                Location = SyntaxLocation.ClassBody | SyntaxLocation.StructBody,

                // Callback invoked after the AST is built, in order to provide the code transformation into pure C# code.
                AstHandler = MyKeywordHandler
            };
        }

        public static void MyKeywordHandler(SyntaxDeclaration syntaxDeclaration, CompilerContext context)
        {
            // Implement here the modifications to the AST through the compiler context that contains the AST.
            // This method typically replaces the "mykeyword" AST node with static fields, methods... whatever
            // is needed in the class to translate it into pure C# language.
        }
    }

    And a typical usage of this new keyword would then be:

    public class MyClass
    {
        // Use the keyword previously introduced at the class body level
        mykeyword { Console.WriteLine("This is an extension method at the class body level");}
    }

    The AstHandler would then generate the appropriate pure C# code to translate this embedded DSL declaration into the C# class itself. There are lots of edge cases to take care of, and viable extension points to identify (for example, it would even be nice to be able to add new modifiers, like public/private, for arbitrary syntax elements, etc.), but there is nothing that would prevent this kind of scenario from being supported.
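
    To make the transformation concrete, here is one purely hypothetical output that MyKeywordHandler could produce for the class above (the generated member name, like the whole LanguageExtensions API in this proposal, is invented for illustration):

    public class MyClass
    {
        // The "mykeyword { ... }" block has been rewritten by the AST handler into an
        // ordinary static method before the class reaches the real C# compilation stage.
        private static void __MyKeywordBlock0()
        {
            Console.WriteLine("This is an extension method at the class body level");
        }
    }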

    Having this kind of extensibility would be a major enhancement, as it would make it possible to leverage C# to host embedded DSLs while keeping the full parsing experience (external tools like ReSharper would even be able to consume language extensions and support them in their syntax analysis/highlighting).

    One technical challenge is that the parser would have to support extensible syntax declaration points. This is absolutely feasible: most generated parsers (like lex/yacc) produce static DFA states with little or no room for extension, but it is fairly easy to add extension points to a DFA. The only downside is that the compiler would have to verify, at compile time, that a language extension does not produce an invalid dynamic DFA, but I don't think this is a major issue.

    I would love to see this kind of thing implemented in C#, as much as I love working in C#!

    Let me know what you think about this.


    Alexandre Mutel - SharpDX - NShader - Code4k



    Friday, September 21, 2012 1:58 AM

All replies

  • I think that using Antlr, "Oslo" MGrammar or some other parser generator to create an almost complete/almost correct C# parser is not very hard, creating a Roslyn syntax tree from the parse tree is easy, and extending the C# syntax is restricted only by your imagination.
    Sunday, September 23, 2012 12:00 PM
  • I think that using Antlr, "Oslo" MGrammar or some other parser generator to create an almost complete/almost correct C# parser is not very hard, creating a Roslyn syntax tree from the parse tree is easy, and extending the C# syntax is restricted only by your imagination.

    The whole point of my post is that the C# language should evolve towards more meta-programming. I don't want to develop yet another pet language derived from C# on my own, nor yet another C# parser (as I said, I wouldn't write one but would use an existing solution like the one in Mono's mcs). I would like this supported at the core of C#, and that cannot be done without a strong commitment from the Microsoft C#/BCL teams. Meta-programming is the big thing for C# and would be a fantastic step forward for this great language.


    Alexandre Mutel - SharpDX - NShader - Code4k

    Sunday, September 23, 2012 12:24 PM
  • "Meta-programming" is a buzz word, there are 1,000 meta-programming in 1,000 programmer's eyes.
    If I am Anders Hejlsberg, I don't want to evolve C# intensely, if someone want some language feature -- for example, concurrency -- extend C# syntax(I call the extensions DSL), translate the DSL element to C# element, use Roslyn to analyse the semantics, finally, generate the C# implementation code. Axum(http://www.microsoft.com/en-us/download/details.aspx?id=21024) is a good case study.
    This is the meta-programming I understand.
    Sunday, September 23, 2012 1:31 PM
  • Your remark that "there are 1,000 kinds of meta-programming in 1,000 programmers' eyes" is absolutely pointless. We could say the same thing about generics/templates or about asynchronous programming... Is that a reason not to add a crucial feature to a language? No and no. Choices were made for those evolutions of the language that don't fit every one of the "1,000 programmers' eyes", that's life, but it would be an absurd reason to block any new language feature just because it does not suit everyone.

    Have you ever realized that the "extension methods" introduced in C# 3.0 are actually an embryo of meta-programming? Meta-programming may be just a buzzword to you, but in my experience it is a real need in everyday business programming.



    Alexandre Mutel - SharpDX - NShader - Code4k

    Sunday, September 23, 2012 1:54 PM
  • Hi Alexandre,

    There are external DSLs vs. internal DSLs; maybe there is external meta-programming (or XXX, the name doesn't matter) vs. internal meta-programming too. Axum is an example of external meta-programming; what is internal meta-programming? Can you explain it or show me some example? Thanks.

    (My English is poor; meta-programming is very charming and mystical.)

    Sunday, September 23, 2012 2:45 PM
  • There are external DSLs vs. internal DSLs; maybe there is external meta-programming (or XXX, the name doesn't matter) vs. internal meta-programming too. Axum is an example of external meta-programming; what is internal meta-programming? Can you explain it or show me some example? Thanks.

    I'm not sure about your terminology of external DSL vs. internal DSL. I assume that by an internal DSL you mean a DSL embedded into another language?

    "External" meta-programming is currently used in C# by using custom attributes or custom identifiable code (a particular method candidate for substitution), using library like Mono.Cecil to replace custom attributes by some modification of the IL. An example: you tag a property to have IPropertyNotifyChanged automatically generated inside the method. You don't want to write this kind of laborious code, so we can use some kind of "external" meta-programming. Obviously, because meta-programming is not a core features of C#, we have to use workarounds, and this is a typical workaround. So globally, the process is to add some meta-data to an assembly, in order to process it once it is compiled, and change/enhance the generated bytecode/methods/properties/whatever. This is somehow what we could called "deferred" meta-programming.

    Sadly, due to the "complexity" of this technique (not everybody wants to manipulate IL), it is rarely used or even evaluated as a possibility by the majority of developers.

    "Internal" meta-programming would allow inside the language/compiler infrastructure itself to add the ability to add new keywords/syntax construction and transform the code before It gets compiled to an assembly. I can give you a straight example: Let's say we are living in the fifth dimension where the .NET 4.0 would have meta-programming capabilities but wouldn't have the async/await features. With "internal" meta-programming, It would be quite feasible to add such a feature. Async/Await is a typical candidate for meta-programming: Add some keywords/new construction to a language that will get translated to pure C# language (This could be applied also to linq expressions).

    So "Internal" meta-programming could be used also to add some "Internal" DSL, but It has a broader audience, as It can be both used to extend the language (in a general manner) or to add custom keywords/syntax construction that fits your application business domain.

    Nothing charming or mystical here, really; most of the time we are already using some external (post-compile bytecode manipulation) or internal ("extension methods" in C#) meta-programming without noticing it.


    Alexandre Mutel - SharpDX - NShader - Code4k

    Sunday, September 23, 2012 3:16 PM
  • From the comments given by the Roslyn team I don't think metaprogramming is in the scope of the project.

    Personally I don't think extension points at the grammar level are a good idea; if there is more than one extension you are very likely to get an ambiguous grammar. One of the points of C# (and Roslyn) is having a very fast parser, which means careful design of the language and probably rules out using a general-purpose parser generator (it always needs manual tweaking).

    That said, if you just need metaprogramming and not custom DSL syntax, you *can* do that with Roslyn. (By that I mean being able to write source code which executes at compile time to inject classes or code.) To do that you parse comments or syntactically invalid code with Roslyn and then use a SyntaxWalker to fix it up.

    I've done some experiments in this direction and written a simple mixin system (which covers most of what I currently need from C# metaprogramming). The syntaxes I use to switch between metaprogramming and normal programming are listed below (a small sketch of a source file using them follows the list):

    • On the source file level I surround the metaprogramming code with #if/#endif markers using a special symbol.
      This skips the region during normal parsing, but the SyntaxWalker can still parse the content of the region into another Roslyn syntax tree. This is used to define the actual mixins and utility classes available for compile-time evaluation. (As an alternative to the #if/#endif markers I also considered using a different file extension than 'cs', but didn't implement that yet.)
    • On the class level, for "annotations", I use "magic comments". I've got a SyntaxWalker which checks for comments with a "//$" prefix and interprets the rest of the line as a method call. The method receives as a parameter the syntax element the comment is placed in front of. Those method calls can be used to inject members into classes.
    • For templating code blocks for injection I use the invalid syntax "&delegate{...}" - i.e. taking the address of a delegate literal. I have a SyntaxWalker which replaces this delegate with a source-level syntax tree (i.e. a big chunk of Syntax.* calls which reproduce the syntax tree of the template).
    • For source injection I mostly use pointer syntax (works on both type and expression level). When the SyntaxWalker encounters something like that it evaluates the expression at compile time and inserts the result value.
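
    Purely as an illustration of these conventions (the special symbol, the mixin method and its signature are made up; the SyntaxWalker that processes them is not shown), a source file could look roughly like this:

    // METAPROGRAMMING is never defined for the normal build, so the C# compiler
    // skips this region, while the custom SyntaxWalker parses it into its own tree.
    #if METAPROGRAMMING
    public static class Mixins
    {
        // Invoked at compile time by the magic-comment walker; receives the
        // class declaration it is attached to and injects members into it.
        public static void AddToString(ClassDeclarationSyntax target)
        {
            // build new members with Syntax.* factory calls and insert them here
        }
    }
    #endif

    //$ Mixins.AddToString
    public class Customer
    {
        public string Name { get; set; }
    }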

    Obviously that's one big hack and you need to do additional work to have IntelliSense in the "metaprogramming scope", but theoretically it should be doable in a similar fashion to ASP.NET (you can nest multiple editor scopes in Visual Studio; aspx files mix HTML scopes with scripting scopes, both using different editors in the same file).

    So my bottom line is that metaprogramming in Roslyn is already possible with a relatively low amount of work, but doing proper Visual Studio integration is a lot more work.

    Monday, September 24, 2012 8:38 AM
  • From the comments given by the Roslyn team I don't think metaprogramming is in the scope of the project.

    A while ago, I read somewhere on this forum, from people working on Roslyn, that it was in the scope of the project, but not in the scope of the first release... this could have changed since.

    Personally I don't think extension points at the grammar level are a good idea; if there is more than one extension you are very likely to get an ambiguous grammar. One of the points of C# (and Roslyn) is having a very fast parser, which means careful design of the language and probably rules out using a general-purpose parser generator (it always needs manual tweaking).

    If you slightly restrict the extension points and the way the grammar can be expressed (for example, to match a group of statements you would use something like new SyntaxStatementBlock()), you reduce the risk of an ambiguous grammar. Also, verification of the grammar can be done once the extension is loaded into the compiler, and since csc works on a batch of files this is less of a problem. Concerning performance, the first main bottleneck in a parser is the tokenizer; by not allowing new tokens, you can keep the tokenizer as optimized as it is. The AST construction deduced from the DFA of the grammar is usually not the most time-consuming part, compared to the semantic analysis, which has to perform many more checks. A parser needs manual tweaking when there are ambiguities, but if you restrict extensions to a set of syntax elements already supported by the parser, you shouldn't need any manual tweaking (again, by not allowing everything everywhere).

    That said, if you just need metaprogramming and not custom DSL syntax, you *can* do that with Roslyn. (By that I mean being able to write source code which executes at compile time to inject classes or code.) To do that you parse comments or syntactically invalid code with Roslyn and then use a SyntaxWalker to fix it up.

    In fact, I would like to be able to embed DSLs with new keywords and code constructions (and not only pure code transformation, as that kind of thing can already be done by post-patching IL assemblies), in a pretty similar way to how LINQ expressions and async/await have been embedded (though those were certainly "hardcoded" and not implemented in a meta-programming way). This kind of scenario is absolutely common whenever you want to leverage an existing language but add some DSL constructions specific to your domain. I'm not talking about coding a whole DSL inside C#; just as with async/await, asynchronous programming can be done without those nice keywords, but it is much nicer with them!

    Thanks for sharing your approach, that's interesting. It is indeed usable for some use cases, but as you said, it is one big workaround/hack to overcome the lack of meta-programming in Roslyn. I would really prefer something built into the language/compiler, supported by IntelliSense and external tools (like ReSharper), and easily shareable between projects (just by sharing the assembly that contains the syntax extension, which would make it a LOT easier to spread the use of meta-programming).


    Alexandre Mutel - SharpDX - NShader - Code4k

    Monday, September 24, 2012 1:33 PM
  • I'm sure the Microsoft guys won't reply to any discussion about meta-programming.
    Tuesday, September 25, 2012 2:38 AM
  • One of the goals for the Roslyn project is to enable us (Microsoft) to more easily experiment with and implement new language features for future versions of C# and VB.  Meta-programming scenarios are definitely on our radar for things we want to explore with the language.  However, Roslyn is not designed with the goal of helping users extend the language themselves.

    This thread contains a number of interesting ideas on how a system like Roslyn could support user-defined language extensions.  However, the design space is very complicated and what we would have to do to support this would be very complicated.

    To take one example, consider parsing.  Roslyn's parser is a hand-built incremental parser.  As a practical matter it can keep up with your typing speed, updating the parse tree for the whole source file on every keystroke.  While there are some pretty good parser generators out there, few of them compare with a hand-built parser in the speed of the parser and in the quality of diagnostics for syntax errors.  As far as I know there are not really any good incremental parser generators out there.  If we wanted to support user-extensible syntax, we would need to have some more automated parser-generation system, which probably means building it ourselves.  That would be a distraction from our work toward Roslyn's goals.

    With that in mind, I repeat that meta-programming is definitely an interesting direction for the programming languages and the tools, and Roslyn may be a tool toward that end, but Roslyn doesn't yet attempt to provide much to help you.

    -Neal [Roslyn team]

    Thursday, September 27, 2012 4:57 PM
  • The best example of "internal" metaprogramming is... C++ template metaprogramming.
    Adding any extensions (syntax and semantics) to a language is "external" metaprogramming.
    (That's only my perspective.)
    Saturday, September 29, 2012 11:46 AM
  • Thanks Neal for the clarification about Roslyn and meta-programming.

    I had never looked at the internals of Roslyn's lexer/parser until now, but it is very well designed! (It is interesting to see that you used a fully handmade lexer, for which consistent performance across all methods is quite tricky to achieve.) Still, I believe that a handmade lexer/parser is probably a bit easier to modify and plug extensions into than an auto-generated one. You wouldn't have to change the whole architecture to bring in some extensibility: just choose carefully the set of LanguageParser.ParseXXX methods that would allow extension points and start working with that set. Assuming the state table for these language extensions is pre-computed at parser init (so, in the end, a simple state machine that calls some raw LanguageParser.ParseXXX methods), I bet it wouldn't hurt the performance of the parser much. A lexer working at the character level is usually more time consuming than a parser working on tokens.
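
    To illustrate the kind of hook I have in mind (purely hypothetical - none of these types exist in Roslyn, and a real parser would produce syntax nodes rather than object), the hand-written parser could consult a small registry at the chosen ParseXXX entry points:

    using System;
    using System.Collections.Generic;

    // Hypothetical sketch: fixed extension points in a hand-written parser.
    public enum ExtensionPoint { ClassBody, StructBody, Statement }

    public class ParserExtensionRegistry
    {
        // keyword -> parse callback, per extension point, built once at parser init.
        private readonly Dictionary<ExtensionPoint, Dictionary<string, Func<object>>> extensions =
            new Dictionary<ExtensionPoint, Dictionary<string, Func<object>>>();

        public void Register(ExtensionPoint point, string keyword, Func<object> parseCallback)
        {
            Dictionary<string, Func<object>> table;
            if (!extensions.TryGetValue(point, out table))
                extensions[point] = table = new Dictionary<string, Func<object>>();
            table[keyword] = parseCallback;
        }

        // Called from a ParseXXX method: if the current token is a registered
        // keyword, the extension parses the construct; otherwise the built-in
        // parsing code runs unchanged.
        public bool TryParse(ExtensionPoint point, string currentToken, out object node)
        {
            node = null;
            Dictionary<string, Func<object>> table;
            Func<object> callback;
            if (extensions.TryGetValue(point, out table) && table.TryGetValue(currentToken, out callback))
            {
                node = callback();
                return true;
            }
            return false;
        }
    }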

    Anyway, I hope the Roslyn team will find a way to experiment with this ;) and I'm glad to hear that meta-programming is still on your radar.


    Alexandre Mutel - SharpDX - NShader - Code4k

    Monday, October 01, 2012 5:32 AM
  • Is TypeScript a meta-programming language?

    [Image: a class in TypeScript (above), and what it compiles to (below)]

    Tuesday, October 02, 2012 2:49 PM
  • Hi Alexandre,
    I really like your idea. It corresponds exactly to a feature of modern programming languages that I am missing. When I found Roslyn, I immediately tried to check its limits and implement the delegation feature described at itmustwork.blogspot.cz - a typical example of how meta-programming can be useful. I think my vision of meta-programming is very similar to yours.

    Now I see that I have already found Roslyn's limits. Nevertheless, Roslyn is still a wonderful project. The tool significantly improves developers' capabilities in code processing, and Roslyn in its current state is a necessary prerequisite for meta-programming.

    I do not know whether there is any other language/platform which supports meta-programming as we imagine it, but I believe such a feature would significantly distinguish the progressive platforms from the others. I cannot think of many features that would move a platform to the next level, but it seems that meta-programming is one of them.

    I am not a scientist and I do not have an overview of whether something like "extensible grammars" is well explored. If not, it sounds like a good topic for academic work. There must be rules for grammars that can identify an "extending rule" which would harm the grammar (its unambiguity).

    Let's spread the idea of meta-programming (I will help you :) ); it is a direct path to Domain Specific Languages built upon an existing language. Aspect-oriented programming would also become feasible in a simple, natural way with this feature (no more IL modifications). As I said, it would be a great advantage for C# in comparison with other platforms, and I really look forward to this feature.

    Sunday, January 13, 2013 4:49 PM
  • (Sorry to dig up an old thread.)

    This is a great idea, Alexandre. Let's hope that it is being seriously thought about for a future version of Roslyn / C#.

    Since @JindrichB asked, I wanted to mention Nemerle. Nemerle is a language that targets .NET, and has exactly this kind of metaprogramming built-in. JetBrains hired the core developers a year ago (http://blogs.jetbrains.com/dotnet/2012/06/jetbrains-and-nemerle/), and nothing has been heard of them since, but I'm still optimistic that they'll announce something cool.

    There's a page explaining Nemerle's macros (one of the key ways it supports macroprogramming) here:
    https://github.com/rsdn/nemerle/wiki/Macros-tutorial

    Here's an example from the Wikipedia article:

    macro ReverseFor (i, begin, body)
    syntax ("ford", "(", i, ";", begin, ")", body)
    {
      <[ for ($i = $begin; $i >= 0; $i--) $body ]>
    }

    ... which can be used like this:

    ford (i ; n) print (i);

    Nemerle is a very cool language, and I hope JetBrains give it the support it needs to become, if not exactly mainstream, then at least a viable alternative to C#.

    Thursday, August 08, 2013 5:18 AM
  • "enable us (Microsoft) to more easily experiment with and implement new language features"

    I'm interested in Roslyn for exactly this kind of thing: experimenting with proof-of-concepts for language features, if only for the learning experience/fun of it.

    If I understand correctly, to use Roslyn to compile a modified C# syntax, I would need to actually modify the Roslyn source code, correct?  Just to get pointed in the right direction, so I know whether I should be researching the Roslyn source or only the public API. The Roslyn API does not expose a way to provide hooks for processing custom syntax, correct?  Assuming the syntax changes are simple (maybe a certain construct that's easy to pattern-match and hand off to a custom handler for that block of code).

    Friday, July 18, 2014 11:55 PM
  • I'm not sure if this falls along the same line of discussion, but I really am bothered by a lot of C# code that relies heavily on reflection when everything that is needed is known at compile time.  Consider serialization.  In Java you can implement Externalizable to get very fast, hand-written serialization instead of the Serializable interface, which uses reflection at runtime.

    With the right compile-time features you should be able to generate the IL for serialization at compile time: you use your knowledge of the class structure at compile time to generate the serialization code, rather than waiting until runtime to serialize the class reflectively.

    This is just one example, but I find it increasingly frustrating to try to write generic, reusable code without relying on reflection, especially when I am reflecting over information that is known at compile time.  For example, I've seen apps that suffer from a proliferation of switch/cases with a case for each value of an enum, so any time a new enum value is added, we must hunt down all the switch/cases to support the new value.  Instead, I am able to use a combination of reflection and attributes on the enum to write loops over the enum values, so that new enum values are handled gracefully without that piece of code being touched.  However, it is silly to have to use runtime reflection for this, because the information I am leveraging is already known at compile time.
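
    A small self-contained example of that pattern (the attribute and enum are invented for illustration; the reflection calls are standard BCL APIs):

    using System;
    using System.Reflection;

    // Illustrative marker attribute carrying per-value metadata.
    [AttributeUsage(AttributeTargets.Field)]
    public class EnumLabelAttribute : Attribute
    {
        public string Label { get; private set; }
        public EnumLabelAttribute(string label) { Label = label; }
    }

    public enum OrderStatus
    {
        [EnumLabel("Pending approval")] Pending,
        [EnumLabel("Shipped to customer")] Shipped
    }

    public static class EnumReport
    {
        // Loops over the enum values via runtime reflection, so a newly added
        // value is handled automatically - even though everything used here is
        // already known at compile time, which is exactly the frustration above.
        public static void PrintAll()
        {
            foreach (FieldInfo field in typeof(OrderStatus).GetFields(BindingFlags.Public | BindingFlags.Static))
            {
                var label = (EnumLabelAttribute)Attribute.GetCustomAttribute(field, typeof(EnumLabelAttribute));
                Console.WriteLine("{0}: {1}", field.Name, label != null ? label.Label : field.Name);
            }
        }
    }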

    I know you can hack around this with T4 templates, but usually that involves splitting things up into partial classes so one part can be generated, and then you can't leverage your C# skillset because you are working with a different language to accomplish what you want.


    Saturday, July 19, 2014 12:08 AM