How to retrieve SyntaxNodes between #region and #endregion directives?


  • Friday, August 31, 2012 11:43 AM
     
     

    Hi,

    Is there an easy way to retrieve all SyntaxNodes that are located between a #region and the corresponding #endregion directive?

    Currently this seems to be a really messy task.

    I know that #region and #endregion are only supposed to organize code. But I would like to analyse some code between those directives.

All Replies

  • Sunday, September 02, 2012 1:13 PM
     
     

    Hi Jochen,

    I think you can just iterate through all the tokens in a SyntaxTree (you can create a walker to do that) and check the LeadingTrivia of each token to see whether it contains the #region directive you are looking for. Once you have found that directive, you can start collecting the nodes that follow until you meet a token with an #endregion directive in its LeadingTrivia. Be careful: #region directives can be nested, so you should also track the nesting level in order to handle regions nested inside your region correctly.


    • Edited by Hillin Sunday, September 02, 2012 1:15 PM
  • Tuesday, September 04, 2012 10:09 PM
     

    Thanks for your reply. I wish I could tell you that it was that easy - I expected this as well. But I'll give you an example that shows how ugly things can get with the current modeling of directives like #region and #endregion:

    The first thing to think about is how to match a #region directive to the appropriate #endregion directive. This is easy: just implement a SyntaxWalker and override the VisitRegionDirective() and VisitEndRegionDirective() methods as follows:

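    Stack<RegionDirectiveSyntax> regionStack = new Stack<RegionDirectiveSyntax>();
    Dictionary<RegionDirectiveSyntax, EndRegionDirectiveSyntax> regionToEndRegion = new Dictionary<RegionDirectiveSyntax, EndRegionDirectiveSyntax>();
    
    public override void VisitRegionDirective(RegionDirectiveSyntax regionNode) {
     // Remember the open region until its #endregion is visited
     regionStack.Push(regionNode);
    }
    
    public override void VisitEndRegionDirective(EndRegionDirectiveSyntax endRegionNode) {
     // The innermost open region is the one this #endregion closes
     var regionNode = regionStack.Pop();
     regionToEndRegion.Add(regionNode, endRegionNode);
    }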

    Now we can access tuples of hierarchical #region and #endregion directives.
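
    The same stack-based matching can be tried without Roslyn. What follows is only a minimal sketch under stated assumptions: it works on plain source lines instead of a syntax tree, the names RegionMatcher and MatchRegions are hypothetical, and it only recognizes directives written without whitespace after the '#'.

```csharp
using System;
using System.Collections.Generic;

static class RegionMatcher
{
    // Hypothetical helper: returns (regionLine, endRegionLine) pairs with
    // 1-based line numbers, in the order in which the regions are closed.
    public static List<(int Start, int End)> MatchRegions(IEnumerable<string> lines)
    {
        var open = new Stack<int>();                  // line numbers of regions that are still open
        var pairs = new List<(int Start, int End)>();
        int lineNumber = 0;

        foreach (var line in lines)
        {
            lineNumber++;
            var text = line.TrimStart();

            if (text.StartsWith("#region", StringComparison.Ordinal))
            {
                open.Push(lineNumber);
            }
            else if (text.StartsWith("#endregion", StringComparison.Ordinal) && open.Count > 0)
            {
                // The innermost open region is the one closed by this #endregion.
                pairs.Add((open.Pop(), lineNumber));
            }
        }

        return pairs;
    }
}
```

    For the four lines "#region A", "#endregion", "#region B", "#endregion" this yields the pairs (1, 2) and (3, 4), so sibling regions are matched as well as nested ones. An #endregion without an open #region is simply skipped in this sketch.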

    Next we need to find the nodes between those nodes.

    Let's assume the following code:

    1 void method() {
    2  #region outer
    3   #region inner
    4    int a = 0;
    5   #endregion
    6  #endregion
    7 } 

    The <#region, #endregion> tuples are computed correctly: <2, 6> and <3, 5>. The corresponding SyntaxTree will look like this:

    As you can see, the RegionDirective nodes are children of the PredefinedType node, which corresponds to the 'int' token in line 4, while the EndRegionDirective nodes are children of the CloseBraceToken node, which corresponds to the '}' token in line 7.

    Your approach - a SyntaxWalker that waits for the first RegionDirective, then adds all nodes that are visited next to a list and closes this list when the corresponding EndRegionDirective is reached - will not work since the list will contain the VariableDeclarator node (see image) only!

    This would work if SyntaxTrivia nodes were modeled differently: they would have to be direct children of the Block node and not children of the next SyntaxNode that appears after the SyntaxTrivia node (according to the line numbers in the source code). I don't understand why they are modeled this way - could someone please explain this to me?

    As a statement-oriented reader of source code I would appreciate the following (shortened) SyntaxTree (the type of a node is enclosed in brackets):

    MethodDeclaration (SyntaxNode)
    |- ...
    |- Block (SyntaxNode)
       |- OpenBraceToken (SyntaxToken)
       |- RegionDirective (SyntaxTrivia)
       |- RegionDirective (SyntaxTrivia)
       |- LocalVariableDeclaration (SyntaxNode)
          |- VariableDeclaration (SyntaxNode)
             |- PredefinedType (SyntaxNode)
             |- VariableDeclarator (SyntaxNode)
       |- EndRegionDirective (SyntaxTrivia)
       |- EndRegionDirective (SyntaxTrivia)
       |- CloseBraceToken (SyntaxToken)

    Is there any reason why this modeling is worse than the current modeling? It seems to be a very random decision to assume that a RegionDirective node leads a SyntaxNode - it actually sits between two SyntaxNodes and does not belong to either of them.

    I would like to see a flag in the Roslyn.Compilers.CSharp.ParseOption class that tells the SyntaxTree.ParseCompilationUnit() method to create this differently modeled syntax tree, at least for structured trivia.

    Thanks,

    Jochen



    • Edited by Jochen Huck Tuesday, September 04, 2012 10:16 PM
  • Wednesday, September 05, 2012 7:40 AM
     
     

    Yes, treating these directives as trivia makes the tree look weird here: the directive appears to be a child of the LocalDeclarationStatement node (and, further down, of the VariableDeclaration and PredefinedType nodes), while conceptually it is their parent.

    I think directives (along with comments etc.) are modeled as trivia simply to make it convenient to parse nodes from tokens. That way the parser never ends up in a situation where it has a type token and expects an identifier, but the lexer reports that the next token is a comment or something else that is meaningless to the node parsing phase.

    Maybe you can try this approach: when you visit a region directive, walk up through the parent nodes and check whether each node's start position is the same as that of the token which contains the directive. If it is, the token is that node's first token, so the node should be inside the region. Stop as soon as it is not.

  • Wednesday, September 05, 2012 12:42 PM
     

    Thanks again for your reply. What you described is exactly what I am doing right now to retrieve the corresponding SyntaxNode for a RegionDirective. It also works for some EndRegionDirective nodes, but not for all of them: it works if the EndRegionDirective leads a statement in a list of statements (as in the example below).

    method() {
     #region A
     int a = 0;
     #endregion
     int b = 0;
    }

    It does not work if the EndRegionDirective leads a CloseBraceToken (since it is then a child of a Block):

    method() {
     #region A
     int a = 0;
     #endregion
    }

    I think I will be able to work around this as well - but it will get really hacky...

    I will attach my current code (that does not find a corresponding SyntaxNode for an EndRegionDirective in the second example yet):

    class RegionToEndRegionMatcher : SyntaxWalker {
     private Stack<RegionDirectiveSyntax> regionStack;
     private SyntaxTree tree;
    
     public Dictionary<RegionDirectiveSyntax, EndRegionDirectiveSyntax> regionToEndRegion;
     public Dictionary<RegionDirectiveSyntax, SyntaxNode> regionToSyntaxNode;
     public Dictionary<EndRegionDirectiveSyntax, SyntaxNode> endRegionToSyntaxNode;
    
     // Set VisitIntoStructuredTrivia true
     public RegionToEndRegionMatcher(SyntaxTree t) : base(true) {
      tree = t;
      regionStack = new Stack<RegionDirectiveSyntax>();
      regionToEndRegion = new Dictionary<RegionDirectiveSyntax, EndRegionDirectiveSyntax>();
      regionToSyntaxNode = new Dictionary<RegionDirectiveSyntax, SyntaxNode>();
      endRegionToSyntaxNode = new Dictionary<EndRegionDirectiveSyntax, SyntaxNode>();
     }
    
     // Visit methods
     public override void VisitRegionDirective(RegionDirectiveSyntax regionNode) {
      regionStack.Push(regionNode);
      var correspondingSyntaxNode = FindCorrespondingSyntaxNode(regionNode, tree);
      regionToSyntaxNode.Add(regionNode, correspondingSyntaxNode);
     }
    
     public override void VisitEndRegionDirective(EndRegionDirectiveSyntax endRegionNode) {
      var regionNode = regionStack.Pop();
      regionToEndRegion.Add(regionNode, endRegionNode);
      var correspondingSyntaxNode = FindCorrespondingSyntaxNode(endRegionNode, tree);
      endRegionToSyntaxNode.Add(endRegionNode, correspondingSyntaxNode);
     }
    
     // Find corresponding SyntaxNode
     private SyntaxNode FindCorrespondingSyntaxNode(RegionDirectiveSyntax regionDirective, SyntaxTree tree) {
      return SyntaxNodeAfter(regionDirective, tree);
     }
    
     private SyntaxNode FindCorrespondingSyntaxNode(EndRegionDirectiveSyntax endRegionDirective, SyntaxTree tree) {
      SyntaxNode correspondingNode = null;
      // assume #endregion leads a CloseBraceToken
      // TODO: implement this case...
    
      if(correspondingNode == null) {
       // assume #endregion leads a SyntaxNode
       correspondingNode = SyntaxNodeAfter(endRegionDirective, tree);
      }
      return correspondingNode;
     }
    
     // Don't use this method directly - use FindCorrespondingSyntaxNode instead!
     private SyntaxNode SyntaxNodeAfter(DirectiveSyntax directive, SyntaxTree tree) {
      var allNodes = tree.GetRoot().DescendantNodesAndSelf();
      foreach (var node in allNodes ) {
       if (DirectiveLeadsNode(node, directive)) {
        return FindOuterNode(node);
       }
      }
      return null;
     }
    
     private bool DirectiveLeadsNode(SyntaxNode node, DirectiveSyntax directive) {
      var lead = node.GetLeadingTrivia();
      return lead.Contains(directive.ParentTrivia);
     }
    
     // Travels up the syntax tree as long as the span start remains equal
     private SyntaxNode FindOuterNode(SyntaxNode node) {
      var current = node;
      var currentSpanStart = current.Span.Start;
      while ((current.Parent != null) && (current.Parent.Span.Start == currentSpanStart)) {
       current = current.Parent;
      }
      return current;
     }
    }

    • Edited by Jochen Huck Wednesday, September 05, 2012 12:51 PM
  • Wednesday, September 05, 2012 2:15 PM
     

    Hi Jochen, Here is my way:

        class RegionNodes
        {
            public SyntaxTrivia RegionDirective;
            public SyntaxTrivia EndRegionDirective;
            public TextSpan RegionSpan
            {
                get
                {
                    var start = RegionDirective.Span.Start;
                    var end = EndRegionDirective.Span.Start + EndRegionDirective.Span.Length;
                    return new TextSpan(start, end - start);
                }
            }
            public List<SyntaxNode> Nodes = new List<SyntaxNode>();
            public void AddNode(SyntaxNode node)
            {
                if (RegionSpan.Contains(node.Span))
                    Nodes.Add(node);
            }
        }
        public static void FindRegionContentNodes()
        {
            var tree = SyntaxTree.ParseCompilationUnit(@"
    #region r1
    namespace N
    {
        #region r2
        class C
        {
            #region r3
            void method()
            {
                #region r4
                #region r5
                int a = 0;
                #endregion r5
                #endregion r4
            }
            #endregion r3
        }
        #endregion r2
    }
    #endregion r1
    ");
            var root = tree.GetRoot();
            var regionNodesList = new List<RegionNodes>();
            // Pair each #region with its #endregion using a stack, so that sibling
            // regions are matched correctly as well as nested ones.
            var openRegions = new Stack<RegionNodes>();
            foreach (var trivia in root.DescendantTrivia())
            {
                if (trivia.Kind == SyntaxKind.RegionDirective)
                {
                    var opened = new RegionNodes { RegionDirective = trivia };
                    regionNodesList.Add(opened);
                    openRegions.Push(opened);
                }
                else if (trivia.Kind == SyntaxKind.EndRegionDirective && openRegions.Count > 0)
                    openRegions.Pop().EndRegionDirective = trivia;
            }
            foreach (var node in root.DescendantNodes().Where(i => i is MemberDeclarationSyntax || i is StatementSyntax))
                foreach (var regionNodes in regionNodesList)
                    regionNodes.AddNode(node);
        }
    

    • Marked As Answer by Jochen Huck Wednesday, September 05, 2012 3:51 PM
  • Wednesday, September 05, 2012 3:53 PM
     
     

    Well - that was easy! Thanks a lot!

    I was too much into the SyntaxWalker approach and did not even think about using the spans of the #region and #endregion directives...