Locked SQL simple parser

  • Tuesday, January 16, 2007 6:41 PM
     
     
    Hello
    I'm newbe with reg expressions

    I want to build a simple parser to analize a SQL string. I only need to analize a SQL of type Select and get these groups:
    • Select and From clauses
    • Where clause (if any) without the where reserved word
    • Rest of the sql string (if any)

    I have archived this with this reg expressions::
    string Expr1 = @"(WHERE\s+(.+?))(\s+ORDER BY.+|\s+GROUP BY.+|\s*$)";
    string Expr2 = @"(\s+ORDER BY.+|\s+GROUP BY.+)$";


    I get the select clauses with:
    SelectClause = SQL.Substring(0, m.Groups[1].Index - 1);
    WhereClause = m.Groups[2].Value;
    FinalClauses = m.Groups[3].Value;


    If no match is successfull (no where clause present) I try with Expr2 with similar process.

    But how can i do (if possible) to get work with a subquery like this:
    select f1,f2,f3
    from table1
    where f3 in (select tf1 from table2 group by f4 having f5 > 1 )
    order by f1

    The problem is that the first 'group by' match belongs to the subquery and I need to get the entire where clause.

    thanks in advance
    Miguel

All Replies

  • Tuesday, January 16, 2007 8:16 PM
    Moderator
     
     Answered
    One of the most powerful tools in RegEx is the Lookahead/lookbehind qualifiers (?= and ?<=. I think of them as bookends. I created this expression which will just grab the where clause.


    (?<=where)(?<Where>.*?)(?=order|$)
     


    What it does, not a linear interpolation mind you, is look for text sandwiched between a where and an order or end of text and also, do not match sandwiching items..

    Note I prefer named groups and that is what (?<Where>.*?) does vs just saying (.*?)....because of that in the code I specify Explicit capture which says do not capture any non numbered or named groups.  It allows one to say m.groups["Where"].value.

    Here is the output of the capture, note there is a \n which is not apparent to the display below.


    f3 in (select tf1 from table2 group by f4 having f5 > 1 )

     





    //  using System.Text.RegularExpressions;
    /// <summary>
    ///  Regular expression built for C# on: Tue, Jan 16, 2007, 01:09:13 PM
    ///  Using Expresso Version: 3.0.2559, http://www.ultrapico.com
    /// 
    ///  Capture just the where clause in a single run.
    /// 
    ///  A description of the regular expression:
    /// 
    ///  Match a prefix but exclude it from the capture. [where]
    ///      where
    ///          where
    ///  [Where]: A named capture group. [.*?]
    ///      Any character, any number of repetitions, as few as possible
    ///  Match a suffix but exclude it from the capture. [order|$]
    ///      Select from 2 alternatives
    ///          order
    ///              order
    ///          End of line or string
    /// 
    /// 
    /// </summary>
    public Regex CaptureWhere = new Regex(
        @"(?<=where)(?<Where>.*?)(?=order|$)",
        RegexOptions.IgnoreCase
        | RegexOptions.Multiline
        | RegexOptions.Singleline
        | RegexOptions.ExplicitCapture
        | RegexOptions.CultureInvariant
        | RegexOptions.Compiled
        );