How to get the compiled regular expression using Regex?

    Question

  • Hi,

    I am working on tree pattern matching in C on the Windows platform. The best program I have found is tgrep2, which is only available for Linux, so I am trying to replace the Linux regex functions regcomp(), regexec() and regerror() with Windows regex functions.

    These Linux functions use a data structure of type re_pattern_buffer, which has a member, unsigned char *buffer, that holds the compiled regular expression after calling regcomp(). However, I can't tell whether there is an equivalent field in the Windows tr1::regex.

    The re_pattern_buffer struct is as follows:

    struct re_pattern_buffer
    {
    /* [[[begin pattern_buffer]]] */
      /* Space that holds the compiled pattern.  It is declared as
         `unsigned char *' because its elements are
         sometimes used as array indexes.  */
      unsigned char *buffer;

      /* Number of bytes to which `buffer' points.  */
      unsigned long allocated;

      /* Number of bytes actually used in `buffer'.  */
      unsigned long used;

      /* Syntax setting with which the pattern was compiled.  */
      reg_syntax_t syntax;

      /* Pointer to a fastmap, if any, otherwise zero.  re_search uses
         the fastmap, if there is one, to skip over impossible
         starting points for matches.  */
      char *fastmap;

      /* Either a translate table to apply to all characters before
         comparing them, or zero for no translation.  The translation
         is applied to a pattern when it is compiled and to a string
         when it is matched.  */
      char *translate;

      /* Number of subexpressions found by the compiler.  */
      size_t re_nsub;

      /* Zero if this pattern cannot match the empty string, one else.
         Well, in truth it's used only in `re_search_2', to see
         whether or not we should use the fastmap, so we don't set
         this absolutely perfectly; see `re_compile_fastmap' (the
         `duplicate' case).  */
      unsigned can_be_null : 1;

      /* If REGS_UNALLOCATED, allocate space in the `regs' structure
           for `max (RE_NREGS, re_nsub + 1)' groups.
         If REGS_REALLOCATE, reallocate space if necessary.
         If REGS_FIXED, use what's there.  */
    #define REGS_UNALLOCATED 0
    #define REGS_REALLOCATE 1
    #define REGS_FIXED 2
      unsigned regs_allocated : 2;

      /* Set to zero when `regex_compile' compiles a pattern; set to one
         by `re_compile_fastmap' if it updates the fastmap.  */
      unsigned fastmap_accurate : 1;

      /* If set, `re_match_2' does not return information about
         subexpressions.  */
      unsigned no_sub : 1;

      /* If set, a beginning-of-line anchor doesn't match at the
         beginning of the string.  */
      unsigned not_bol : 1;

      /* Similarly for an end-of-line anchor.  */
      unsigned not_eol : 1;

      /* If true, an anchor at a newline matches.  */
      unsigned newline_anchor : 1;

    /* [[[end pattern_buffer]]] */
    };

    Any suggestions? Thanks for the help!

    Sunday, October 30, 2011 6:30 AM

Answers

  • It seems that the regular expression is compiled (parsed and transformed into nodes) inside the constructor. Probably there is no equivalent for re_pattern_buffer and you do not have to compile the expressions explicitly.

    • Marked as answer by Rob Pan Friday, November 04, 2011 8:51 AM
    Sunday, October 30, 2011 11:33 AM
  • SaraMansouri wrote:

    I am working on tree pattern matching in C on the Windows platform. The best program I have found is tgrep2, which is only
    available for Linux, so I am trying to replace the Linux regex functions regcomp(), regexec() and regerror() with Windows regex
    functions.

    These Linux functions use a data structure of type re_pattern_buffer, which has a member, unsigned char *buffer, that holds the
    compiled regular expression after calling regcomp(). However, I can't tell whether there is an equivalent field in the Windows
    tr1::regex.

    An instance of the regex class is essentially a compiled regular expression. regcomp et al. are C functions and cannot maintain state, so they have to rely on the caller to do it for them. The regex class, on the other hand, is a C++ class with its own state, so it just stores the compiled regular expression internally.


    Igor Tandetnik

    • Marked as answer by Rob Pan Friday, November 04, 2011 8:51 AM
    Sunday, October 30, 2011 2:30 PM
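
    To make the two answers above concrete, here is a minimal sketch of how the regcomp()/regexec()/regerror() calls might be mapped onto the C++ regex class. It assumes a compiler that provides <regex> with std::regex (on older Visual Studio versions the same class is reachable as std::tr1::regex). The helper names compile_pattern, execute_pattern and subexpression_count are hypothetical and exist only for this illustration; they are not part of tgrep2 or of any library. As far as I know there is no public accessor for the compiled form, so the std::regex object itself is what gets passed around in place of re_pattern_buffer.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for regcomp() + regerror().
// The std::regex constructor parses and compiles the pattern; a malformed
// pattern is reported by throwing std::regex_error instead of an error code.
bool compile_pattern(const std::string& pattern, std::regex& out, std::string& error)
{
    try {
        // POSIX extended syntax is an assumption, chosen to resemble
        // regcomp(..., REG_EXTENDED); the default grammar is ECMAScript.
        out = std::regex(pattern, std::regex::extended);
        return true;
    } catch (const std::regex_error& e) {
        error = e.what();   // roughly the role regerror() plays in C
        return false;
    }
}

// Hypothetical stand-in for regexec(): searches `text` for the first match
// and stores the match and its sub-matches in `m`.
bool execute_pattern(const std::regex& re, const std::string& text, std::smatch& m)
{
    return std::regex_search(text, m, re);
}

// Rough counterpart of the re_nsub field: the number of marked
// subexpressions in the compiled pattern.
std::size_t subexpression_count(const std::regex& re)
{
    return re.mark_count();
}
```

    A caller would construct one std::regex per pattern with compile_pattern, keep it wherever the C code kept its re_pattern_buffer, and reuse it for every call to execute_pattern. Fields such as buffer, allocated and fastmap have no visible counterpart because the class manages that storage internally.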
