Simple regex to parse files headers.

  • Question

  • User-1668256174 posted

    Hello everybody,

    I need a regex that would help me to parse files headers.

    Here is an example :

    /* =================================================================================*/
    /*    Creation date            :  09/01/2013      */
    /*    Writer               :  Jean Dupont     */
    /*    Procedure goal        :  Ceci est un exemple de fichier qui permet de   */
    /*      :  stocker des calculs mathématiques.          */
    /*         Ces calculs proviennent des serveurs du groupe */
    /*      :  C1-03 qui est = au groupe C2-02 .  */
    /*      :  Si ces calculs sont = à 0 alors ne pas en tenir  */
    /*      :  compte.  */
    /* =================================================================================*/
    /*    Paramètre reçu  1           :       */
    /* ---------------------------------------------------------------------------------*/
    /*    Paramètre retourné  1       :                                                 */
    /* =================================================================================*/
    /*    Date de modification :                                                        */
    /*    Rédacteur                 :                                                 */
    /*    But modification            :                                                 */
    /* =================================================================================*/

    I would like to use a regex to retrieve the "Procedure goal" part:

    Ceci est un exemple de fichier qui permet de   */
    /*      :  stocker des calculs mathématiques.          */
    /*         Ces calculs proviennent des serveurs du groupe */
    /*      :  C1-03 qui est = au groupe C2-02 .  */
    /*      :  Si ces calculs sont = à 0 alors ne pas en tenir  */
    /*      :  compte.  */

    This part may end with either this line:

    /* ============================================*/

    or this line:

    /*----------------------------------------------------------------------*/

    (the number of '=' or '-' characters varies, but there are always more than 2)

    The regex must check for the presence of the '/*' and the '*/'.

    The number of lines is variable.

    I wrote a regex, but it is too complicated and it does not return what I need.

    '\/\*\s*(?i)procedure\s*goal\s*\:\s*(.*\s*\*\/(\r*\n*\/\*\s*\:*\s*[^\=]{2,}[^\=\-]{2,}\s*\*\/))(?!\/\*\s*\={2,}\s*\-{2,}\s*\*\/)'
    

    Thanks a lot for your help.

    Eric.

    Thursday, September 5, 2013 5:22 PM

Answers

  • User1508394307 posted

    The simplest pattern is 

    Procedure\s+goal\s+:((.|\n)*?)(===|---)

    or, to require a complete separator line with at least three '=' or '-' characters,

    Procedure\s+goal\s+:((.|\n)*?)(/\*\s*[=-]{3,}\s*\*/)

    where you can capture the required text using

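    // Requires: using System.Text.RegularExpressions;
    Regex regex = new Regex(
          "Procedure\\s+goal\\s+:((.|\\n)*?)(===|---)",
          RegexOptions.None);
    Match m = regex.Match(InputText);
    // Group 1 holds the text between the colon and the separator (empty if nothing matched).
    string text = m.Groups[1].Value;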

    There are many ways to refine the pattern, depending on your requirements.

    For example, if "Procedure goal" must directly follow the opening /* of a comment line, prefix the pattern with /\*\s+ (the asterisk has to be escaped):

    /\*\s+Procedure\s+goal...

    If you don't want to use Match.Groups, you can use lookarounds so that the match itself is the text you want:

    (?<=Procedure\s+goal\s+:)(.|\n)*?(?=(/\*\s*[=-]{3,}\s*\*/))

    etc.
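
    The variants above can be combined into one pattern. The following is only a sketch, assuming the whole header is available as a single string and that the separator sits on its own line. The names HeaderParser and ExtractGoalRaw are made up for illustration. The pattern requires '/*' before the label and a separator of at least three '=' or '-' characters wrapped in '/*' and '*/'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderParser
{
    // Singleline: '.' also matches newlines. Multiline: '^' matches at each line start.
    // IgnoreCase: "procedure goal" is accepted in any casing.
    private static readonly Regex GoalPattern = new Regex(
        @"/\*\s*Procedure\s+goal\s*:(?<goal>.*?)^[ \t]*/\*\s*[=-]{3,}\s*\*/",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Multiline);

    // Hypothetical helper: returns the raw block (comment delimiters included),
    // or null when the header has no "Procedure goal" section.
    public static string ExtractGoalRaw(string header)
    {
        Match m = GoalPattern.Match(header);
        return m.Success ? m.Groups["goal"].Value : null;
    }
}
```

    The captured block still contains the '*/', '/*' and ':' of each continuation line, so it usually needs a cleanup pass afterwards.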

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, September 5, 2013 5:54 PM

All replies

  • User-1668256174 posted

    Thank you very much, smirnov!

    I will try this tomorrow at my office (in about 10 hours).

    Then I will give you feedback.

    Regards.

    Eric.

    Thursday, September 5, 2013 6:11 PM
  • User-1668256174 posted

    smirnov,

    The regex works very well, and I thank you once again.

    Is there a way to retrieve only the "Procedure goal" part (as you showed me), but without the \n, \r, /*, */ and the leading ':' of each line?

    Eric.

    Friday, September 6, 2013 3:49 AM
  • User1508394307 posted

    Well, I think it should be possible to adjust the pattern to do this, but it would be too complex and might not cover all possible cases. I would recommend running a second regex, or Regex.Replace, to "clean" the first match.

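    // Singleline lets '.' match newlines; the section ends at the first run of === or ---.
    Regex regex = new Regex(@"Procedure\s+goal\s+:(.*?)(===|---)", RegexOptions.Singleline);

    Match m = regex.Match(InputText);
    string text = m.Groups[1].Value;

    // Remove each "*/ newline /* :" join between lines, including the trailing "*/ newline /* ".
    text = Regex.Replace(text, @"\*/\r?\n/\*\s+:?", "");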

    Hope this helps.
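
    If the padding left between lines also has to go, the cleanup can be done line by line instead. This is only a sketch, assuming raw is the text captured by group 1 of the first regex, before any replacement. The names GoalCleaner and Clean are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GoalCleaner
{
    // Hypothetical helper: turns the raw captured block into one line of plain text.
    public static string Clean(string raw)
    {
        var parts = new List<string>();
        foreach (string line in raw.Split('\n'))
        {
            // First alternative: leading whitespace, an optional "/*" and an optional first ':'.
            // Second alternative: an optional trailing "*/" plus surrounding whitespace (including '\r').
            string s = Regex.Replace(line, @"^\s*(/\*)?\s*:?\s*|\s*(\*/)?\s*$", "");
            if (s.Length > 0)
            {
                parts.Add(s);
            }
        }
        // Join the cleaned lines with single spaces.
        return string.Join(" ", parts);
    }
}
```

    Splitting on '\n' and trimming each line separately handles both "\r\n" and "\n" line endings, and a continuation line without a ':' is handled the same way as one with it.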

    Friday, September 6, 2013 4:49 AM
  • User-1668256174 posted

    OK, smirnov,

    Thank you.

    You have saved me a lot of time.

    Eric

    Friday, September 6, 2013 5:26 AM