Simple regex to parse file headers

Question
-
User-1668256174 posted
Hello everybody,
I need a regex that would help me parse file headers.
Here is an example :
/* =================================================================================*/
/* Creation date : 09/01/2013 */
/* Writer : Jean Dupont */
/* Procedure goal : Ceci est un exemple de fichier qui permet de */
/* : stocker des calculs mathématiques. */
/* Ces calculs proviennent des serveurs du groupe */
/* : C1-03 qui est = au groupe C2-02 . */
/* : Si ces calculs sont = à 0 alors ne pas en tenir */
/* : compte. */
/* =================================================================================*/
/* Paramètre reçu 1 : */
/* ---------------------------------------------------------------------------------*/
/* Paramètre retourné 1 : */
/* =================================================================================*/
/* Date de modification : */
/* Rédacteur : */
/* But modification : */
/* =================================================================================*/
I would like to use a regex to retrieve the "Procedure goal" part:
Ceci est un exemple de fichier qui permet de */
/* : stocker des calculs mathématiques. */
/* Ces calculs proviennent des serveurs du groupe */
/* : C1-03 qui est = au groupe C2-02 . */
/* : Si ces calculs sont = à 0 alors ne pas en tenir */
/* : compte. */
This part may end either with this line:
/* ============================================*/
or with this line:
/*----------------------------------------------------------------------*/
(the number of '=' or '-' characters does not matter, but it is always greater than 2)
It is important to check for the presence of the '/*' and the '*/'.
The number of lines is variable.
I wrote a regex, but it is too complicated and it does not return what I need.
'\/\*\s*(?i)procedure\s*goal\s*\:\s*(.*\s*\*\/(\r*\n*\/\*\s*\:*\s*[^\=]{2,}[^\=\-]{2,}\s*\*\/))(?!\/\*\s*\={2,}\s*\-{2,}\s*\*\/)'
Thanks a lot for your help.
Eric.
Thursday, September 5, 2013 5:22 PM
Answers
-
User1508394307 posted
The simplest pattern is
Procedure\s+goal\s+:((.|\n)*?)(===|---)
or
Procedure\s+goal\s+:((.|\n)*?)(/\*\s*[=-]*\s*\*/)
where you can capture the required text using
Regex regex = new Regex("Procedure\\s+goal\\s+:((.|\\n)*?)(===|---)", RegexOptions.None);
Match m = regex.Match(InputText);
string text = m.Groups[1].Value;
There are many ways to refine the pattern, depending on your requirements,
e.g. if you need to match "Procedure goal" only right after the opening /* of a comment, you can prefix the pattern with /\*\s+
/\*\s+Procedure\s+goal...
If you don't want to use Match.Groups, you can use lookarounds instead:
(?<=Procedure\s+goal\s+:)(.|\n)*?(?=(/\*\s*[=-]*\s*\*/))
etc.
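One possible refinement, sketched here under assumptions rather than taken from the answer itself: the question says the terminating line always has more than two '=' or '-' characters, while [=-]* in the second pattern above also accepts zero. The sketch below requires at least three, insists on the opening /* before "Procedure goal", and uses RegexOptions.Singleline so that . matches line breaks instead of (.|\n). The class name HeaderGoal and the method name ExtractRaw are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the thread.
public static class HeaderGoal
{
    // Lazily capture everything after "Procedure goal :" up to the first
    // comment line consisting of three or more '=' or three or more '-'.
    private static readonly Regex GoalPattern = new Regex(
        @"/\*\s*Procedure\s+goal\s*:(.*?)/\*\s*(?:={3,}|-{3,})\s*\*/",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the raw captured text (still containing "*/", line breaks
    // and "/*"), or null when no terminated "Procedure goal" block exists.
    public static string ExtractRaw(string header)
    {
        Match m = GoalPattern.Match(header);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling HeaderGoal.ExtractRaw(InputText) on the header from the question should return the goal text with its comment delimiters still in place; a header whose separator line has fewer than three '=' or '-' characters yields null.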
- Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
Thursday, September 5, 2013 5:54 PM
All replies
-
User-1668256174 posted
Thank you very much smirnov !
I will try this tomorrow at my office (in about 10 hours).
And I will give you feedback.
Regards.
Eric.
Thursday, September 5, 2013 6:11 PM -
User-1668256174 posted
smirnov,
the regex works very well, and I thank you once again.
Is there a way to retrieve only the "procedure goal" part (as you showed me) but without the \n, \r, /*, */ and the leading ':' of each line?
Eric.
Friday, September 6, 2013 3:49 AM -
User1508394307 posted
Well, I think it should be possible to adjust the pattern to do this, but it would be too complex and might not cover all possible cases. I would recommend running a second regex, or Regex.Replace, to "clean" the first match.
Regex regex = new Regex(@"Procedure\sgoal\s+:(.*?)/\*\s*===", RegexOptions.Singleline);
Match m = regex.Match(InputText);
string text = m.Groups[1].Value;
text = Regex.Replace(text, @"\*/\r?\n/\*\s+:?", "");
Hope this helps.
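As a slightly more defensive variant of the clean-up step, here is a minimal sketch that tolerates any whitespace around the comment delimiters and also removes the "*/" left at the end of the last line. It assumes the input is the text captured in Groups[1]; the class name HeaderGoalCleaner and the method name Clean are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative and not from the thread.
public static class HeaderGoalCleaner
{
    public static string Clean(string raw)
    {
        // Join continuation lines: replace "*/", the line break, "/*" and an
        // optional leading ':' (plus surrounding whitespace) with one space.
        string joined = Regex.Replace(raw, @"\s*\*/\s*/\*\s*:?\s*", " ");

        // Remove the closing "*/" that remains after the last line.
        joined = Regex.Replace(joined, @"\s*\*/\s*$", "");

        return joined.Trim();
    }
}
```

Replacing each line break with a single space, rather than an empty string, keeps the words at line boundaries from running together.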
Friday, September 6, 2013 4:49 AM -
User-1668256174 posted
Ok smirnov,
Thank you.
You have saved me a lot of time.
Eric
Friday, September 6, 2013 5:26 AM