Lexical Analysis using GPLEX
-
March 16, 2007, 13:45 Moderator
Jackson McCann wrote: ...but have you considered a lexical analysis approach? GPLEX is a C# lexer and can be found here: http://plas.fit.qut.edu.au/gplex/ and it might provide an alternative approach.
I like that link. Have you actually used the Aussie's product or the Managed Package Lex? It looks interesting. I am going to add the link to the .Net Regex Resources reference.
Thanks
All replies
-
March 16, 2007, 15:16
I've used it once, to create a simple reader for a flat file that contains formatted metadata about BEA Tuxedo field definitions. It took about half a day to get up to speed with the app, integrate it into VS2005 and create the .lex file.
-
March 16, 2007, 15:47
Well, here is my simple example, which took about an hour to create. It starts with some very simple HTML (stripping out everything between the body tags isn't covered):
This is some text in <b>bold</b> and <i>italics</i>
<ul>
<li>one</li>
<li><b>two</b></li>
<li>Three</li>
</ul>
<ol>
<li>four</li>
<li>five</li>
<li>six</li>
</ol>
Here is the code that GPLEX turns into a C# program.
/*
This is a simple parser for html. It tries to turn the html into something that
can be displayed by a wiki.
*/

%namespace LexScanner
%option noparser

%x DOTLIST
%x NUMLIST

bl [<]
br [>]
bls [<][/]
any [^<>]

%%
\n|\r\n? { Console.WriteLine() ; /* End of a line */ }

<*>{bl}b{br} { Console.Write("'''") ; /* Three single quotes switches on bolding */ }
<*>{bls}b{br} { Console.Write("'''") ; /* Three single quotes switches off bolding */ }

<*>{bl}i{br} { Console.Write("'") ; /* A single quote switches on italics */ }
<*>{bls}i{br} { Console.Write("'") ; /* A single quote switches off italics */ }

<INITIAL>{bl}ul{br} { BEGIN(DOTLIST) ; /* Detect the start of simple list */ }
<DOTLIST>{bls}ul{br} { BEGIN(INITIAL) ; /* Detect the end of the list */ }
<DOTLIST>{bl}li{br} { Console.Write(" * ") ; /* Wiki code for a bulleted list item */ }
<DOTLIST>{bls}li{br} { /* Ignore the end of list item tag */ }

<INITIAL>{bl}ol{br} { BEGIN(NUMLIST) ; /* Detect the start of numbered list */ }
<NUMLIST>{bls}ol{br} { BEGIN(INITIAL) ; /* Detect the end of the list */ }
<NUMLIST>{bl}li{br} { Console.Write(" 1 ") ; /* Wiki code for a numbered list item */ }
<NUMLIST>{bls}li{br} { /* Ignore the end of list item tag */ }

<*>{any} { Console.Write(yytext) ; }
%%
public static void Main(string[] argp) {
    if (argp.Length == 0)
        Console.WriteLine("Usage: WordCount filename(s)");
    for (int idx = 0; idx < argp.Length; idx++) {
        string name = argp[idx];
        try {
            int tok;
            FileStream file = new FileStream(name, FileMode.Open);
            Scanner scnr = new Scanner(file);
            Console.WriteLine("File: " + name);
            // Keep scanning until the lexer reports end of file
            do {
                tok = scnr.yylex();
            } while (tok > (int)Tokens.EOF);
        } catch (IOException) {
            Console.WriteLine("File " + name + " not found");
        }
    }
}
It isn't hard to get it to create a .dll that you can use from a program. I create a C# console app, add the code above as htmllexer.lex and then put a pre-build event into the project to create a .cs file from the .lex file.
gplex /minimize /summary /out:$(ProjectDir)/htmllexer.cs $(ProjectDir)/htmllexer.lex
And when I run the above on my test file I get:
This is some text in '''bold''' and 'italics'
* one
* '''two'''
* Three
1 four
1 five
1 six
I had to make some edits to stop the code turning into smileys, so apologies if I've introduced a typo.
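For anyone trying to follow what the start conditions (INITIAL, DOTLIST, NUMLIST) in the .lex file are doing, here is a rough hand-written C# sketch of the same idea. It is not what GPLEX generates. The class name WikiSketch and the method Convert are made up for illustration, and it simplifies a few things: newlines are copied through unchanged, and any tag that has no rule in the current state is simply dropped, which is an assumption of this sketch rather than a statement about how the generated scanner behaves.

```csharp
using System;
using System.Text;

// Hypothetical hand-written equivalent of the .lex rules; not GPLEX output.
public static class WikiSketch
{
    // Mirrors the INITIAL, DOTLIST and NUMLIST start conditions.
    private enum State { Initial, DotList, NumList }

    public static string Convert(string html)
    {
        StringBuilder output = new StringBuilder();
        State state = State.Initial;
        int pos = 0;

        while (pos < html.Length)
        {
            // Only look for a closing '>' when we are sitting on a '<'.
            int close = html[pos] == '<' ? html.IndexOf('>', pos) : -1;
            if (close < 0)
            {
                // Plain text is echoed, like the {any} rule.
                // A '<' with no matching '>' is echoed too (sketch choice).
                output.Append(html[pos]);
                pos++;
                continue;
            }

            string tag = html.Substring(pos + 1, close - pos - 1);
            pos = close + 1;

            // Rules marked <*> in the .lex file apply in every state.
            if (tag == "b" || tag == "/b") output.Append("'''");
            else if (tag == "i" || tag == "/i") output.Append("'");
            // The remaining rules only fire in a specific state.
            else if (tag == "ul" && state == State.Initial) state = State.DotList;
            else if (tag == "/ul" && state == State.DotList) state = State.Initial;
            else if (tag == "ol" && state == State.Initial) state = State.NumList;
            else if (tag == "/ol" && state == State.NumList) state = State.Initial;
            else if (tag == "li" && state == State.DotList) output.Append(" * ");
            else if (tag == "li" && state == State.NumList) output.Append(" 1 ");
            // Anything else (including </li>) is dropped in this sketch.
        }

        return output.ToString();
    }
}
```

Calling WikiSketch.Convert("<ul><li><b>two</b></li></ul>") returns " * '''two'''". The point is only that the same li tag produces different output depending on which list state is active, which is what the %x declarations buy you in the .lex file without writing the state handling by hand.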
-
March 16, 2007, 16:00
Jackson,
thank you for your time explaining this; it helps.

