search within html file


  • BH

    I'm trying to write code that searches for text within an HTML file.  What is the best way to do that?  Do I have to convert it to a text file first, or can I search the HTML file directly?  Please provide a sample.

    Thanks so much.


    Thanks Aron

    Monday, August 27, 2012 12:55 AM


All replies

  • Hi,

    You can simply run the command-line find utility against the HTML file and read its output:

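    // Requires: using System.Diagnostics; using System.IO;
    // The program name and its arguments are passed separately.
    ProcessStartInfo info = new ProcessStartInfo("find", "/n \"html\" \"c:\\name.html\"");
    // Both settings are needed before StandardOutput can be read.
    info.UseShellExecute = false;
    info.RedirectStandardOutput = true;
    Process p = Process.Start(info);
    StreamReader sr = p.StandardOutput;
    string s = sr.ReadToEnd();   // matching lines, prefixed with line numbers (/n)
    p.WaitForExit();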


    Monday, August 27, 2012 2:24 AM
  • HTML is plain text, so there is no need to convert it to anything else.

    Just read the file into a string. Then you can search it with the String.IndexOf() method.
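
    A minimal sketch of that approach, assuming the file is small enough to read into memory in one go. The class and method names (HtmlTextSearch, FindAll, FindAllInFile) are made up for illustration, and the case-insensitive comparison is an assumption about what the search should do.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlTextSearch
{
    // Returns the zero-based offset of every non-overlapping occurrence
    // of term in text, ignoring case.
    public static List<int> FindAll(string text, string term)
    {
        var positions = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return positions;

        int index = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        while (index >= 0)
        {
            positions.Add(index);
            // Continue searching just past the match that was found.
            index = text.IndexOf(term, index + term.Length,
                                 StringComparison.OrdinalIgnoreCase);
        }
        return positions;
    }

    // Reads the whole HTML file as text and searches it directly.
    public static List<int> FindAllInFile(string path, string term)
    {
        string html = File.ReadAllText(path);
        return FindAll(html, term);
    }
}
```

    Something like HtmlTextSearch.FindAllInFile(@"c:\name.html", "html") would then return the offsets of each hit. Note that this searches the raw markup, so tag and attribute names match as well as the visible text.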


    Noam B.

    • Proposed as answer by Noam B Tuesday, August 28, 2012 3:26 PM
    Monday, August 27, 2012 11:05 AM
  • As already said, HTML is text, so you don't need to convert it. For a simple search you can use the methods of the String class, such as IndexOf, but there are other options. You can use regular expressions that take the HTML tags into account to find what you are looking for, for example to find all <p> tags. If you know the file is XHTML, then parsing it as XML and searching with XPath might give the best performance. Or you can use a third-party HTML parser such as the Html Agility Pack.
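
    A rough sketch of the regular-expression and XPath options, using only the .NET base class library. The class and method names are hypothetical. The regex version assumes simple, non-nested <p> elements and will not cope with every kind of real-world markup. The XPath version assumes the input is well-formed XHTML without a DOCTYPE or HTML-only entities such as &nbsp;, since XmlDocument would otherwise try to resolve the DTD or fail to parse.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlTagSearch
{
    // Regex option: returns the inner markup of each <p>...</p> element.
    // \b keeps <pre> and similar tags from matching; Singleline lets
    // a paragraph span several lines.
    public static List<string> ParagraphsByRegex(string html)
    {
        var result = new List<string>();
        var pattern = new Regex(@"<p\b[^>]*>(.*?)</p>",
                                RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in pattern.Matches(html))
            result.Add(m.Groups[1].Value);
        return result;
    }

    // XPath option: only works when the document is well-formed XML.
    // local-name() is used so the query matches with or without
    // the XHTML namespace declared on the root element.
    public static List<string> ParagraphsByXPath(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//*[local-name()='p']"))
            result.Add(node.InnerText);
        return result;
    }
}
```

    The regex version returns whatever sits between the tags, including nested markup, while the XPath version returns only the text content of each paragraph. For arbitrary HTML that is not valid XML, a dedicated HTML parser is likely the safer choice.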

    Monday, August 27, 2012 12:33 PM