Using a regular expression to find a string between HTML tags

    Question

  • Hey everyone. I have code that scans an HTML document and comes out with the following string:

    Code Snippet

    "<td>XHTML 1.0 Transitional</td><td><select id=\"doctype\" name=\"doctype\">"


    I'm trying to find the string that says "XHTML 1.0 Transitional" using regular expressions, but I have been having a heck of a time doing it. I can't seem to get it to work! I have used RegExBuilder and a few other programs to try to find it, but to no avail. Could anyone explain how to find the string between <td> and </td> using regular expressions? This matters because the string between these two tags could change depending on the website. For example, it could be:

    <td>HTML 4.01 Transitional</td>


    I have used regular expressions before, but the tags around the string are confusing me. Could anyone be so kind as to tell me the correct regular expression to find this string?
    Sunday, April 13, 2008 7:00 PM

Answers

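  • // The parentheses form capture group 1: the text between the tags.
    Regex r = new Regex("<td>(.*)</td>");

    Match m = r.Match("dd<td>aaddaann</td>ss");

    // Groups[1] holds only the captured text, without <td> and </td>.
    MessageBox.Show(m.Groups[1].Value);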

    Sunday, April 13, 2008 8:06 PM
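
The pattern in the answer uses a greedy `.*`, which works for the sample string because that string contains only one `</td>`. On input with several cells on one line, a greedy match would run to the last `</td>`. The following is a minimal sketch of one way to handle that, using the lazy form `.*?`. The names `TdText`, `First`, and `All` are hypothetical and do not come from the thread, and the pattern assumes a bare `<td>` with no attributes. Regular expressions are also fragile on arbitrary HTML, so treat this as a sketch for simple, known markup only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative helper (hypothetical name, not from the thread).
public static class TdText
{
    // Lazy (.*?) stops at the first </td>; Singleline lets '.' match newlines.
    private static readonly Regex Cell =
        new Regex("<td>(.*?)</td>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Text of the first <td>...</td> pair, or null when there is none.
    public static string First(string html)
    {
        Match m = Cell.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Text of every complete <td>...</td> pair, in document order.
    public static List<string> All(string html)
    {
        var result = new List<string>();
        foreach (Match m in Cell.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

With this sketch, `TdText.First("<td>HTML 4.01 Transitional</td>")` returns `HTML 4.01 Transitional`, and `TdText.All("<td>a</td><td>b</td>")` returns the two strings `a` and `b` rather than one string spanning both cells.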

All replies

  • Code Snippet

    Regex r = new Regex("<td>(.*)</td>");

    Match m = r.Match("dd<td>aaddaann</td>ss");

    MessageBox.Show(m.Value);

    Sunday, April 13, 2008 7:23 PM
  • Thanks Phillip. You've gotten me off to a good start, but the result still isn't what I need. I DO NOT want the <td> or </td> tags. I just want the string inside them. Right now I am getting <td>XHTML 4.01 Transitional </td> as a result, and I just want the string "XHTML 4.01 Transitional". Any more suggestions?
    Sunday, April 13, 2008 7:43 PM
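
The tags show up in the result because `m.Value` is the whole matched text, while the parenthesized part of the pattern is available separately as `m.Groups[1]`. A minimal sketch of the difference, assuming the same pattern and test string as the first reply. The names `MatchParts` and `Describe` are hypothetical, and the method returns a string instead of calling `MessageBox.Show` so that it does not depend on Windows Forms:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper (hypothetical name, not from the thread).
public static class MatchParts
{
    // Returns "whole match | group 1" for the first <td>...</td> in the input.
    public static string Describe(string html)
    {
        Match m = Regex.Match(html, "<td>(.*)</td>");
        if (!m.Success)
        {
            return "no match";
        }
        // m.Value (the same text as Groups[0]) includes the tags;
        // Groups[1] is only what the parentheses captured.
        return m.Value + " | " + m.Groups[1].Value;
    }
}
```

For example, `MatchParts.Describe("dd<td>aaddaann</td>ss")` returns `<td>aaddaann</td> | aaddaann`.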