HTML regular expressions - C# .NET pattern

  • Question

  • Hey guys, I am trying to get some information from a website, but I am not able to pull it off. I downloaded the HTML file and I am using regular expressions to look for patterns in it. I think I know how to match between the tags <h3 class="title"> and </a></h3>, but I don't know how I should put the <a href="actueel/nieuws/open-lesmiddag-31-januari-2018"> in the pattern, because the link is different every time it appears on the page. How do I change my pattern so it matches every one of these 3 lines? I really hope someone can help me. I looked everywhere else on the forums, but no situation is the same.

    string pattern = "<h3> class=\"title\">(.+?)</a></h3>"

    <h3 class="title"><a href="actueel/nieuws/open-lesmiddag-31-januari-2018">Text i Want</a></h3>

    <h3 class="title"><a href="actueel/nieuws/vrije-dag-20-februari-2020">Text i Want</a></h3>

    <h3 class="title"><a href="actueel/nieuws/plezier-10-mei-2019">Text i Want</a></h3>

    Wednesday, February 21, 2018 8:35 PM

Answers

  • Try one of the simple solutions:

       <h3\s+class=\"title\">\s*<a\s+.*?>(.+?)</a>\s*</h3>

    It can be adjusted if you want to filter by particular href values.

    The problem can also be solved with an HTML parser, which gives you a tree of HTML nodes.

    • Marked as answer by Coolcolumbus Thursday, February 22, 2018 11:05 AM
    Thursday, February 22, 2018 6:09 AM
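
A minimal sketch of how the pattern from the answer might be applied in C#. The class and method names (`TitleExtractor`, `ExtractTitles`, `ExtractNewsTitles`) are hypothetical, and the href-filtered variant assumes that every wanted link starts with `actueel/nieuws/`, as in the three sample lines from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TitleExtractor
{
    // Pattern from the answer, written as a verbatim string (@"...") so that
    // \s reaches the regex engine unchanged; a literal quote is written as "".
    private const string AnyLinkPattern =
        @"<h3\s+class=""title"">\s*<a\s+.*?>(.+?)</a>\s*</h3>";

    // Assumed variant: only accept links whose href starts with "actueel/nieuws/".
    private const string NewsLinkPattern =
        @"<h3\s+class=""title"">\s*<a\s+href=""actueel/nieuws/[^""]*""[^>]*>(.+?)</a>\s*</h3>";

    // Returns the link text of every <h3 class="title"> heading, in document order.
    public static List<string> ExtractTitles(string html)
    {
        return Collect(html, AnyLinkPattern);
    }

    // Same as ExtractTitles, but restricted to the assumed news href prefix.
    public static List<string> ExtractNewsTitles(string html)
    {
        return Collect(html, NewsLinkPattern);
    }

    private static List<string> Collect(string html, string pattern)
    {
        var titles = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern))
        {
            // Group 1 is the (.+?) capture: the text between <a ...> and </a>.
            titles.Add(match.Groups[1].Value);
        }
        return titles;
    }
}
```

Assuming the page was saved to disk, `TitleExtractor.ExtractTitles(File.ReadAllText(path))` would return the link texts. Because `.` does not match a newline by default, a link whose text is split across lines would need `RegexOptions.Singleline`.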

All replies

  • Thanks dude! It works. I did have to put two backslashes before each "s".
    Thursday, February 22, 2018 11:05 AM
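
The double backslashes come from C# string literals rather than from the regex itself: in a regular literal, `\s` is not a recognized escape sequence and does not compile, so each regex backslash has to be doubled. The sketch below compares the two spellings; the class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PatternEscaping
{
    // Regular string literal: the compiler processes backslash escapes first,
    // so each regex backslash is doubled and each quote is written as \".
    public const string Regular =
        "<h3\\s+class=\"title\">\\s*<a\\s+.*?>(.+?)</a>\\s*</h3>";

    // Verbatim string literal: backslashes are kept as typed,
    // and a quote is written as "".
    public const string Verbatim =
        @"<h3\s+class=""title"">\s*<a\s+.*?>(.+?)</a>\s*</h3>";

    // Both literals produce the same characters, so the regex engine
    // receives the same pattern either way.
    public static bool SamePattern()
    {
        return Regular == Verbatim;
    }

    // Convenience check that a piece of HTML contains at least one match.
    public static bool Matches(string html)
    {
        return Regex.IsMatch(html, Verbatim);
    }
}
```

The verbatim form is usually easier to read for regex-heavy code, since the pattern looks the same as it would in a regex tester apart from the doubled quotes.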