How can I find a string after a specific string/character using regex

  • Question

  • This is what I have; it was working for a while.

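        using (WebClient client = new WebClient())
        {
            // Bypass every cache so the current page is always downloaded.
            client.CachePolicy = new System.Net.Cache.RequestCachePolicy(System.Net.Cache.RequestCacheLevel.NoCacheNoStore);
            byte[] store = client.DownloadData(StateWebLocation);
            string Data = System.Text.Encoding.UTF8.GetString(store);

            // Old page format: comma-separated digits directly before </div>.
            pattern = @",?\d,\d,\d\s*</div>";
            // Strip all whitespace first so line breaks cannot interrupt a match.
            string Data2 = Regex.Replace(Data, @"\s", "");
            foreach (Match m in Regex.Matches(Data2, pattern, RegexOptions.Singleline))
            {
                string t = m.ToString();
                // Skip matches that start with a comma (the tail of a longer group).
                if ("," != t.Substring(0, 1))
                {
                    // Keep only digits and dots, e.g. "1,2,3</div>" becomes "123".
                    t = Regex.Replace(t, "[^.0-9]", "");
                }
            }
        }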

    The problem is that the website format has changed to this:

          <a class="c-table-group__model-header" href="/at/laptop">
            Model 1
                      <td class="c-table__td c-table-collapsible__td c-table-group__result u-order-3@tablet u-flex-5" data-link=/at/laptop data-header="Result">
    <div class="c-model-group js-model-group ">
      <div class="c-model-group__numbers">
      <span class="c-model js-model c-model--outline">
      <span class="c-model js-model c-model--outline">
      <span class="c-model js-model c-model--outline">

    What I'm trying to do is search the data for the keywords "Model 1" and "Model 2",

    then look for this grouping of 3 numbers after each keyword and merge them into one string (it's just one model number broken up). Model 2 has 4 numbers.

    Then I add the string to myList. Sometimes there are 2 different sets of numbers; I need both if there is another one.

    I'm a little lost on how to grab the data the correct way. Any ideas?
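
One way to approach the task described above is sketched here with plain `Regex` calls. This is a minimal sketch, not a confirmed solution: the class name `ModelNumberScraper` and the method `Extract` are hypothetical, and because the posted sample shows the `<span>` elements without their contents, it assumes that each `<span class="c-model ...">` holds one number as its text and that every set of numbers sits in its own `<div class="c-model-group__numbers">` with no nested `<div>`. It also works on the raw page text rather than the whitespace-stripped copy, so that "Model 1" can still be found with its space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ModelNumberScraper
{
    // Matches a header such as "Model 1" (name format taken from the sample).
    private static readonly Regex Header = new Regex(@"Model\s+\d+");

    // ASSUMPTION: each set of numbers is wrapped in its own
    // <div class="c-model-group__numbers"> ... </div> without nested divs.
    private static readonly Regex Group = new Regex(
        @"<div\s+class=""c-model-group__numbers"">(.*?)</div>",
        RegexOptions.Singleline);

    // ASSUMPTION: each <span class="c-model ..."> contains one number as text.
    private static readonly Regex Number = new Regex(
        @"<span\s+class=""c-model\b[^""]*"">\s*(\d+)\s*</span>");

    // Returns one merged string per set of numbers found after modelName.
    public static List<string> Extract(string html, string modelName)
    {
        var results = new List<string>();
        MatchCollection headers = Header.Matches(html);
        for (int i = 0; i < headers.Count; i++)
        {
            if (headers[i].Value != modelName) continue;

            // The model's section runs from its header to the next header.
            int start = headers[i].Index + headers[i].Length;
            int end = i + 1 < headers.Count ? headers[i + 1].Index : html.Length;
            string section = html.Substring(start, end - start);

            foreach (Match g in Group.Matches(section))
            {
                // Merge the numbers of one set into a single string.
                string merged = string.Concat(
                    Number.Matches(g.Groups[1].Value)
                          .Cast<Match>()
                          .Select(m => m.Groups[1].Value));
                if (merged.Length > 0) results.Add(merged);
            }
        }
        return results;
    }
}
```

If `myList` is a `List<string>`, something like `myList.AddRange(ModelNumberScraper.Extract(Data, "Model 1"))` would add every set found for that model, including a second set when one exists. If the real markup differs from these assumptions, the three patterns are the places to adjust.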



    Wednesday, December 5, 2018 7:21 PM

All replies

  • I have never used XML processing or its search functions; what I have in the sample is all I've used to achieve my goal. If I see a code sample, I get the idea of how the functions work.
    Wednesday, December 5, 2018 7:41 PM
  • @Andrew B. Painter

    The problem is that HTML is not a regular language in the way that regular expressions need their input to be.  You want to use an XML processing engine (such as System.Xml) and the node/attribute/value search functions there.
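
For readers who want to see what the node/attribute/value search suggested above might look like, here is a minimal sketch using LINQ to XML (`System.Xml.Linq`). The class name `ModelNumberXml` and the method `Extract` are hypothetical. It assumes the fragment is well-formed XML, which real HTML often is not (an unquoted attribute value, for example, makes `XDocument.Parse` throw), and it assumes each `<span>` carrying the `js-model` class holds one number as its text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ModelNumberXml
{
    // Returns one merged string per <div class="c-model-group__numbers">.
    // ASSUMPTION: the markup passed in is well-formed XML.
    public static List<string> Extract(string wellFormedMarkup)
    {
        XDocument doc = XDocument.Parse(wellFormedMarkup);

        return doc.Descendants("div")
            // Node search by attribute value.
            .Where(div => (string)div.Attribute("class") == "c-model-group__numbers")
            .Select(div => string.Concat(
                div.Elements("span")
                   // Keep only spans whose class list contains "js-model".
                   .Where(span => ((string)span.Attribute("class") ?? "")
                                  .Split(' ')
                                  .Contains("js-model"))
                   // ASSUMPTION: the span text is the number itself.
                   .Select(span => span.Value.Trim())))
            .Where(merged => merged.Length > 0)
            .ToList();
    }
}
```

For pages that are not well-formed XML, a dedicated HTML parser would be needed in place of `XDocument.Parse`; the query shape would stay much the same.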

    It never hurts to try. In the worst-case scenario, you'll learn something.

    Who told you, MR BS, that Regex is not appropriate to use with HTML (or XML) data? Certainly, your ignorance of Regex is your adviser, and you want to convince others not to use it, so that you feel less envious...

    P.S.: if you don't like the way I treat you, ask the manager of these MVPs to ban my account; if you don't succeed, this means my treatment of you is correct!

    Saturday, December 8, 2018 6:53 AM
  • @Andrew B. Painter

    How about if you educate me and everyone else on MSDN by providing a RegEx pattern that extracts the 3rd of 5 second-innermost sibling elements from a div element which is the child of a body element which is the sibling of a head element and the child of an html element, regardless of the target's tag type?

    MR BS,

    It's easy, after the pattern is determined, dude ssa eloh.

    Saturday, December 8, 2018 7:12 AM
  • If you are extracting data from a website, use Selenium.  Selenium is built for extracting and testing websites.  I've used it for several complex projects.  It has a powerful extraction capability with XPath.  C# supports Selenium and has drivers for the different browsers.  If you provide the website, I might have time to do an example.  Here is a tutorial.
    Sunday, December 9, 2018 1:41 AM
  • I'm sure that it would work, but that's a lot for a small project.

    Sunday, December 9, 2018 4:14 AM
  • I figured out something that worked.


    Sunday, December 9, 2018 4:16 AM