Extracting values between two strings (multiple occurrences)

  • Question

  • User-195907812 posted

    Hi,

    I have a string that can contain multiple occurrences of strings between start/end tags:

    "example text here <mytag>hello</mytag> more text <mytag>goodbye</mytag> more text <mytag>back again</mytag> example text"

    I'm sure there is an easy way to read through all the tags and extract the content, can anyone help?

    Many thanks

    Wednesday, August 19, 2020 4:07 PM

Answers

  • User2103319870 posted

    RageRiot

    there is an easy way to read through all the tags and extract the content

    If you are OK with using a third-party library, you can use HtmlAgilityPack, as shown below.

    Sample code:

    var doc = new HtmlDocument();
    doc.LoadHtml(@"example text here <mytag> hello </mytag> more text <mytag> goodbye </mytag> more text <mytag> back again </mytag> example text");
    // Get all mytag nodes (SelectNodes returns null when nothing matches)
    HtmlNodeCollection col = doc.DocumentNode.SelectNodes("//mytag");
    if (col != null)
    {
        // Grab the content inside each node
        foreach (HtmlNode node in col)
        {
            // Trim removes the padding spaces around the text in this sample
            string data = node.InnerText.Trim();
        }
    }

    Make sure you add the following namespace after adding the library to your project:

    using HtmlAgilityPack;

    Another option is to use a regular expression to extract the text between the mytag tags, as shown below:

    // Requires: using System; using System.Text.RegularExpressions;
    string html = @"example text here <mytag> hello </mytag> more text <mytag> goodbye </mytag> more text <mytag> back again </mytag> example text";

    // IgnoreCase also matches <MYTAG>; Singleline lets '.' match line breaks
    RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // (.*?) is lazy, so each match stops at the nearest closing tag
    foreach (Match m in Regex.Matches(html, "<mytag>(.*?)</mytag>", options))
    {
        // Group 1 holds the text between the tags
        Console.WriteLine(m.Groups[1].Value);
    }
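
    If the start and end markers vary, the same regex idea could be wrapped in a small helper. This is only a sketch: the class name TagExtractor and the method ExtractBetween are made up for illustration, and it assumes the markers never nest. Regex.Escape is used so that markers containing regex metacharacters (for example "[b]") are matched literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Hypothetical helper (not from the thread): returns the text found
    // between every startTag/endTag pair in input, in order of appearance.
    public static List<string> ExtractBetween(string input, string startTag, string endTag)
    {
        // Escape the markers so characters such as '[' or '.' match literally.
        string pattern = Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag);
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            // Group 1 is the lazily captured content; Trim drops padding spaces.
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }
}
```

    Called as TagExtractor.ExtractBetween(text, "<mytag>", "</mytag>") on the string from the question, this should return the three values hello, goodbye and back again.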
    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, August 19, 2020 5:41 PM

All replies

  • User-195907812 posted

    Thank you A2H, I'll check out that library but the regex looks like it'll work well for this.

    Wednesday, August 19, 2020 7:26 PM