Read specific string from Webpage Source code & Print it in the web form Dot Net

  • Question

  • User624488380 posted

    Here are two portions of my own website's source code. I want to find title="View all posts by ..." in the source and extract the characters that follow it, up to the closing quote. For example, the first snippet contains title="View all posts by Sampgoogly", so I want to capture the word "Sampgoogly" and show it in my web form. The second snippet contains title="View all posts by admin", so I want to show the string "admin", which comes after "View all posts by". How can I do that in my .NET web form? Please help me with code.

    <span class="posted-on"><a href="https://pickleballconsumer.com/?p=100" title="5:17 pm" rel="bookmark"><time class="updated" datetime="2018-09-24T18:57:50+00:00" itemprop="dateModified">September 24, 2018</time><time class="entry-date published" datetime="2018-09-24T17:17:10+00:00" itemprop="datePublished">September 24, 2018</time></a></span> <span class="byline"><span class="author vcard" itemprop="author" itemtype="https://schema.org/Person" itemscope>by <a class="url fn n" href="https://pickleballconsumer.com/?author=1" title="View all posts by Sampgoogly" rel="author" itemprop="url"><span class="author-name" itemprop="name">Sampgoogly</span></a></span></span> </div><!-- .entry-meta -->

    <header class="entry-header">
    <h2 class="entry-title" itemprop="headline"><a href="https://pickleballconsumer.com/?p=183" rel="bookmark">fdsafdsfsfsfsdf</a></h2> <div class="entry-meta">
    <span class="posted-on"><a href="https://pickleballconsumer.com/?p=183" title="5:09 pm" rel="bookmark"><time class="updated" datetime="2019-03-27T17:09:24+00:00" itemprop="dateModified">March 27, 2019</time><time class="entry-date published" datetime="2019-03-27T17:09:17+00:00" itemprop="datePublished">March 27, 2019</time></a></span> <span class="byline"><span class="author vcard" itemprop="author" itemtype="https://schema.org/Person" itemscope>by <a class="url fn n" href="https://pickleballconsumer.com/?author=2" title="View all posts by admin" rel="author" itemprop="url"><span class="author-name" itemprop="name">admin</span></a></span></span> </div><!-- .entry-meta -->
    Thursday, April 25, 2019 5:32 PM

Answers

  • User288213138 posted

    Hi samprit Chakraborty,

    You can use a regular expression to read a specific string from the web page's source code.

    The code:

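    // Requires: using System.Text.RegularExpressions;
    protected void Page_Load(object sender, EventArgs e)
    {
        // Sample anchor tag taken from the page source in the question.
        string input = "<a class=\"url fn n\" href=\"https://pickleballconsumer.com/?author=2\" title=\"View all posts by admin\" rel=\"author\" itemprop=\"url\">";

        // Match only the author title attribute and capture the text after
        // "View all posts by ", so other title attributes (e.g. "5:09 pm") are skipped
        // and author names containing spaces are captured whole.
        Match match = Regex.Match(input, "title=\"View all posts by ([^\"]+)\"", RegexOptions.IgnoreCase);
        if (match.Success)
        {
            string author = match.Groups[1].Value;

            // Encode before writing, since the value comes from external markup.
            Response.Write(Server.HtmlEncode(author));
        }
    }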
    

    Best Regards,

    Sam

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, April 26, 2019 10:06 AM
  • User-821857111 posted

    You can use Regex, but there are tools designed specifically for parsing HTML. One is HtmlAgilityPack, and another is AngleSharp. Both are available free on NuGet. Here's how you would use AngleSharp:

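    // Requires the AngleSharp NuGet package and "using AngleSharp;".
    // The await means this must run inside an async method.
    var config = Configuration.Default.WithDefaultLoader();
    var address = "https://pickleballconsumer.com/?p=100";
    var document = await BrowsingContext.New(config).OpenAsync(address);

    // First <a> whose title attribute starts with "View all posts by".
    var link = document.QuerySelector("a[title^=\"View all posts by\"]");
    // link is null when no such element exists; in that case nothing is written.
    Response.Write(link?.TextContent);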

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, April 26, 2019 10:55 AM
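
The regex approach from the first answer works on a single anchor tag. The page source in the question contains several title attributes (such as "5:17 pm") and more than one author link, so a minimal sketch of collecting every author name might look like the following. The class name `AuthorExtractor` and the method `ExtractAuthors` are hypothetical names chosen for illustration, and the HTML string is a shortened stand-in for the real page source, not the full markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AuthorExtractor
{
    // Hypothetical helper (not from the thread): returns the text that follows
    // "View all posts by " in every matching title attribute of the given HTML.
    public static List<string> ExtractAuthors(string html)
    {
        var authors = new List<string>();
        string pattern = "title=\"View all posts by ([^\"]+)\"";

        // Regex.Matches finds every occurrence, not just the first one.
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            authors.Add(m.Groups[1].Value);
        }
        return authors;
    }

    public static void Main()
    {
        // Shortened sample markup (an assumption) based on the snippets in the question.
        string html =
            "<a href=\"https://pickleballconsumer.com/?p=100\" title=\"5:17 pm\" rel=\"bookmark\"></a>" +
            "<a class=\"url fn n\" href=\"https://pickleballconsumer.com/?author=1\" title=\"View all posts by Sampgoogly\" rel=\"author\"></a>" +
            "<a class=\"url fn n\" href=\"https://pickleballconsumer.com/?author=2\" title=\"View all posts by admin\" rel=\"author\"></a>";

        foreach (string author in ExtractAuthors(html))
        {
            Console.WriteLine(author);
        }
    }
}
```

In a web form, each returned name could be written with Response.Write(Server.HtmlEncode(name)) or assigned to a Label control instead of being printed to the console. For markup that varies more than this sample does, an HTML parser such as the ones named in the second answer is likely to be more robust than a regex.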
