How to do web scraping with ASP.NET Core?

  • Question

  • User-1370514677 posted

    Hi everyone, I'd like to know whether ASP.NET Core has a built-in way to do web scraping (fetching basic data from a website and using that data on my ASP.NET Core website). From what I understand, web scraping basically works like this:

    1. Fetch the HTML page

    2. Analyse the HTML

    3. Extract the data that matches the class names, div elements, or whatever else you've specified

    Best Regards

    Wednesday, March 17, 2021 9:04 PM

All replies

  • User475983607 posted

    I'd like to know if there is a built-in way to do webscraping (like fetching basic data from a website and use that data on my ASP.NET Core website) with ASP.NET Core ?

    This question is too vague to answer precisely. .NET 5 (Core) has many APIs that can accomplish this task: HttpClient for fetching HTML over HTTP, and the XML libraries for querying the data (as long as the markup is well-formed XML). You can also search NuGet for third-party libraries.
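    A minimal sketch of the three steps from the question using only built-in .NET APIs: HttpClient to fetch the page and LINQ to XML (XDocument) to query it. The class and method names (XhtmlScraper, FetchAsync, ExtractByClass) are hypothetical, and the sketch assumes the target page is well-formed XHTML; XDocument.Parse throws an XmlException on most real-world HTML, which is where a tolerant parser such as the HTML Agility Pack comes in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class XhtmlScraper
{
    // A single shared HttpClient; creating one per request can exhaust sockets.
    private static readonly HttpClient Http = new HttpClient();

    // Step 1: fetch the raw markup of a page as a string.
    public static Task<string> FetchAsync(string url) => Http.GetStringAsync(url);

    // Steps 2 and 3: parse the markup and return the trimmed text of every
    // element whose class attribute contains the given class name.
    // Assumption: the markup is well-formed XML (XHTML); otherwise
    // XDocument.Parse throws System.Xml.XmlException.
    public static List<string> ExtractByClass(string xhtml, string className)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
            .Where(e => ((string)e.Attribute("class") ?? string.Empty)
                .Split(' ', StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .Select(e => e.Value.Trim())
            .ToList();
    }

    public static async Task Main(string[] args)
    {
        // Offline sample so the parsing step can be tried without a network call.
        const string sample =
            "<html><body><div class=\"price big\">42 EUR</div><div class=\"note\">n/a</div></body></html>";
        foreach (string text in ExtractByClass(sample, "price"))
            Console.WriteLine(text); // prints "42 EUR"

        // Live use: pass a URL and a class name on the command line.
        if (args.Length == 2)
        {
            string page = await FetchAsync(args[0]);
            foreach (string text in ExtractByClass(page, args[1]))
                Console.WriteLine(text);
        }
    }
}
```

    Splitting the class attribute on spaces means an element with class="price big" matches "price" but an element with class="pricetag" does not, which mirrors how CSS class selectors behave.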

    The HTML Agility Pack is a common web scraping library that many forum members recommend.

    https://html-agility-pack.net/
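    A minimal sketch of the extraction step with the HTML Agility Pack, assuming the HtmlAgilityPack NuGet package is installed. Unlike XDocument, HtmlDocument.LoadHtml accepts loosely written HTML (here an unquoted attribute and a void br tag), and nodes are selected with XPath. AgilityPackScraper and SelectText are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class AgilityPackScraper
{
    // Returns the decoded, trimmed text of every node matched by an XPath expression.
    public static List<string> SelectText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: no exception on sloppy markup

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
            return new List<string>();

        // DeEntitize turns entities such as &amp; into plain characters.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()).ToList();
    }

    public static void Main()
    {
        // Not valid XML: unquoted attribute value and an unclosed <br>.
        const string html = "<ul class=items><li>Fish &amp; chips</li><li>Tea<br></li></ul>";
        foreach (string text in SelectText(html, "//ul[@class='items']/li"))
            Console.WriteLine(text);
    }
}
```

    For a live page, new HtmlWeb().Load(url) returns an HtmlDocument directly, so it can replace the LoadHtml step.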

    Wednesday, March 17, 2021 9:30 PM
  • User1686398519 posted

    Hi valenciano8, 

    I found a tutorial on automated web scraping and data extraction using HTTP requests and web browsers that you can refer to.

    The tutorial covers two ways to fetch and crawl data, along with the pros and cons of each:

    1. basic HTTP requests
    2. web browsers

    The tutorial also includes examples; you can follow the link above to view them.

    Best Regards,

    YihuiSun

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, March 18, 2021 7:20 AM
  • User-821857111 posted

    I use AngleSharp, which enables you to query the HTML using standard CSS selectors. For example, to get the h1 content for this page, you would do this:

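    using AngleSharp;

    // Fully qualified to avoid a clash with Microsoft.Extensions.Configuration in ASP.NET Core.
    var config = AngleSharp.Configuration.Default.WithDefaultLoader();
    var address = "https://forums.asp.net/t/2175143.aspx?How+to+do+WebScraping+with+ASP+NET+Core+";
    var document = await BrowsingContext.New(config).OpenAsync(address);
    // QuerySelector returns null when no element matches the selector.
    var heading = document.QuerySelector("h1#threadstatus");
    Console.WriteLine(heading?.TextContent ?? "Heading not found");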

    https://github.com/AngleSharp/AngleSharp

    Thursday, March 18, 2021 7:24 AM