Scrape a web page, get the results and create a table

  • Question

  • User-909867351 posted

    Hi,

    I want to get some values from a tracking website, and I use the following code:

    String url = "http://www.cttexpresso.pt/feapl_2/app/open/cttexpresso/objectSearch/objectSearch.jspx?objects=RH306695573PT";
    var doc = new HtmlAgilityPack.HtmlDocument();
    // Treat <br> as an empty element so it is not parsed as an unclosed tag.
    HtmlAgilityPack.HtmlNode.ElementsFlags["br"] = HtmlAgilityPack.HtmlElementFlag.Empty;
    doc.OptionWriteEmptyNodes = true;
    // Download the tracking page and parse it.
    var webRequest = HttpWebRequest.Create(url);
    Stream stream = webRequest.GetResponse().GetResponseStream();
    doc.Load(stream);
    stream.Close();
    // Select the results table by its CSS class and write its inner HTML to the response.
    string testDivSelector = "//table[@class='full-width']";
    var divString = doc.DocumentNode.SelectSingleNode(testDivSelector).InnerHtml.ToString();
    Response.Write(divString);

    I get the correct result: https://registos.programamos.pt/lectt.aspx

    My problem:

    I need to create a table from these results, like this:

    Hora Estado Motivo Local
    segunda-feira, 7 Janeiro 2019
    15:57 Entregue ANTERO DE QUENTAL (P.DELGADA)
    08:51 Disponível para levantamento ANTERO DE QUENTAL (P.DELGADA)
    sexta-feira, 4 Janeiro 2019
    17:47 Expedição Nacional 9500 PONTA DELGADA

    What's the best option for that?

    Thank you

    Wednesday, January 9, 2019 3:55 PM

All replies

  • User-943250815 posted

    If the nested table is what you need, use Descendants. The sample below also includes the Recetor column.
    If Recetor should not be part of the result, you can remove the last column and adjust the colspans, or keep using Descendants to get each TR and TD, read the cell values, and build the table you want.

    If you are working with Web Forms, add a Literal control to the page.

    Dim url As String = "http://www.cttexpresso.pt/feapl_2/app/open/cttexpresso/objectSearch/objectSearch.jspx?objects=RH306695573PT"
    Dim zHTML As New HtmlAgilityPack.HtmlDocument()
    Dim Selector As String = "//table[@class='full-width']"
    Dim webRequest = System.Net.HttpWebRequest.Create(url)
    Dim stream As System.IO.Stream = webRequest.GetResponse().GetResponseStream()
    zHTML.Load(stream)
    stream.Close()

    Dim OuterTbl As HtmlNode = zHTML.DocumentNode.SelectSingleNode(Selector)
    Dim InnerTbl As HtmlNode = OuterTbl.Descendants("table")(0)
    Literal1.Text = InnerTbl.OuterHtml

    Wednesday, January 9, 2019 6:57 PM
  • User-909867351 posted

    Hi 

    When I convert it to C#, I get an error:

    string url = "http://www.cttexpresso.pt/feapl_2/app/open/cttexpresso/objectSearch/objectSearch.jspx?objects=RH306695573PT";
    HtmlAgilityPack.HtmlDocument zHTML = new HtmlAgilityPack.HtmlDocument();
    string Selector = "//table[@class='full-width']";
    var webRequest = System.Net.HttpWebRequest.Create(url);
    System.IO.Stream stream = webRequest.GetResponse().GetResponseStream();
    zHTML.Load(stream);
    stream.Close();

    HtmlNode OuterTbl = zHTML.DocumentNode.SelectSingleNode(Selector);
    HtmlNode InnerTbl = OuterTbl.Descendants("table")(0);
    Literal1.Text = InnerTbl.OuterHtml;

    The error is on this line:

    HtmlNode InnerTbl = OuterTbl.Descendants("table")(0);

    Method name expected

    Any help?

    Thursday, January 10, 2019 9:33 AM
  • User-943250815 posted

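    Oops, my bad: I had imported HtmlAgilityPack (Imports in VB, using in C#), so my sample used HtmlNode without a namespace.
    Just add the HtmlAgilityPack prefix to HtmlNode: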

    Dim url As String = "http://www.cttexpresso.pt/feapl_2/app/open/cttexpresso/objectSearch/objectSearch.jspx?objects=RH306695573PT"
    Dim zHTML As New HtmlAgilityPack.HtmlDocument()
    Dim Selector As String = "//table[@class='full-width']"
    Dim webRequest = System.Net.HttpWebRequest.Create(url)
    Dim stream As System.IO.Stream = webRequest.GetResponse().GetResponseStream()
    zHTML.Load(stream)
    stream.Close()

    Dim OuterTbl As HtmlAgilityPack.HtmlNode = zHTML.DocumentNode.SelectSingleNode(Selector)
    Dim InnerTbl As HtmlAgilityPack.HtmlNode = OuterTbl.Descendants("table")(0)
    Literal1.Text = InnerTbl.OuterHtml

    Thursday, January 10, 2019 11:51 AM
  • User-909867351 posted

    Hi

    I already have the using directive for HtmlAgilityPack:

    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;
    
    public partial class lectt : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            String url = "http://www.cttexpresso.pt/feapl_2/app/open/cttexpresso/objectSearch/objectSearch.jspx?objects=RH306695573PT";        
            HtmlAgilityPack.HtmlDocument zHTML = new HtmlAgilityPack.HtmlDocument();
            string Selector = "//table[@class='full-width']";
            var webRequest = System.Net.HttpWebRequest.Create(url);
            System.IO.Stream stream = webRequest.GetResponse().GetResponseStream();
            zHTML.Load(stream);
            stream.Close();
    
            HtmlNode OuterTbl = zHTML.DocumentNode.SelectSingleNode(Selector);
            HtmlNode InnerTbl = OuterTbl.Descendants("table")(0);
            Literal1.Text = InnerTbl.OuterHtml;
        }
    }

    Thursday, January 10, 2019 12:00 PM
  • User-943250815 posted

    mariolopes,

    Descendants returns an IEnumerable, and C# cannot index it with (0) the way the VB sample does; you need to take the first item explicitly.
    So replace

    HtmlNode InnerTbl = OuterTbl.Descendants("table")(0);

    by

    HtmlNode InnerTbl = OuterTbl.Descendants("table").ElementAt(0);
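
    A minimal C# sketch of the same point, using only System.Linq. The Tables() method is a hypothetical stand-in for Descendants("table"), so the sample runs without HtmlAgilityPack; it is an illustration under that assumption, not the code from the page above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstItemDemo
{
    // Hypothetical stand-in for a lazily produced sequence such as Descendants("table").
    public static IEnumerable<string> Tables()
    {
        yield return "outer";
        yield return "inner";
    }

    public static void Main()
    {
        IEnumerable<string> tables = Tables();

        // tables(0) and tables[0] do not compile in C#: IEnumerable<T> has no indexer.
        string byPosition = tables.ElementAt(0);          // throws if the sequence is empty
        string first = tables.First();                    // same item, also throws if empty
        string orNull = tables.Skip(5).FirstOrDefault();  // null when there is no such item

        Console.WriteLine(byPosition);
        Console.WriteLine(first);
        Console.WriteLine(orNull == null);
    }
}
```

    FirstOrDefault is probably the safer choice when the page layout may change, because it lets you check for null instead of catching an exception.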

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Thursday, January 10, 2019 12:34 PM
  • User-909867351 posted

    Thank you.

    That solved my question. I'll keep working on this project because I want the resulting table to be responsive (with Bootstrap), and I think I have to create a second (Bootstrap) table from the data in this one. I think that will be the best option.

    I have to read each row of this table and create another one.

    My problem: there are two tables with the same class, full-width, and I need only the last one.

    Any ideas?

    Thank you again

    Thursday, January 10, 2019 12:57 PM
  • User-943250815 posted

    As I said, you can keep using Descendants to get the rows and, from each row, the cells.
    The hardest part is understanding how the HTML is structured and dealing with it.
    The same applies to getting the last table in a single shot: you can query as you already do, or build the full XPath to reach it. This is only necessary because the page's author did not use IDs.
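
    One possible way to get the last table in a single shot, sketched in C#. The SampleHtml markup is hypothetical (two tables sharing the full-width class, as described above), and the real page may be nested differently, so treat the XPath as a starting point rather than a tested selector for that site.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LastTableDemo
{
    // Hypothetical markup: two tables with the same class and no IDs.
    public const string SampleHtml =
        "<html><body>" +
        "<table class='full-width'><tr><td>first</td></tr></table>" +
        "<table class='full-width'><tr><td>last</td></tr></table>" +
        "</body></html>";

    public static HtmlNode LastFullWidthTable(HtmlDocument doc)
    {
        // The parentheses make last() apply to the whole match set, in document order.
        return doc.DocumentNode.SelectSingleNode("(//table[@class='full-width'])[last()]");
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode viaXPath = LastFullWidthTable(doc);

        // LINQ alternative: filter the descendants and take the last match.
        HtmlNode viaLinq = doc.DocumentNode.Descendants("table")
            .LastOrDefault(t => t.GetAttributeValue("class", "") == "full-width");

        Console.WriteLine(viaXPath.InnerText);
        Console.WriteLine(viaLinq.InnerText);
    }
}
```

    Both calls return null when nothing matches, so check the result before reading InnerText on a live page.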

    To get Rows and Cells:

    For Each Row In InnerTbl.Descendants("tr")
        For Each Cell In Row.Descendants("td")
            ' Text content of the current cell
            Dim CellValue As String = Cell.InnerText
        Next
    Next
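
    The same loop can be sketched in C# to build the Bootstrap table mentioned earlier in the thread. ToBootstrapTable and the SampleHtml markup are hypothetical names introduced for illustration (the sample rows reuse values quoted in the question), and the Bootstrap classes are only one reasonable choice.

```csharp
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

public static class BootstrapTableDemo
{
    // Hypothetical markup shaped like the tracking rows quoted in the question.
    public const string SampleHtml =
        "<table>" +
        "<tr><td colspan='4'>segunda-feira, 7 Janeiro 2019</td></tr>" +
        "<tr><td>15:57</td><td>Entregue</td><td></td><td>ANTERO DE QUENTAL (P.DELGADA)</td></tr>" +
        "</table>";

    public static string ToBootstrapTable(HtmlNode sourceTable)
    {
        var sb = new StringBuilder();
        sb.Append("<div class=\"table-responsive\"><table class=\"table\">");
        foreach (HtmlNode row in sourceTable.Descendants("tr"))
        {
            sb.Append("<tr>");
            // Elements("td") reads only the row's own cells, not cells of nested tables.
            foreach (HtmlNode cell in row.Elements("td"))
            {
                string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();
                string colspan = cell.GetAttributeValue("colspan", "");
                // Keep colspan so date header rows still span the full width.
                sb.Append(colspan.Length == 0
                    ? "<td>"
                    : "<td colspan=\"" + WebUtility.HtmlEncode(colspan) + "\">");
                sb.Append(WebUtility.HtmlEncode(text)).Append("</td>");
            }
            sb.Append("</tr>");
        }
        sb.Append("</table></div>");
        return sb.ToString();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        HtmlNode source = doc.DocumentNode.SelectSingleNode("//table");
        Console.WriteLine(ToBootstrapTable(source));
    }
}
```

    In the Web Forms page, the returned string could be assigned to Literal1.Text in place of InnerTbl.OuterHtml.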

    Thursday, January 10, 2019 1:29 PM