Return Data From HTML Table But Only Certain Fields

  • Question

  • User-1056855865 posted

    Hypothetical data below: let's say my HTML table looks like the following. How could I use C# to return only the first two columns of data for each row in the table?

    <h1><span id="UserInfo"></span><span class="mw-headline" id="User Info">Information</span></h1>
    <table>
    <tbody><tr>
    <th>User ID</th>
    <th>User Name</th>
    <th>Phone</th>
    <th>State</th>
    <th>Zip</th></tr>
    <tr>
    <td>abcd</td>
    <td>alpha beta charlie delta</td>
    <td>5555555555</td>
    <td>NY</td>
    <td>00000</td>
    <tr>
    <td>abc</td>
    <td>alpha beta charlie</td>
    <td>1111111111</td>
    <td>NY</td>
    <td>00000</td>
    <tr>
    <td>ab</td>
    <td>alpha beta</td>
    <td>2222222222</td>
    <td>NY</td>
    <td>00000</td>>
    </tbody>
    </table>

    Monday, December 10, 2018 2:43 PM

All replies

  • User475983607 posted

    The question is too vague to answer. You're showing HTML, which is client-side code, but asking about C#, which runs on the web server.

    If this is a query question, can you tell us which data access technology you are using? Are you using Entity Framework or ADO.NET? Where does the data come from?

    If you are asking how to render dynamic HTML from the server, then we need to know what kind of application you are building: Web Forms, MVC, or Razor Pages?

    It is also possible to affect the HTML using JavaScript.

    Monday, December 10, 2018 2:52 PM
  • User-1056855865 posted

    This is how the data appears on a web page when I view the page source.

    I want to use C# to "query" the page and return only the first two columns from the table.

    Monday, December 10, 2018 2:57 PM
  • User475983607 posted

    You have not answered any of the clarifying questions, so I'm not sure how to provide assistance. Is there any way you can show the current C# code?

    Monday, December 10, 2018 3:05 PM
  • User-821857111 posted

    If you want to use C# to parse HTML, you should look at the HtmlAgilityPack or AngleSharp libraries.

    Monday, December 10, 2018 4:14 PM
  • User303363814 posted

    Precisely, what is it that you cannot do? Do you know how to get the text of the page into a C# string? Do you know how to find elements within the HTML string? Is the sample the result of a GET or a POST? Do you know how to make the request? Will the real HTML have the sorts of syntax errors that your sample shows? Be prepared for a lot of difficulty if that is the case. What data structure do you want the answer in?
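As a rough illustration of finding elements in an HTML string whose markup is broken the way the sample is (missing `</tr>` tags, a stray `>`), here is a minimal sketch that uses only `System.Text.RegularExpressions` from the .NET base class library instead of a real HTML parser. It assumes the page text is already in a string (for example, downloaded with `HttpClient.GetStringAsync`); the class and method names are hypothetical. Regex matching of HTML is fragile, so treat this as a stopgap rather than a substitute for a parser such as HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not a library API. Regex is NOT an HTML parser: this only
// copes with simple markup like the sample (missing </tr>, a stray '>').
public static class RegexTableReader
{
    // One <th> or <td> cell: group 1 is the tag name, group 2 the cell content.
    // \b keeps <thead> from being mistaken for <th>.
    private static readonly Regex CellPattern = new Regex(
        @"<(t[hd])\b[^>]*>(.*?)</\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string[]> FirstTwoColumns(string html)
    {
        var rows = new List<string[]>();

        // Split on each opening <tr> rather than matching <tr>...</tr>,
        // so rows whose closing tag is missing are still separated.
        string[] chunks = Regex.Split(html, @"<tr\b[^>]*>", RegexOptions.IgnoreCase);

        // chunks[0] is the text before the first <tr>, so start at 1.
        for (int i = 1; i < chunks.Length; i++)
        {
            MatchCollection cells = CellPattern.Matches(chunks[i]);
            if (cells.Count < 2)
            {
                continue; // not enough cells for a two-column row
            }

            // Keep only the first two cells, trimmed and HTML-decoded.
            rows.Add(new[]
            {
                WebUtility.HtmlDecode(cells[0].Groups[2].Value.Trim()),
                WebUtility.HtmlDecode(cells[1].Groups[2].Value.Trim()),
            });
        }

        return rows;
    }
}
```

Run against the sample table as posted, this should return the header row ("User ID", "User Name") followed by one ID/name pair per data row.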

    Show whatever code you have so far and tell us what result you want.

    I generally use LINQ to XML (e.g. the Beth Massi article) but, I believe, the Agility Pack is far superior at handling malformed HTML like your sample.
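For comparison, here is a minimal LINQ to XML sketch (`System.Xml.Linq`, part of .NET) that returns the first two cells of each row as tuples. It assumes the markup has already been made well-formed: a single root element (here the `<table>`), every `<tr>` closed, and no stray `>`. `XDocument.Parse` throws an `XmlException` on the sample as posted (it has two root elements and unclosed rows), which is why a tolerant parser is better suited to real-world pages. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads the first two cells of each row with LINQ to XML.
// XDocument.Parse throws an XmlException on malformed markup, so this assumes
// well-formed XHTML with a single root (every <tr> closed, no stray characters).
public static class XmlTableReader
{
    public static List<(string First, string Second)> FirstTwoColumns(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants("tr")
            // Take the <th>/<td> children of each row, in document order.
            .Select(tr => tr.Elements()
                            .Where(e => e.Name == "th" || e.Name == "td")
                            .Select(e => e.Value.Trim())
                            .ToList())
            // Skip rows that have fewer than two cells.
            .Where(cells => cells.Count >= 2)
            .Select(cells => (cells[0], cells[1]))
            .ToList();
    }
}
```

Each tuple can then be consumed directly, e.g. `foreach (var (id, name) in XmlTableReader.FirstTwoColumns(xhtml))`.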

    Monday, December 10, 2018 10:17 PM
  • User-893317190 posted

    Hi ManderinViolin12,

    If you want to show only the first two columns of the table using C#, you could use the XML API (XmlDocument).

    Below is my code.

    // requires: using System.Xml; at the top of the code-behind file
    protected void Page_Load(object sender, EventArgs e)
    {
        // clear any other content from the response
        Response.Clear();

        XmlDocument file = new XmlDocument();
        // load the HTML file; use your own path here
        // (XmlDocument only accepts well-formed markup, so the file must be valid XHTML)
        file.Load(Server.MapPath("/FileDemo/table.html"));

        // get the tbody element
        XmlNode node = file.GetElementsByTagName("tbody")[0];

        // loop through all the tr elements
        foreach (XmlNode tr in node.ChildNodes)
        {
            // remove every column after the first two; each RemoveChild call
            // shortens ChildNodes by one, so the index to remove is always 2
            while (tr.ChildNodes.Count > 2)
            {
                tr.RemoveChild(tr.ChildNodes[2]);
            }
        }

        Response.Write(file.InnerXml);
        // stop any other content from being written
        Response.End();
    }

    Best regards,

    Ackerly Xu

    Tuesday, December 11, 2018 4:37 AM