Getting the values of variable-length fields from an HTML file using XPath and placing them in text boxes (VB.NET)

    Question

  • Hello my dear friends at microsoft.com. I am a novice VB.NET programmer. I'm working on a project that takes a set of data from a series of HTML files, all with the same format but with different values, and puts the data into text boxes. I already tried doing this with string searching and file operations, but because the values differ in length from file to file, I ran into trouble. After some searching on the internet and on Stack Overflow I came across XPath, but I do not know how to use it to get fields of varying length N out of an HTML file. My HTML file is in this format: http://www.nippyzip.com/uploads/130828030127-38363.zip

    I need the field values that I marked with the words "this number" or "this Field" in the HTML file inside the zip. I want to get the values of those fields by their XPath address (I get the address with Firebug or Inspect Element in Chrome) and put them into text boxes in VB.NET. Thanks a lot.

    For example, see this code:
    Imports System.Data
    Imports System.Net
    Imports System.Collections
    Imports System.Collections.Specialized
    Imports System.Text
    Imports System.Text.RegularExpressions
    Imports HtmlAgilityPack
    Public Class Form1
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Dim doc = New HtmlAgilityPack.HtmlDocument
            Dim strAnswer As String = ""

            doc.LoadHtml("C:\Users\T3AS0FT\Desktop\my_html_file\25696.htm")
            'this number (1)
            Dim a As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("/html/body/center/table/tbody/tr[4]/td/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td[8]/strong")
            RichTextBox1.Text = a.InnerText.ToString 'but this does not work

            'this field (1)
            Dim b As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("/html/body/center/table/tbody/tr[4]/td/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td[5]/p[2]")
            RichTextBox2.Text = b.InnerText.ToString 'but this does not work

            'this number (2)
            Dim c As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("/html/body/center/table/tbody/tr[4]/td/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td[7]/p")
            RichTextBox3.Text = c.InnerText.ToString 'but this does not work
            'and so on for the other fields
            'I took the XPath addresses from Firebug
            'this code does not work, please help me
        End Sub
    End Class



    • Edited by majidgh Thursday, August 29, 2013 1:56 AM
    Thursday, August 29, 2013 1:55 AM

All replies

  • Try getting the inner text of the body

    a.Body.InnerText.ToString


    jdweng

    Thursday, August 29, 2013 4:57 AM
  • It does not work! "a" does not have a Body property.
    Thursday, August 29, 2013 7:56 AM
  • Do you have a valid HTML document? Does it have no body?


    jdweng

    Thursday, August 29, 2013 3:05 PM
  • Please see this:

    http://uploadtak.com/images/r8938_Capture.jpg

    VB.NET does not find a Body property!

    Thursday, August 29, 2013 8:11 PM
  • You are using a third-party DLL that I don't know anything about. Maybe you should use a Microsoft library instead, like in the code below:

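    'Add a reference from the COM tab: Microsoft HTML Object Library
    Imports System.IO
    Imports mshtml
    Module Module1
        Sub Main()
            Dim doc As New HTMLDocument
            'Write the file's markup into the document; a new document has no body yet
            Dim writer As IHTMLDocument2 = DirectCast(doc, IHTMLDocument2)
            writer.write(File.ReadAllText("C:\Users\T3AS0FT\Desktop\my_html_file\25696.htm"))
            writer.close()
            'All of the visible text in the page
            Dim allText As String = doc.body.innerText
        End Sub
    End Module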


    jdweng

    Thursday, August 29, 2013 9:50 PM
  • Your code returns the text of every tag in my HTML file, but what I need is:

    "Getting information on specific fields of length N in a html file"

     
    Thursday, August 29, 2013 10:30 PM
  • Great.  You now have a few choices

    1) You can enumerate through doc.all and read each element's innerText

    2) You can search the doc for tags using

           doc.getElementsByTagName()

    3) You can search the doc for an ID using

           doc.getElementById()

    4) You can parse the tags using Regex
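
    Choices 2 and 3 might look roughly like the sketch below. It assumes the COM reference to the Microsoft HTML Object Library from the earlier post. The file path and the id "field1" are hypothetical placeholders, not values taken from the uploaded file.

    ```vbnet
    'Requires a COM reference to the Microsoft HTML Object Library (mshtml)
    Imports System.IO
    Imports mshtml

    Module TagAndIdSketch
        Sub Main()
            'Hypothetical path; point this at the real file
            Dim html As String = File.ReadAllText("C:\temp\sample.htm")

            'Write the markup into an in-memory document
            Dim doc As New HTMLDocument
            Dim writer As IHTMLDocument2 = DirectCast(doc, IHTMLDocument2)
            writer.write(html)
            writer.close()

            Dim finder As IHTMLDocument3 = DirectCast(doc, IHTMLDocument3)

            'Choice 2: visit every <td> element and print its text
            For Each cell As IHTMLElement In finder.getElementsByTagName("td")
                Console.WriteLine(cell.innerText)
            Next

            'Choice 3: look up one element by id ("field1" is a made-up id)
            Dim field As IHTMLElement = finder.getElementById("field1")
            If field IsNot Nothing Then
                Console.WriteLine(field.innerText)
            End If
        End Sub
    End Module
    ```

    getElementById returns Nothing when no element carries that id, so the check before reading innerText avoids a NullReferenceException.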


    jdweng

    Thursday, August 29, 2013 10:50 PM
  • Thanks a lot, but can you show me one of these choices applied to my HTML file http://www.nippyzip.com/uploads/130828030127-38363.zip? I need the values of the table cells that I marked with the word "this". Please show me how with getElementsByTagName(). Thanks a lot, my teacher.

    • Edited by majidgh Thursday, August 29, 2013 11:17 PM
    Thursday, August 29, 2013 11:14 PM
  • Scraping a webpage like this is very difficult. The data isn't in any pattern. Some items are in tables and others aren't. If the data were all in tables, it would be easy to get all the rows and columns of a table, or to get specific rows and/or columns from a table. Searching the webpage for elements that have "id=" is also very easy. I would recommend that you add "id=" to the elements that you need to scrape. Then use the method "doc.getElementById()".
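
    For the parts of the page that are in tables, walking the rows and cells could be sketched as follows. This uses HtmlAgilityPack, the library from the question, rather than mshtml, and assumes the file path given in the question. Two things in the question's code are worth checking: LoadHtml expects the markup itself as a string, while Load reads a file from disk, and paths copied from browser tools often contain tbody steps that the browser added and that may not be present in the file itself.

    ```vbnet
    Imports HtmlAgilityPack

    Module TableCellSketch
        Sub Main()
            Dim doc As New HtmlDocument()
            'Load reads the file at this path; LoadHtml would treat the path as markup
            doc.Load("C:\Users\T3AS0FT\Desktop\my_html_file\25696.htm")

            'Select every table row; no tbody step, in case the file has none
            Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table//tr")
            If rows Is Nothing Then Return 'SelectNodes returns Nothing when nothing matches

            For Each row As HtmlNode In rows
                'Direct <td> children of this row
                Dim cells As HtmlNodeCollection = row.SelectNodes("./td")
                If cells Is Nothing Then Continue For

                For Each cell As HtmlNode In cells
                    'Decode entities such as &nbsp; and trim surrounding whitespace
                    Console.WriteLine(HtmlEntity.DeEntitize(cell.InnerText).Trim())
                Next
            Next
        End Sub
    End Module
    ```

    Because SelectSingleNode and SelectNodes return Nothing when the path matches nothing, checking the result before reading InnerText shows whether an XPath address is the problem.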

    jdweng

    Friday, August 30, 2013 9:32 AM