Retrieving data from html table

  • Question

  • Hi all,

    I am trying to write a program that retrieves information from a table on a web page. How do I go about retrieving the tables and sorting their contents by column? Any suggestions or hints to get started?

    Thank you in advance

    What I have so far is basically:

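    // Requires: using System; using System.Diagnostics; using System.Net; using System.Text;
    // and a project reference to Microsoft.mshtml.
    public SSContentsData fromHTML(string strFilePath)
    {
        SSContentsData data = null;

        try {
            data = new SSContentsData();
        } catch (Exception ex) {
            Debug.Assert(false, ex.Message);
            return null;
        }

        Debug.WriteLine("ConvertFile  start = " + System.DateTime.Now.ToString("h:mm:ss.fff"));

        // Download the raw bytes of the page.
        WebClient wc = new WebClient();
        byte[] bytesHtml = wc.DownloadData(strFilePath);

        // Detect the encoding, then decode the bytes into a string.
        CharCode charcode = new CharCode();
        Encoding ecode = charcode.GetCharCode(bytesHtml);
        string strHtml = ecode.GetString(bytesHtml);

        // Load the HTML into an MSHTML document.
        mshtml.HTMLDocument hd = new mshtml.HTMLDocument();
        mshtml.IHTMLDocument2 ihd2 = (mshtml.IHTMLDocument2)hd;
        ihd2.write(new object[] { strHtml });

        // The tables still need to be read from ihd2 and copied into data.
        return data;
    }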

    Wednesday, March 2, 2011 5:21 PM

All replies

  • Maybe you can use a regular expression to retrieve the required table content, and then use the code in the following article to remove the HTML tags from it:

    http://www.codeproject.com/KB/HTML/HTML_to_Plain_Text.aspx


    Gaurav Khanna
    • Proposed as answer by Cookie Luo Friday, March 4, 2011 5:12 AM
    • Marked as answer by Cookie Luo Friday, March 11, 2011 1:48 AM
    Wednesday, March 2, 2011 6:44 PM
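
    As a rough illustration of the tag-removal step suggested above, here is a minimal sketch. It is not the code from the linked article; the class name HtmlText and the method StripTags are hypothetical names chosen for this example, and it assumes .NET 4 or later for System.Net.WebUtility. It only handles simple fragments such as the contents of a table cell.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the linked article.
public static class HtmlText
{
    // Removes tags from an HTML fragment and decodes entities such as &amp;.
    public static string StripTags(string html)
    {
        // Replace anything that looks like a tag ("<", non-">" characters, ">") with a space.
        string noTags = Regex.Replace(html, "<[^>]*>", " ");

        // Turn entities (&amp;, &lt;, ...) back into plain characters.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse runs of whitespace into a single space and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

    For example, HtmlText.StripTags("<td><b>Tom &amp; Jerry</b></td>") gives "Tom & Jerry". A regex like this can be fooled by a ">" inside an attribute value or by script blocks, so it is probably best kept to small, well-formed fragments.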
  • To get you started, this is the Regex to get the tables:

    <table(.|\n)*?</table>

     

    Noam B.



    • Proposed as answer by Noam B Thursday, March 3, 2011 11:13 AM
    Thursday, March 3, 2011 11:13 AM
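
    To tie the replies back to the original question (getting the tables and sorting by column), here is a minimal sketch of how the regex approach could be extended to rows and cells. TableScraper, ExtractTables and SortByColumn are hypothetical names for this example. It uses RegexOptions.Singleline so that "." also matches newlines, which should behave like the (.|\n) form above, and it assumes the first row of a table is a header row. It does not handle nested tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TableScraper
{
    // Singleline lets "." match newlines; IgnoreCase accepts <TABLE>, <Tr>, ...
    const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns every table as a list of rows; each row is a list of cell texts.
    public static List<List<List<string>>> ExtractTables(string html)
    {
        var tables = new List<List<List<string>>>();
        foreach (Match table in Regex.Matches(html, @"<table\b.*?</table>", Options))
        {
            var rows = new List<List<string>>();
            foreach (Match row in Regex.Matches(table.Value, @"<tr\b.*?</tr>", Options))
            {
                var cells = new List<string>();
                // Matches both <td> and <th> cells; group 1 is the cell content.
                foreach (Match cell in Regex.Matches(row.Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Options))
                {
                    // Drop any tags inside the cell, then decode entities.
                    string text = Regex.Replace(cell.Groups[1].Value, "<[^>]*>", " ");
                    cells.Add(WebUtility.HtmlDecode(text).Trim());
                }
                rows.Add(cells);
            }
            tables.Add(rows);
        }
        return tables;
    }

    // Sorts the rows by one column as text, leaving the first (header) row in place.
    // Rows that are too short for the requested column sort as an empty string.
    public static List<List<string>> SortByColumn(List<List<string>> rows, int column)
    {
        return rows.Take(1)
            .Concat(rows.Skip(1).OrderBy(r => column < r.Count ? r[column] : "", StringComparer.Ordinal))
            .ToList();
    }
}
```

    With the strHtml string from the question, usage would look something like: var tables = TableScraper.ExtractTables(strHtml); var sorted = TableScraper.SortByColumn(tables[0], 1); The sort compares cell text, so numeric columns would need to be parsed first. For pages with nested or malformed tables, a real HTML parser is likely a safer choice than regular expressions.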