C# reading web page info

Answers

  • Hi,

    Suppose the HTML below is the markup you need to read information from:

     

    <div style='padding-left:12px;' id='myWeb123'> 
    <b>MyWebSite Pics</b> 
    <br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_01.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_02.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_03.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_04.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_05.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_06.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_07.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_08.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_09.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <img src="http://myWebSite.com/pics/HHTR_10.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br /> 
    <a href="http://www.myWebSite.com/" target="_blank" rel="nofollow">Source</a> 
    </div> 
    
    

    Using HtmlAgilityPack, we can read it with the code below:

     

     

    // Requires: using System.Diagnostics; using HtmlAgilityPack;
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load("FileName.html");

    // Targets a specific node by its id (null if no such element exists)
    HtmlNode someNode = document.GetElementbyId("myWeb123");

    // To select every link in the page instead:
    // HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//a[@href]");

    // Select every <img> inside the div; the div itself has no src attribute.
    // SelectNodes returns null when nothing matches, hence the null check.
    HtmlNodeCollection imageNodes = document.DocumentNode.SelectNodes("//div[@id='myWeb123']//img[@src]");

    if (imageNodes != null)
    {
        int count = 0;
        foreach (HtmlNode imageNode in imageNodes)
        {
            string src = imageNode.GetAttributeValue("src", string.Empty);
            string alt = imageNode.GetAttributeValue("alt", string.Empty);

            Debug.Print("src = " + src + ", alt = " + alt);
            count++;
        }
        Debug.Print("count = " + count);
    }
    
    

    The C# code above reads information from the element with id myWeb123. Similarly, you can study the links and HTML pages in your question and extract the information you need with your own logic.
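
    As a self-contained illustration of the same idea, here is a minimal sketch that parses a shortened copy of the sample markup from a string and collects the src of each image. The class name ImageExtractor and the helper GetImageSources are hypothetical names chosen for this example, not part of HtmlAgilityPack; it assumes the HtmlAgilityPack package is referenced by the project.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageExtractor
{
    // Hypothetical helper (not from HtmlAgilityPack): returns the src of
    // every <img> found inside the element with the given id.
    public static List<string> GetImageSources(string html, string containerId)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse from a string instead of a file

        var sources = new List<string>();

        HtmlNode container = document.GetElementbyId(containerId);
        if (container == null)
        {
            return sources; // no element with that id
        }

        // ".//img[@src]" is evaluated relative to the container.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection imageNodes = container.SelectNodes(".//img[@src]");
        if (imageNodes == null)
        {
            return sources;
        }

        foreach (HtmlNode imageNode in imageNodes)
        {
            sources.Add(imageNode.GetAttributeValue("src", string.Empty));
        }
        return sources;
    }

    public static void Main()
    {
        // Shortened copy of the sample markup shown earlier.
        string html =
            "<div style='padding-left:12px;' id='myWeb123'>" +
            "<b>MyWebSite Pics</b><br />" +
            "<img src=\"http://myWebSite.com/pics/HHTR_01.jpg\" alt='myWebSitePics' /><br />" +
            "<img src=\"http://myWebSite.com/pics/HHTR_02.jpg\" alt='myWebSitePics' /><br />" +
            "<a href=\"http://www.myWebSite.com/\">Source</a>" +
            "</div>";

        foreach (string src in GetImageSources(html, "myWeb123"))
        {
            Console.WriteLine(src);
        }
    }
}
```

    To read a live page rather than a string, HtmlAgilityPack's HtmlWeb class has a Load(url) method that returns an HtmlDocument, which could be passed through the same selection logic.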

     

    This is as much help as we can provide, since extracting the data and building the project to your requirements is outside the scope of these forums. The rest of the work is up to you.

    For more information, refer to the following links:

    http://stackoverflow.com/questions/6574109/how-to-timeout-a-request-using-html-agility-pack

    http://stackoverflow.com/questions/7470463/c-sharp-scrape-data-from-wiki-page-screen-scraping

     


    Please mark this as the answer if it helps you, or unpropose it as the answer if it does not.

    Thanks

    Rehan Bharucha - The Tech Robot

    MCTS, MCITP, MCPD, MCT, MCC

    • Edited by REHAN BHARUCHA Sunday, February 5, 2012 2:15 PM
    • Marked as answer by SSAS_user Sunday, February 5, 2012 4:45 PM
    Sunday, February 5, 2012 2:14 PM