XPath: How do I extract the value of the data-amzn attribute?

    Question

  • Hello,

    I have the XML document below. How do I extract the value of the "data-amzn" attribute of the <div> element, and the src attribute of the <img> element inside it?

    <article id="post-123769" class="post-123769 post type-post status-publish format-standard hentry category-children-books-kids tag-a-j-cosmo post clearfix ">
    	<div class="post-inner">
    	
    	
    	
    	<div class="amazon" data-amzn="http://www.amazon.com/dp/B00AMSUCTE"> 
    	
    	<p id="breadcrumbs"><span xmlns:v="http://rdf.data-vocabulary.org/#"><span typeof="v:Breadcrumb"><a href="http://hundredzeros.com" rel="v:url" property="v:title">Kindle Books</a></span> &raquo; <span typeof="v:Breadcrumb"><a href="http://hundredzeros.com/children-books-kids" rel="v:url" property="v:title">Children eBooks</a></span> &raquo; <span typeof="v:Breadcrumb"><span class="breadcrumb_last" property="v:title">A Bear of a Christmas</span></span></span></p> 
    	<figure class="post-image right"> 
    	<a class="amzn"><img src="http://ecx.images-amazon.com/images/I/51hY76WXCuL.jpg" /></a> 
    
    	</figure> 
    	
    	<div class="post-content"> 


    Tuesday, December 17, 2013 8:43 PM

Answers

  • >>XPATH actually works just fine //div[@class='amazon'] actually finds entire <div /> element.

    How are you evaluating the XPath? If you use the Html Agility Pack, use code like this:

    doc.DocumentNode.SelectSingleNode("//div[@data-amzn]").Attributes["data-amzn"].Value;

    This returns the value of that attribute.

    If you are using LINQ to XML together with the System.Xml.XPath namespace (so doc is an XDocument), you can use code like this:

    string dataamznvalue = doc.XPathSelectElement("//div[@data-amzn]").Attribute("data-amzn").Value;

    This also returns the value of that attribute.

    Regards.
     


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.

    Wednesday, December 18, 2013 6:00 AM
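
    The System.Xml.XPath variant above can be tried out with a small, self-contained sketch. It assumes a well-formed input: the Sample string is a hypothetical, cut-down stand-in for the markup in the question (the original contains &raquo; and unclosed tags, which XDocument would reject). XDocumentAttributeDemo, Sample and FirstAttribute are names made up for illustration.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class XDocumentAttributeDemo
{
    // Hypothetical, well-formed stand-in for the markup in the question.
    public const string Sample =
        "<article><div class=\"amazon\" data-amzn=\"http://www.amazon.com/dp/B00AMSUCTE\">" +
        "<figure><a class=\"amzn\">" +
        "<img src=\"http://ecx.images-amazon.com/images/I/51hY76WXCuL.jpg\" />" +
        "</a></figure></div></article>";

    // Returns the named attribute of the first element matched by xpath,
    // or null when the element or the attribute is missing.
    public static string FirstAttribute(XDocument doc, string xpath, string attributeName)
    {
        XElement element = doc.XPathSelectElement(xpath);
        return element?.Attribute(attributeName)?.Value;
    }

    static void Main()
    {
        XDocument doc = XDocument.Parse(Sample);

        // First <div> that carries a data-amzn attribute.
        Console.WriteLine(FirstAttribute(doc, "//div[@data-amzn]", "data-amzn"));

        // First <img> with a src attribute inside that <div>.
        Console.WriteLine(FirstAttribute(doc, "//div[@data-amzn]//img[@src]", "src"));
    }
}
```

    The null-conditional operators are a design choice for the sketch: the one-liner above throws a NullReferenceException when no element matches, while FirstAttribute returns null instead.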

All replies

  • Hi,

    Are you sure your document is an XML file rather than an HTML file?

    Could you please share the whole file?

    Wednesday, December 18, 2013 1:40 AM
  • Yes, this is an HTML file, but I assume HTML is a subset of XML and hence can be queried with an XPath expression.
    Wednesday, December 18, 2013 1:50 AM
  • Hello,

    >>this is HTML file but I assume HTML is subset of XML file and hence shall be able to be parsed by XPATH expression.

    Actually, you do not need to assume that HTML is a subset of XML, because the Html Agility Pack lets you run XPath expressions against an HTML file.

    To extract the value of the "data-amzn" attribute of the <div> element and the src attribute of the <img> element inside it, install the package and then use code like this:

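    using System;
    using HtmlAgilityPack;

    string htmlFile = "E:\\BMX\\File\\2013-12\\Sample18.html";

    HtmlDocument doc = new HtmlDocument();
    doc.Load(htmlFile);

    // Value of data-amzn on the first <div> that has that attribute.
    string dataamznvalue = doc.DocumentNode.SelectSingleNode("//div[@data-amzn]").Attributes["data-amzn"].Value;

    // Value of src on the first <img> inside that <div>.
    string imgsrcvalue = doc.DocumentNode.SelectSingleNode("//div[@data-amzn]//img[@src]").Attributes["src"].Value;

    Console.WriteLine(dataamznvalue);
    Console.WriteLine(imgsrcvalue);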
    

    This returns both attribute values.

    To install the package, open the Package Manager Console and run the command below:

    Install-Package HtmlAgilityPack

    For details, please see here:

    http://www.nuget.org/packages/HtmlAgilityPack

    Regards.



    Wednesday, December 18, 2013 3:41 AM
  • I cannot use the DOM; it has to be done in XPath alone.

    This is for Yahoo Pipes, so no DOM is allowed. XPath itself works just fine: //div[@class='amazon'] finds the entire <div /> element. I just need to figure out how to modify the XPath to retrieve the specific attributes from my original post.

    Wednesday, December 18, 2013 4:09 AM
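
    For the XPath-only case, one option is to end the expression with an attribute step, so that the expression itself selects the attribute rather than the element: //div[@class='amazon']/@data-amzn and //div[@class='amazon']//img/@src. Whether Yahoo Pipes accepts an attribute step is an assumption that has not been verified here. The sketch below only checks the two expressions with .NET's XmlDocument against a hypothetical, well-formed, cut-down copy of the markup from the question. AttributeAxisDemo, Sample and SelectValue are names made up for illustration.

```csharp
using System;
using System.Xml;

static class AttributeAxisDemo
{
    // Hypothetical, well-formed stand-in for the markup in the question.
    public const string Sample =
        "<article><div class=\"amazon\" data-amzn=\"http://www.amazon.com/dp/B00AMSUCTE\">" +
        "<figure><a class=\"amzn\">" +
        "<img src=\"http://ecx.images-amazon.com/images/I/51hY76WXCuL.jpg\" />" +
        "</a></figure></div></article>";

    // Evaluates an XPath whose last step is an attribute (for example ".../@src")
    // and returns that attribute's value, or null when nothing matches.
    public static string SelectValue(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node?.Value;
    }

    static void Main()
    {
        // The trailing /@data-amzn step selects the attribute node itself.
        Console.WriteLine(SelectValue(Sample, "//div[@class='amazon']/@data-amzn"));

        // Same idea for the src attribute of the <img> inside that <div>.
        Console.WriteLine(SelectValue(Sample, "//div[@class='amazon']//img/@src"));
    }
}
```

    With this sample, the two calls print the Amazon URL and the image URL, with no attribute lookup needed outside the XPath expression.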