none
How to query XML column RRS feed

Answers

  • Yes, I looked the file earlier today, and said to myself "this is HTML, not XML". Well, it seems like it could be a legal XML document, since there is a single root tag, but when I tried to put it into a variable of the xml data type, I get the message:

      Msg 9411, Level 16, State 1, Line 1
      XML parsing: line 10, character 62, semicolon expected

    This is beause the URL on this line looks like this:

    <a href="http://www.youtube.com/watch?v=7FtmjLxkq98&feature=player_profilepage">

    The & character is a special character in XML, use to include a special character, for instance &amp; to include itself. But that can easily be handled with a replace, see below.

    Getting the titles and the URL from the documents is not too tricky, but mapping them to the correct section is, because the section title is in an <H2> tag which does not enclose the A HREF tags. I don't really have an idea how to do this. Here is a lame attempt which gets the section name from the DIV tag, which is simple, since the DIV tags do enclose the A HREF tags.

    SELECT @x = col FROM tbl WHERE
    SELECT @x = replace(@x, '&', '&amp;') 
    DECLARE @u xml = @x

    SELECT Section.c.value('@class', 'nvarchar(30)') AS Section,
           Link.c.value('(./text())[1]', 'varchar(250)') AS Title,
           Link.c.value('@href', 'varchar(250)') AS URL
    FROM   @u.nodes('/td/div') AS Section(c)
    CROSS  APPLY Section.c.nodes('div/div/a') AS Link(c)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by SSAS_user Sunday, February 5, 2012 4:43 PM
    Sunday, February 5, 2012 3:53 PM

All replies

  • In fact, the text file is HTML format, but we should be able to read it as well, correct?
    Sunday, February 5, 2012 12:04 PM
  • Yes, I looked the file earlier today, and said to myself "this is HTML, not XML". Well, it seems like it could be a legal XML document, since there is a single root tag, but when I tried to put it into a variable of the xml data type, I get the message:

      Msg 9411, Level 16, State 1, Line 1
      XML parsing: line 10, character 62, semicolon expected

    This is beause the URL on this line looks like this:

    <a href="http://www.youtube.com/watch?v=7FtmjLxkq98&feature=player_profilepage">

    The & character is a special character in XML, use to include a special character, for instance &amp; to include itself. But that can easily be handled with a replace, see below.

    Getting the titles and the URL from the documents is not too tricky, but mapping them to the correct section is, because the section title is in an <H2> tag which does not enclose the A HREF tags. I don't really have an idea how to do this. Here is a lame attempt which gets the section name from the DIV tag, which is simple, since the DIV tags do enclose the A HREF tags.

    SELECT @x = col FROM tbl WHERE
    SELECT @x = replace(@x, '&', '&amp;') 
    DECLARE @u xml = @x

    SELECT Section.c.value('@class', 'nvarchar(30)') AS Section,
           Link.c.value('(./text())[1]', 'varchar(250)') AS Title,
           Link.c.value('@href', 'varchar(250)') AS URL
    FROM   @u.nodes('/td/div') AS Section(c)
    CROSS  APPLY Section.c.nodes('div/div/a') AS Link(c)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by SSAS_user Sunday, February 5, 2012 4:43 PM
    Sunday, February 5, 2012 3:53 PM
  • Erland your reply is very helpful!! I just changed a little bit then I almost get what I want:) The HTML already loaded to DB, and it can be converted to XML. You can get updated file from https://skydrive.live.com/?cid=12AA860598915667&id=12AA860598915667%21145 

    DECLARE @u XML
    SELECT @u = convert(XML, [HTML], 2) FROM dbo.table

    SELECT Section.c.value('./h2[1]', 'nvarchar(30)') AS Section,
           Link.c.value('(./text())[1]', 'varchar(250)') AS Title,
           Link.c.value('@href', 'varchar(250)') AS URL
    FROM   @u.nodes('table/tr/td/div') AS Section(c)
    CROSS  APPLY Section.c.nodes('div/div/a') AS Link(c) 

    Just one thing missed from this query: I also need to get another item: View All

    from example: Billing and Subscriptions || View All ||  http://support.xbox.com/en-US/billing-and-subscriptions/browse

    Could you take a look at again. You helped me a lot.



    • Edited by SSAS_user Sunday, February 5, 2012 4:40 PM
    Sunday, February 5, 2012 4:36 PM
  • I finished it based on your example. Have a good day:)

    • Edited by SSAS_user Sunday, February 5, 2012 4:44 PM
    Sunday, February 5, 2012 4:43 PM
  • SELECT Section.c.value('./h2[1]', 'nvarchar(30)') AS Section,

    Ah, that works! The H2 tag is still in that DIV tag, so it will be synchronised with the other tags.

    Just one thing missed from this query: I also need to get another item: View All

    As I understand your second post, you figured it out before I came around to look at it?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, February 5, 2012 11:11 PM
  • As I understand your second post, you figured it out before I came around to look at it?

    Yes:)

    Now I want to add a sort column, the order of each item is very important to me. But it seems it's not easy. I don't want to create another table and use an IDENTITY columns.

    I created a seperated thread since I have closed this one.

    http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/58207864-0f01-47d5-912d-c54f2f9a923b 


    • Edited by SSAS_user Monday, February 6, 2012 3:52 PM
    Monday, February 6, 2012 3:48 PM