Select distinct Data with certain text

Answered Select distinct Data with certain text

  • Thursday, December 27, 2012 7:44 AM
     
     

    I have a content in which i have certain page name with ended with .aspx

    href="http://www.mysite.com/MyPage.aspx"

    I wanted to list of all such distinct entries in all of my content

    how can i retrieve them.

    At the moment it simply selected all content
    SELECT distinct [MYContents]
    FROM [dbo].[MYTable]
    WHERE [MYContents] LIKE '%.aspx%'

    My DB is sql server 2005


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])



All Replies

  • Thursday, December 27, 2012 7:53 AM
    Moderator
     
      Has Code

    Perhaps:

    SELECT DISTINCT [MYContents] 
    FROM [dbo].[MYTable] 
    WHERE [MYContents] LIKE '%.aspx%' 


    Kalman Toth SQL 2008 GRAND SLAM
    New Book: SQL Programming & Database Design Using Microsoft SQL Server 2012

  • Thursday, December 27, 2012 8:04 AM
     
     
    I need to get full href="http://www.mysite.com/MyPage.aspx" entries with in the content whcih also have other body element

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Thursday, December 27, 2012 8:32 AM
     
      Has Code

    This TVF may help to extract links

    create function dbo.GetLinks (@t nvarchar(max))
    	returns @Links table (link nvarchar(max))
    as
    begin
    	declare @strtpos int = PATINDEX('%href="http%.aspx"%', @t)
    	declare @endpos int = 0
    	declare @lnk nvarchar(max)
    	while @strtpos > 0
    	begin
    		select  @endpos = PATINDEX('%.aspx"%', @t)+ 5
    		  , @lnk = substring(@t ,@strtpos, @endpos - @strtpos)
    		  , @t= RIGHT (@t, len(@t) - @endpos)
    		  , @strtpos = PATINDEX('%href="http%.aspx"%', @t)
    		insert @Links values(@lnk )
    	end 
    	return
    end
    go
    
    -- test--
    declare @arg nvarchar(max)	
    set @arg= 'I have a content in which i have certain page name with ended with .aspx
    href="http://www.mysite.com/MyPage.aspx"
    I wanted to list of all such distinct entries in all of my content
    href="http://www2.mysite.com/My444Page.aspx" '
    
    select * from dbo.GetLinks(@arg)
    Hope no links like href="http://www2.aspx.my.com/My444Page.aspx" or somthing. Then you'll need full feartured HTML paser.

    Serg



  • Thursday, December 27, 2012 9:32 AM
     
      Has Code

    Thanks a lot

    Corrected with minor tweak for sql server 2005

    create function dbo.GetLinks (@t nvarchar(max))
        returns @Links table (link nvarchar(max))
    as
    begin
        declare @strtpos int
        SET @strtpos = PATINDEX('%href="http%.aspx"%', @t)
        declare @endpos int
        SET @endpos = 0
        declare @lnk nvarchar(max)
        while @strtpos > 0
        begin
            select  @endpos = PATINDEX('%.aspx"%', @t)+ 5
              , @lnk = substring(@t ,@strtpos, @endpos - @strtpos +1)
              , @t= RIGHT (@t, len(@t) - @endpos)
              , @strtpos = PATINDEX('%href="http%.aspx"%', @t)
            insert @Links values(@lnk )
        end
        return
    end
    go

    Now how to retrieve it for all records.From

    declare @arg nvarchar(max)	
    set @arg= 'I have a content in which i have certain page name with ended with .aspx
    href="http://www.mysite.com/MyPage.aspx"
    I wanted to list of all such distinct entries in all of my content
    href="http://www2.mysite.com/My444Page.aspx" '
    
    select * from dbo.GetLinks(@arg)

    I can get it for only one record


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])



  • Thursday, December 27, 2012 10:44 AM
     
     

    I have tried with


    SELECT DISTINCT link
    FROM [dbo].[MYTable]
    CROSS APPLY dbo.GetLinks([MYContents])

    Thanks
    May be some problem in function or data as i am getting error
    Msg 536, Level 16, State 5, Line 1
    Invalid length parameter passed to the SUBSTRING function.

    Remember i may have multiple href


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Thursday, December 27, 2012 11:33 AM
     
      Has Code

    Cann't reproduce the error the script  runs OK:

    select z.* from 
    (select 'I have a content in which i have certain page  name with ended with .aspx
     href="http://www.mysite.com/MyPage.aspx"
     I wanted to list of all such distinct entries in all of my content 
    href="http://www2.mysite.com/My444Page.aspx"' as content) x
    cross apply dbo.GetLinks(x.content) z

    Can you show your Content value where it fails?


    Serg


    • Edited by SergNL Thursday, December 27, 2012 11:35 AM
    •  
  • Thursday, December 27, 2012 2:44 PM
     
     

    select z.* from
    (select 'href="MyFirstPAge.aspx"

    href="http://www.test.com/2009/05/aa-bb.html"

    href="MySecondPage.aspx"' as content) x
    cross apply dbo.GetLinks(x.content) z

    is the sample of error data


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Friday, December 28, 2012 12:59 AM
     
     

    I think your function is not working properly. It is showing error near substring inside your function.

    create function dbo.GetLinks (@t nvarchar(max))
        returns @Links table (link nvarchar(max))
    as
    begin
        declare @strtpos int
        SET @strtpos = PATINDEX('%href="http%.aspx"%', @t)
        declare @endpos int
        SET @endpos = 0
        declare @lnk nvarchar(max)
        while @strtpos > 0
        begin
            select  @endpos = PATINDEX('%.aspx"%', @t)+ 5
              , @lnk = substring(@t ,@strtpos, @endpos - @strtpos +1) -- here is the error 
              , @t= RIGHT (@t, len(@t) - @endpos)
              , @strtpos = PATINDEX('%href="http%.aspx"%', @t)
            insert @Links values(@lnk )
        end
        return
    end
    go


    ESHANI. Please click "Mark As Answer" if a post solves your problem or "Vote As Helpful" if a post has been useful to you

  • Friday, December 28, 2012 5:02 AM
     
     

    I know about the error position but doesn't know how to correct it.

    Any help SergNL


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Friday, December 28, 2012 7:18 AM
     
     Answered

    As it was said early, this simple script is suitable only if all references are of the form

    href="http ... .aspx". 

    I.e. every link must start with 'href="http' and end with '.aspx"'.

    Otherwise  you need HTML parser which is capabe to detect any link (and precisely links only, for example exclude commented HTML) according to full HTML syntax. Google says there's a number of free C# parsers to be found on the net. Such a parser can be refactored to CLR TVF  http://msdn.microsoft.com/en-us/library/ms131077(v=sql.100).aspx


    Serg


  • Friday, December 28, 2012 7:30 AM
     
     

    Does this site message parser detects HTML comments in messages?

    <!--This is a comment. http://www.w3schools.com/tags/tag_comment.asp Comments are not displayed in the browser-->

    No, it's not intented to do such a tricks.


    Serg



    • Edited by SergNL Friday, December 28, 2012 7:33 AM
    •  
  • Friday, December 28, 2012 7:51 AM
     
     
    You can post it in ASP.NET Forum
  • Friday, December 28, 2012 9:23 AM
     
      Has Code

    create table #tb(id int,arg nvarchar(max))

    insert into #tb select 1,'I have a content in which i have certain page name with ended with .aspx href="http://www.mysite.com/MyPage.aspx" I wanted to list of all such distinct entries in all of my content href="http://www2.mysite.com/My444Page.aspx" ' union all select 2,'href="http://www.mysite2.com/MyPage.aspx" I wanted to list of all such distinct entries in all of my content href="http://www2.mysite2.com/My555Page.aspx" ' union all select 3,'test3 a.aspx href="http://www.google.com/MyPage.aspx" I wanted to list of all such distinct entries in all of my content' union all select 4,'I have a test with .aspx href="http://www.mysite4.com/MyPage.aspx" and href="http://www2.mysite4.com/My666Page.aspx" 'union all select 5,'href="http://www.mysite4.com/MyPage.aspx" and href="http://www2.mysite4.com/My666Page.aspx" or href="http://www2.mysite4.com/My777Page.aspx"' ;with cte as ( select ID,stuff(arg,1,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg))+5,'') as arg ,substring(arg ,patindex('%href="http%.aspx"%',arg)+6 ,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg))-patindex('%href="http%.aspx"%',arg)-2) as httplink ,patindex('%href="http%.aspx"%',arg) as start ,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg)) as [end] from #tb union all select ID,stuff(arg,1,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg))+5,'') as arg ,substring(arg ,patindex('%href="http%.aspx"%',arg)+6 ,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg))-patindex('%href="http%.aspx"%',arg)-2) as httplink ,patindex('%href="http%.aspx"%',arg) as start ,charindex('aspx"',arg,patindex('%href="http%.aspx"%',arg)) as [end] from cte a where patindex('%href="http%.aspx"%',arg)>0 ) select ID, httplink from cte order by id

    Using CTE to parse HTML.

    -----------------------------

    1 http://www.mysite.com/MyPage.aspx
    1 http://www2.mysite.com/My444Page.aspx
    2 http://www2.mysite2.com/My555Page.aspx
    2 http://www.mysite2.com/MyPage.aspx
    3 http://www.google.com/MyPage.aspx
    4 http://www.mysite4.com/MyPage.aspx
    4 http://www2.mysite4.com/My666Page.aspx
    5 http://www.mysite4.com/MyPage.aspx
    5 http://www2.mysite4.com/My666Page.aspx
    5 http://www2.mysite4.com/My777Page.aspx

  • Friday, December 28, 2012 11:01 AM
     
      Has Code

    Thanks Barton But got same error

    ;With cte
         As (Select MyDataID
                    , Stuff(MyDataContents, 1, Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents))
                                                + 5, '')                                                                                                                                            As MyDataContents
                    , Substring(MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents)
                                                 + 6, Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents)) - Patindex('%href="http%.aspx"%', MyDataContents) - 2) As httplink
                    , Patindex('%href="http%.aspx"%', MyDataContents)                                                                                                                              As start
                    , Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents))                                                                                         As [end]
             From   MyDatas
             Union All
             SELECT  MyDataID
                    , Stuff(MyDataContents, 1, Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents))
                                                + 5, '')                                                                                                                                            As MyDataContents
                    , Substring(MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents)
                                                 + 6, Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents)) - Patindex('%href="http%.aspx"%', MyDataContents) - 2) As httplink
                    , Patindex('%href="http%.aspx"%', MyDataContents)                                                                                                                              As start
                    , Charindex('aspx"', MyDataContents, Patindex('%href="http%.aspx"%', MyDataContents))                                                                                         As [end]
             From   cte a
             Where  Patindex('%href="http%.aspx"%', MyDataContents) > 0)
    Select MyDataID
           , httplink
    From   cte
    Order  By MyDataID 
    


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Friday, December 28, 2012 11:31 AM
     
     

    2 Barton

    Yes, it's a good example how to convert  WHILE  into recursive CTE.

    And it will  do the job. But alias, the same job ( plus it will not fail on some data).  It will extract the link 'href="http://mysinte.com"_contains many".aspx"' from a content, type of  'My site href="http://mysinte.com"_contains many".aspx" pages.' 

    Which is  hardly what is needed.


    Serg

  • Friday, December 28, 2012 11:42 AM
     
      Has Code

    My sample data is 

    <a shape="rect" href="http://www.qwewqe.fda.gov/wqewqe/pdf8/wewe.pdf" target="_blank">System</a> rket. </p>been calling for
     <a shape="rect" href="http://www.sdfdsf.org/news/sdfsdf/sdfsdfsdf.html" target="_blank">rocess</a>,in 2010,
     <a shape="rect" href="http://www.gfhgfh.com/defective-medical-devices/fghfgh-takes-aim-at-501k-fgh-device-approvals" 
     target="_blank"> changes. </p><p style="text-align: justify">The  <a shape="rect" href="fghfghfgh.aspx" target="_blank">
     substantially equivalent.</a> </p><p style="text-align: justify">Per th </p><p style="text-align: justify">market. </p>
     <p style="text-align: justify">The <a shape="rect" href="asdd.aspx" target="_blank">sub.</a> </p><p style="text-align: justify">
     Perhaps</p><p style="text-align: justify">t of a <a shape="rect" href="http://www.asdasdasd.com/view.aspx?rid=67920"
     target="_blank">one-in-eight</a> revision  </p><p style="text-align: justify">Critics de 
     <a shape="rect" href="http://www.sdfsdf.com/2009/05/fda-sdfsdf-devices-without-scrutiny-putting-dsf-at-risk.html"
     target="_blank">5 </p><p style="text-align: justify">We </p><p style="text-align: justify">If </p><p style="text-align: justify">
     <a shape="rect" href="SDFSDFSDF.aspx">sdfsdf</a> </p><p style="text-align: justify"><a shape="rect" href="asd324234dsdasd.aspx">asd</a>
     </p><p style="text-align: justify">  <a shape="rect" href="sadsad.aspx">sasdasdasd</a>


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Friday, December 28, 2012 3:26 PM
     
      Has Code

    Try

    create function dbo.GetLinks (@t nvarchar(max))
    	returns @Links table (link nvarchar(max))
    as
    begin
    	declare @TEMPLATE nvarchar(20)
    	set @TEMPLATE = '%href="%"%'
    
    	declare @strtpos int
    	set @strtpos = PATINDEX(@TEMPLATE, @t)
    	declare @endpos int
    	set @endpos = 0
    	declare @lnk nvarchar(max)
    	while @strtpos > 0
    	begin
    		select @endpos = CHARINDEX('"',@t,@strtpos+6)
    		  , @lnk = substring(@t ,@strtpos, @endpos - @strtpos + 1)
    		  , @t = RIGHT (@t, len(@t)- @endpos)
    		  , @strtpos = PATINDEX(@TEMPLATE, @t)
    		insert @Links values(@lnk )
    	end 
    	return
    end
    

    This works fine on the sample. But again, it does't take into accout a number of HTML features, white spaces for example.


    Serg

  • Friday, December 28, 2012 4:14 PM
     
     

    Thanks a lot sergNL 
    At the moment done it via .net [extracted result set then find out the data via regex class with certain pattern ] 

    But Will try out your function tomorrow as well


    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])

  • Friday, December 28, 2012 5:35 PM
     
     

    You may wish integrate .net regex as SQL object as well

    http://www.codeproject.com/Articles/42764/Regular-Expressions-in-MS-SQL-Server-2005-2008


    Serg

  • Saturday, December 29, 2012 9:28 AM
     
     
    Thanks serg. I know about CLR support thing in sql server but at the moment current task is about some data porting utility. CLR assembly do need extra costing of deployment/maintenance of assembly on the db server so i thought it is fine to deal with the thing in .net as i were already calling the db via .net utility for porting.

    Kamran Shahid Principle Engineer Development (MCP,MCAD,MCSD.NET,MCTS,MCPD.net[web])