none
multiple dates in a string RRS feed

  • Question

  • A column has all string of text data with multiple dates. I need to select both the dates and couple of other columns from that column as different columns. 

    The column data is as shown 

    company xyz was started on 2016-01-01 and reported MN on 2016-04-01 X Y 

    company abc was started on 2015-01-01 and reported NA on 2016-01-02 K N

    company abcd was started on 2015-01-02 and reported DE on 2015-10-02 X N 

    company abd was started on 2015-11-17 and reported MN on 2015-12-02 K Y 

    company mnc was started  CM   Y 

    Here is what i need to select

    col1           col2       col3    col4  col5

    -----------------------------------------

    2016-01-01 MN 2016-04-01  X     Y

    2015-01-01 NA  2016-01-02  K    N

    2015-01-02 DE 2015-10-02  X     N 

    2015-11-17 MN 2015-12-02  K     Y

    Null            CM Null             NUll  Y

    is this possible? experts need your valuable inputs


    • Edited by oleolehoohoo Thursday, December 29, 2016 4:20 PM
    Thursday, December 29, 2016 4:16 PM

All replies

  • Thursday, December 29, 2016 4:25 PM
    Moderator
  • If all the data is in clean format, try using PATINDEX:

    extract date from text string - Transact-SQL

    PATINDEX to retrieve date from varchar/memo data field



    Kalman Toth Database & OLAP/AI Architect SQL Server 2016 Database Design
    New Book : Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016



    Thats the issue the data is not in a proper format in the source. I am trying to write a conversion script. I tried to do a select patindex('%[1-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]%',[Flat file source error output column]) from table and it returns 0.
    • Edited by oleolehoohoo Thursday, December 29, 2016 4:43 PM
    Thursday, December 29, 2016 4:40 PM
  • >Thats the issue the data is not in a proper format in the source

    You need to cleanse the data first. Then you can apply PATINDEX or other string functions.



    Kalman Toth Database & OLAP/AI Architect SQL Server 2016 Database Design
    New Book : Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016


    Thursday, December 29, 2016 4:50 PM
    Moderator
  • >Thats the issue the data is not in a proper format in the source

    You need to cleanse the data first. Then you can apply PATINDEX or other string functions.



    Kalman Toth Database & OLAP/AI Architect SQL Server 2016 Database Design
    New Book : Beginner Database Design & SQL Programming Using Microsoft SQL Server 2016


    OK. any ssis transform that would help cleanse data?
    Thursday, December 29, 2016 4:53 PM
  • the thing is we need data from every row in the input column. So there is really nothing to delete. But if there is no value then we would prefer to use Null in the destination.
    Thursday, December 29, 2016 4:56 PM
  • Deleted
    • Proposed as answer by Albert_ Zhang Friday, December 30, 2016 3:14 AM
    • Unproposed as answer by Albert_ Zhang Friday, December 30, 2016 3:15 AM
    • Proposed as answer by Albert_ Zhang Thursday, January 5, 2017 12:11 PM
    Thursday, December 29, 2016 10:27 PM
  • Hi oleolehoohoo,

    You could have a look at following demo. In the demo, I assume the last column “col5” just has value “n” or “y”. If not, please let me know. Besides, it uses a function “dbo.UFN_SEPARATES_COLUMNS” that comes from this link and you could have a look at it.

    In my opinion, you’d better first handle your data that has normal format. About the abnormal part, you’d better handle them case by case.

    declare @data table (TextData varchar(max))
    insert into @data values 
    ('company xyz was started on 2016-01-01 and reported MN on 2016-04-01 X Y') ,
    ('company abc was started on 2015-01-01 and reported NA on 2016-01-02 K N') ,
    ('company abcd was started on 2015-01-02 and reported DE on 2015-10-02 X N') , 
    ('company abd was started on 2015-11-17 and reported MN on 2015-12-02 K Y') , 
    ('company mnc was started  CM   Y ') 
    
    select TextData, 
    	case 
    		when CHARINDEX('was started on ',TextData) > 0 then RTRIM(LTRIM(SUBSTRING(textdata, Patindex('%[0-9]%',textdata),len(textdata))))
    		when CHARINDEX('was started ',TextData) > 0 then  RTRIM(LTRIM(REVERSE( SUBSTRING(REVERSE(textdata),0, CHARINDEX(REVERSE( 'was started '),REVERSE(textdata))))))
    		else null 
    	end as newtextdata
    from @data 
    
    
    
    ;with withoutcompanyandstartedon as
    (
    	select TextData, 
    	case 
    		when CHARINDEX('was started on ',TextData) > 0 then RTRIM(LTRIM(SUBSTRING(textdata, Patindex('%[0-9]%',textdata),len(textdata))))
    		when CHARINDEX('was started ',TextData) > 0 then  RTRIM(LTRIM(REVERSE( SUBSTRING(REVERSE(textdata),0, CHARINDEX(REVERSE( 'was started '),REVERSE(textdata))))))
    		else null 
    	end as newtextdata
    	from @data 
    )
    ,withoutreportedonfornormaldata as
    (
    	select *, REPLACE( REPLACE( REPLACE(newtextdata,'and reported ','') ,'on ',''),' ',',') as perdata
    	from withoutcompanyandstartedon
    	where Patindex('%[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]%',newtextdata)>0
    ) 
    ,normaldata as
    (
    select 
    	TextData,newtextdata
    	,dbo.UFN_SEPARATES_COLUMNS(perdata ,1,',') as col1
    	,dbo.UFN_SEPARATES_COLUMNS(perdata ,2,',') as col2
    	,dbo.UFN_SEPARATES_COLUMNS(perdata ,3,',') as col3
    	,dbo.UFN_SEPARATES_COLUMNS(perdata ,4,',') as col4
    	,dbo.UFN_SEPARATES_COLUMNS(perdata ,5,',') as col5
    from withoutreportedonfornormaldata 
    )
    ,notnormaldata as
    (
    select *
    	,null as col1
    	,case 
    		when patindex('%[a-z][a-z]%',newtextdata) >0 then SUBSTRING(newtextdata,patindex('%[a-z][a-z]%',newtextdata),2) 
    		else null
    	 end as col2
    	,null as col3
    	,CASE 
    		WHEN patindex('% [a-z][^a-z] %',newtextdata)>0 THEN SUBSTRING(newtextdata,patindex('% [a-z][^a-z] %',newtextdata)+1,1)
    		ELSE NULL
    	 END as col4
    	,case 
    		when PATINDEX('% y%',newtextdata) > 0 or PATINDEX('% n%',newtextdata) > 0 then SUBSTRING(newtextdata,LEN(newtextdata),1)
    		else null
    	 end as col5
    from withoutcompanyandstartedon
    where Patindex('%[0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]%',newtextdata)= 0
    )
    select * from normaldata
    union all
    select * from notnormaldata
    

    Best Regards,

    Albert Zhang


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Proposed as answer by Albert_ Zhang Thursday, January 5, 2017 12:12 PM
    Friday, December 30, 2016 5:39 AM
  • Jose and Albert,

    Thanks for the replies. The valid values for the COL5 are YN andCOL2 are DE,MN,NA.

    Friday, December 30, 2016 3:20 PM
  • Deleted
    Friday, December 30, 2016 8:22 PM
  • The valid values for the COL5 are YN andCOL2 are DE,MN,NA.


    What could you tell us about the col4 column?

        José Diz     Belo Horizonte, MG - Brasil


    Thanks Jose. The valid values for column 4 are K/X. However the word company is not literally in the text. Its quoted around double quotation as shown below. This is an now an updated input file.

    "xyz" was started on 2016-01-01 and reported MN on 2016-04-01 X Y 

    "abc" was started on 2015-01-01 and reported NA on 2016-01-02 K N

    "abcd " was started on 2015-01-02 and reported DE on 2015-10-02 X N 

    "abd" was started on 2015-11-17 and reported MN on 2015-12-02 K Y 

    "mnc" was started  CM   Y

    Saturday, December 31, 2016 2:53 PM
  • Deleted
    Saturday, December 31, 2016 6:04 PM