locked
Scan free text field RRS feed

  • Question

  • Hi All

    I have an interesting query, in a database I have a free text field which end users paste a supposedly block of free text they filled in from a template. An example of it could be like so:

    10/05/2017 13:52 [GMT +1:00] TOPdesk System: 
    08/05/2017 13:01 [GMT +1:00] xxxxxx: 
    Leaver's Details:
    - First Name: xxxx
    - Surname   : yyyy
    - Email        : xxxx.yyyyy@zzzzzzz.com
    - Telephone : 01121111112
    - Username : TBA
    - Line Manager: Jimbar Johnson
    
    --------------------------------------
    Leave Date: 02 June 2017
    --------------------------------------
    
    Leaver's Location:
    - Location: xxxxxxxxxx, xxxx city
    - Floor: Floor 199
    
    --------------------------------------------------------------
    
    Requirements for the Leaver's Mailbox:
    - Auto Forward Emails to: xxxxxxxxxxxxxxx
    - Out of Office Message: No
    
    Requirement for the Leaver's Phone & Voice Mail:
    - Auto Forward:  No
    - Modify Voicemail Greeting: No
    
    Requirement for access to the Leaver's Systems:
    - Access to H Drive: No
    - Access to Mailbox: No
    - Access to Voice Mail: No
    
    Extra Information:
    - Corporate Card Holder: No
    
    --------------------------------------------------------------
    
    Additional Information:

    I want to extract the first and last name, and the email from the free text field, even if its position moves slightly up and down (people might not always paste it in the exact same character position)

    is there a way to scan for the words -First Name and return everything between it and the next new line - even though its all in one text field if you copy and paste it, it seems to retain the format it was pasted in originally

    Thanks guys :)

    Also I wanted to integrate this via ssis to active directory and do a directory lookup for the username - is there a component I can install which may help with this?


    Thanks

    James

    Wednesday, May 24, 2017 9:50 AM

Answers

  • Here is an even better option that simply bypasses the need for doing any of the aggregations altogether... It should also be between 2X & 3X faster that the code we've been working with... (because aggregations and extra rows are expensive)

    SELECT 
    	FirstName = SUBSTRING(ftr.FreeTextR, sp2.sp_FirstName, ep.ep_FirstName - sp2.sp_FirstName),
    	Surname = SUBSTRING(ftr.FreeTextR, sp2.sp_Surname, ep.ep_Surname - sp2.sp_Surname),
    	Email = SUBSTRING(ftr.FreeTextR, sp2.sp_Email, ep.ep_Email - sp2.sp_Email)
    FROM 
    	dbo.ChangeActivity ca
    	CROSS APPLY ( VALUES (REPLACE(ca.Description], ' ', '')) ) ftr (FreeTextR)
    	CROSS APPLY ( VALUES (
    						CHARINDEX('FirstName', ftr.FreeTextR, 1),
    						CHARINDEX('Surname', ftr.FreeTextR, 1),
    						CHARINDEX('Email', ftr.FreeTextR, 1)
    						)
    					) sp1 (sp_FirstName, sp_Surname, sp_Email)
    	CROSS APPLY ( VALUES (
    						IIF(sp1.sp_FirstName = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_FirstName + 1)),
    						IIF(sp1.sp_Surname = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_Surname + 1)),
    						IIF(sp1.sp_Email = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_Email + 1))
    						)
    					) sp2 (sp_FirstName, sp_Surname, sp_Email)
    	CROSS APPLY ( VALUES (
    						IIF(sp1.sp_FirstName = 0, 0,  CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_FirstName)),
    						IIF(sp1.sp_Surname = 0, 0, CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_Surname)),
    						IIF(sp1.sp_Email = 0, 0, CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_Email))
    						)
    					) ep (ep_FirstName, ep_Surname, ep_Email);

    Should have thought of this earlier, but, here we are...


    Jason Long


    Friday, May 26, 2017 3:19 PM

All replies

  • Please post CREATE TABLE + sample data + desired result

    DECLARE @st VARCHAR(50)='John.Smith@zzzzzzz.com'


    SELECT LEFT(FULLname,CHARINDEX('.',@st)-1) FirstName,
    RIGHT(FULLname,CHARINDEX('.',REVERSE(FULLname))-1) LastName
    FROM
    (
    SELECT LEFT(@st,CHARINDEX('@',@st)-1) FULLname
    ) AS Der


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Wednesday, May 24, 2017 11:21 AM
    Answerer
  • You can try some thing like below:

    create table test(a varchar(max))
    
    insert into test values('10/05/2017 13:52 [GMT +1:00] TOPdesk System: \n
    08/05/2017 13:01 [GMT +1:00] xxxxxx: \n
    Leaver''s Details:\n
    - First Name: xxxx\n
    - Surname   : yyyy\n
    - Email        : xxxx.yyyyy@zzzzzzz.com\n
    - Telephone : 01121111112\n
    - Username : TBA\n
    - Line Manager: Jimbar Johnson\n
    \n
    --------------------------------------\n
    Leave Date: 02 June 2017\n
    --------------------------------------\n
    \n
    Leaver''s Location:\n
    - Location: xxxxxxxxxx, xxxx city\n
    - Floor: Floor 199\n
    \n
    --------------------------------------------------------------\n
    \n
    Requirements for the Leaver''s Mailbox:\n
    - Auto Forward Emails to: xxxxxxxxxxxxxxx\n
    - Out of Office Message: No\n
    \n
    Requirement for the Leaver''s Phone & Voice Mail:\n
    - Auto Forward:  No\n
    - Modify Voicemail Greeting: No\n
    \n
    Requirement for access to the Leaver''s Systems:\n
    - Access to H Drive: No\n
    - Access to Mailbox: No\n
    - Access to Voice Mail: No\n
    \n
    Extra Information:\n
    - Corporate Card Holder: No\n
    \n
    --------------------------------------------------------------\n
    \n
    Additional Information:')
    
    select ltrim(substring(a,charindex('First Name:',a,1)+11,charindex('\n',substring(a,charindex('First Name:',a,1)+11,len(a)),1)-1)) as FirstName,
    	   ltrim(substring(a,charindex('Surname   :',a,1)+11,charindex('\n',substring(a,charindex('Surname   :',a,1)+11,len(a)),1)-1)) as SurName,
    	   ltrim(substring(a,charindex('Email        :',a,1)+14,charindex('\n',substring(a,charindex('Email        :',a,1)+14,len(a)),1)-1)) as EMail
    from test
    • Edited by SQLNeophyte Wednesday, May 24, 2017 1:33 PM
    Wednesday, May 24, 2017 1:33 PM
  • Hi Neo

    its almost there - but it is complaining that my source column is ntext -

    Here is the create code as requested:

    Create table #mytest([description] ntext)
    insert into #mytest ([description])
    values('10/05/2017 13:52 [GMT +1:00] TOPdesk System: 
    08/05/2017 13:01 [GMT +1:00] xxxxxx: 
    Leaver''s Details:
    - First Name: xxxx
    - Surname   : yyyy
    - Email        : xxxx.yyyyy@zzzzzzz.com
    - Telephone : 01121111112
    - Username : TBA
    - Line Manager: Jimbar Johnson
    
    --------------------------------------
    Leave Date: 02 June 2017
    --------------------------------------
    
    Leaver''s Location:
    - Location: xxxxxxxxxx, xxxx city
    - Floor: Floor 199
    
    --------------------------------------------------------------
    
    Requirements for the Leaver''s Mailbox:
    - Auto Forward Emails to: xxxxxxxxxxxxxxx
    - Out of Office Message: No
    
    Requirement for the Leaver''s Phone & Voice Mail:
    - Auto Forward:  No
    - Modify Voicemail Greeting: No
    
    Requirement for access to the Leaver''s Systems:
    - Access to H Drive: No
    - Access to Mailbox: No
    - Access to Voice Mail: No
    
    Extra Information:
    - Corporate Card Holder: No
    
    --------------------------------------------------------------
    
    Additional Information:')
    
    Select [description] from #mytest

    One thing I noticed was my source doesn't have the \n for a new row so im not sure how it knows when a new row was inserted - but if I copy and paste it from the column to word it seems to have retained this info somehow in the background?

    Wednesday, May 24, 2017 2:23 PM
  • Something along these lines...

    DECLARE @FreeText VARCHAR(8000) = '
    10/05/2017 13:52 [GMT +1:00] TOPdesk System: 
    08/05/2017 13:01 [GMT +1:00] xxxxxx: 
    Leaver''s Details:
    - First Name: xxxx
    - Surname   : yyyy
    - Email        : xxxx.yyyyy@zzzzzzz.com
    - Telephone : 01121111112
    - Username : TBA
    - Line Manager: Jimbar Johnson
    
    --------------------------------------
    Leave Date: 02 June 2017
    --------------------------------------
    
    Leaver''s Location:
    - Location: xxxxxxxxxx, xxxx city
    - Floor: Floor 199
    
    --------------------------------------------------------------
    
    Requirements for the Leaver''s Mailbox:
    - Auto Forward Emails to: xxxxxxxxxxxxxxx
    - Out of Office Message: No
    
    Requirement for the Leaver''s Phone & Voice Mail:
    - Auto Forward:  No
    - Modify Voicemail Greeting: No
    
    Requirement for access to the Leaver''s Systems:
    - Access to H Drive: No
    - Access to Mailbox: No
    - Access to Voice Mail: No
    
    Extra Information:
    - Corporate Card Holder: No
    
    --------------------------------------------------------------
    
    Additional Information:';
    
    --======================================================
    -- Solution --
    
    SELECT 
    	rn.n,
    	rn.SearchVal,
    	SearchedValue = LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition))
    FROM 
    	( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(@FreeText, ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition)
    

    HTH,


    Jason Long

    Wednesday, May 24, 2017 3:23 PM
  • Hi Jason

    That works really well - I tried plugging it into my table and it came out with an error

    How can i adjust the declare part to come from a table with a list of values instead

    So instead of declare @freetext - how can i make it scan the entire table and bring back all the values

    Ideally i'd like it so you get on one row Firstname, lastname, email rather than the list of values as 3 rows

    Wednesday, May 24, 2017 10:37 PM
  • I'd drop into an inline tale valued function...

    CREATE FUNCTION dbo.tfn_ParseNameAndEmail
    /* ===================================================================================
    05/24/2017 Created to parse the First Name, Last Nmae & Email from a formatted string.
    ==================================================================================== */
    (
    	@FreeTest VARCHAR(8000)
    )
    RETURNS TABLE WITH SCHEMABINDING AS 
    RETURN
    	SELECT 
    		rn.n,
    		rn.SearchVal,
    		SearchedValue = LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition))
    	FROM 
    		( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    		CROSS APPLY ( VALUES (REPLACE(@FreeText, ' ', '')) ) rft (FreeTextR)
    		CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    		CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    		CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition);
    GO 

    The using it would be as simple as this...

    SELECT 
    	*
    FROM 
    	dbo.MyTable mt
    	CROSS APPLY dbo.tfn_ParseNameAndEmail(mt.TextColumn) pne;

    HTH,


    Jason Long

    Wednesday, May 24, 2017 11:18 PM
  • Of course, if you don't want to (or can't) create a function, here is what it looks like w/o the use of a function...

    SELECT 
    	rn.n,
    	rn.SearchVal,
    	SearchedValue = LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition))
    FROM 
    	dbo.MyTable mt
    	CROSS APPLY ( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(mt.TextColumn, ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition);


    Jason Long

    Wednesday, May 24, 2017 11:24 PM
  • Hi James OHara,

    Avoid using ntext data type in new development work, use nvarchar(max) instead. For an existing column defined as ntext, use CAST and CONVERT to explicitly convert the data type. To transpose rows and columns in a table, take a look at FROM - Using PIVOT and UNPIVOT.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Thursday, May 25, 2017 5:06 AM
  • Sorry... I missed the part about getting all three values on a single line.

    That's easy enough, just modify the select list to look like the following.

    SELECT 
    	FirstName = MAX(CASE WHEN rn.n = 1 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	LastName = MAX(CASE WHEN rn.n = 2 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	Email = MAX(CASE WHEN rn.n = 3 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END)
    FROM 
    	( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(@FreeText, ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition)


    Jason Long

    Thursday, May 25, 2017 1:38 PM
  • Hi Jason

    Thanks for your awesome help!

    I tried this code:

    SELECT 
    	FirstName = MAX(CASE WHEN rn.n = 1 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	LastName = MAX(CASE WHEN rn.n = 2 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	Email = MAX(CASE WHEN rn.n = 3 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END)
    FROM 
    	dbo.changeactivity mt
    	CROSS APPLY ( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(mt.[Description], ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition);

    And got this error:

    Argument data type ntext is invalid for argument 1 of replace function

    So I thought I'd get smart and try to cast it as an nvarchar as follows:

    
    SELECT 
    	FirstName = MAX(CASE WHEN rn.n = 1 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	LastName = MAX(CASE WHEN rn.n = 2 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	Email = MAX(CASE WHEN rn.n = 3 THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END)
    FROM 
    	dbo.changeactivity mt
    	CROSS APPLY ( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(convert(nvarchar(1024),mt.[Description]), ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition);

    And got this error:

    Invalid length parameter passed to the left or substring function

    Warning null value is eliminated by an aggregate or other set operation

    I am having real trouble with this query, usually I'm pretty good with SQL and this one really has me stumped"! your help is really appreciated

    J

    Thursday, May 25, 2017 4:10 PM
  • First of all, cast to nvarchar(MAX) and not nvarchar(1024).

    Next, if you are working with this data in a table, you can bet your rear parts that there is data that does not conform to the expected pattern, so you need to augment the condition in the CASE expression to check that the values are are good:

    FirstName = MAX(CASE WHEN rn.n = 1 AND sp2.StartPosition > 0 AND
                              ep.EndPosition > sp2.StartPosition
                         THEN LTRIM(SUBSTRING(rft.FreeTextR,                          sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition))
                    END)

    And similar logic for the other fields.

    Once you get the result sets, you can check how the data where you get one or more NULL get, and possibly refine it.

    Most likely you will find that the complexity grows, so before you dig yourself too deep in that hole, consider the option of writing a C# procedure that uses the RegEx class for the task.

    Thursday, May 25, 2017 6:58 PM
  • No problem... It's been an interesting one to work on.

    I can only guess at the current issue, but I suspect that you've got some bad data in your table (lacking one or more of the 3 search phrases.

    Try changing the last two CROSS APPLY's to the following...

    CROSS APPLY ( VALUES (IIF(sp1.StartPosition1 = 0, 0, CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1))) ) sp2 (StartPosition)
    CROSS APPLY ( VALUES (IIF(sp1.StartPosition1 = 0, 0, CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition))) ) ep (EndPosition)

    That should prevent any invalid values from being passed to the  SUBSTRING function.


    Jason Long


    Thursday, May 25, 2017 7:18 PM
  • To the 1st pointmade by Erland, about converting from NTEXT to NVARCHAR(1024)... If you can, you'd be better off simply altering the datatype on the table. 

    If the data warrants it, use NVARCHAR(MAX) ... <There is data containing Unicode characters AND the string length exceeds 4000 characters>

    If there are no Unicode characters, better to use VARCHAR rather than NVARCHAR.

    If the data doesn't exceed 4000 (NVARCHAR) or 8000 (VARCHAR), use a set number rather than MAX.

    Just my 2 cents...


    Jason Long

    Thursday, May 25, 2017 8:28 PM
  • Hi Jason

    This has been a fun learning experience for me - I've got one last query if that is ok?

    this code now works perfectly -

    SELECT 
    	FirstName = MAX(CASE WHEN rn.n = 1 AND sp2.StartPosition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	LastName = MAX(CASE WHEN rn.n = 2  AND sp2.startposition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END),
    	Email = MAX(CASE WHEN rn.n = 3	   AND sp2.startposition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END)
    FROM 
    	dbo.changeactivity mt
    	
    	CROSS APPLY ( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(convert(nvarchar(max),mt.[Description]), ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition)
    

    Because the first part states to only return the "max" - it only returns the newest one record - I need it to return all of them

    I took out the MAX part of it and it returns records across 3 separate rows which is a little frustrating, so each record now has 3 rows and 2 nulls as shown:

    SELECT 
    	FirstName = CASE WHEN rn.n = 1 AND sp2.StartPosition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END,
    	LastName = CASE WHEN rn.n = 2  AND sp2.startposition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END,
    	Email = CASE WHEN rn.n = 3	   AND sp2.startposition > 0 AND ep.EndPosition > sp2.StartPosition THEN LTRIM(SUBSTRING(rft.FreeTextR, sp2.StartPosition + 1, ep.EndPosition - sp2.StartPosition)) END
    FROM 
    	dbo.changeactivity mt
    	
    	CROSS APPLY ( VALUES (1, 'FirstName:'), (2, 'Surname:'), (3, 'Email:') ) rn (n, SearchVal)
    	CROSS APPLY ( VALUES (REPLACE(convert(nvarchar(max),mt.[Description]), ' ', '')) ) rft (FreeTextR)
    	CROSS APPLY ( VALUES (CHARINDEX(rn.SearchVal, rft.FreeTextR, 1)) ) sp1 (StartPosition1)
    	CROSS APPLY ( VALUES (CHARINDEX(':', rft.FreeTextR, sp1.StartPosition1 + 1)) ) sp2 (StartPosition)
    	CROSS APPLY ( VALUES (CHARINDEX(CHAR(13), rft.FreeTextR, sp2.StartPosition)) ) ep (EndPosition)

    I get all the records I want - but they are on a diagonal split. Also one more quick Q - I want to simply add the record ID to the selection. which point can I add another field in? it is simply another column on the row called 'CallRef'

    The cross apply stuff is really confusing for me it errors out wherever I seem to want to add it


    Thanks :) last Q I promise! its 99% there

    Friday, May 26, 2017 1:59 PM
  • Easy fix... Add a GROUP BY clause to the query...

    GROUP BY 
    	mt.[Whatever columns you need from the ChangeActivity table]

    This, of course, still assumes that there is only 1 first name, 1 last name & 1 email, per row in ChangeActivity.Description.


    Jason Long


    Friday, May 26, 2017 2:28 PM
  • As for the CROSS APPLY stuff... It's not overly complicated once you get your head wrapped around it.

    In this case (with the exception of the 1st one), I'm simply using them to simplify the expressions. It simply takes small, easy to understand, expression and names them so that they are easier to use in other parts of the query, w/o having to retype that chunk of code every time it's referenced.

    If you were so inclined, you could copy the last one (ep) into the select list and "nest" the others into it, where you see them referenced. Doing that, of course would make for an expression that's nearly impossible to decipher and even harder to debug.

    If you aren't overly familiar with the uses of the APPLY operator, it's well worth learning. It can make for an extremely powerful tool.


    Jason Long

    Friday, May 26, 2017 2:54 PM
  • Here is an even better option that simply bypasses the need for doing any of the aggregations altogether... It should also be between 2X & 3X faster that the code we've been working with... (because aggregations and extra rows are expensive)

    SELECT 
    	FirstName = SUBSTRING(ftr.FreeTextR, sp2.sp_FirstName, ep.ep_FirstName - sp2.sp_FirstName),
    	Surname = SUBSTRING(ftr.FreeTextR, sp2.sp_Surname, ep.ep_Surname - sp2.sp_Surname),
    	Email = SUBSTRING(ftr.FreeTextR, sp2.sp_Email, ep.ep_Email - sp2.sp_Email)
    FROM 
    	dbo.ChangeActivity ca
    	CROSS APPLY ( VALUES (REPLACE(ca.Description], ' ', '')) ) ftr (FreeTextR)
    	CROSS APPLY ( VALUES (
    						CHARINDEX('FirstName', ftr.FreeTextR, 1),
    						CHARINDEX('Surname', ftr.FreeTextR, 1),
    						CHARINDEX('Email', ftr.FreeTextR, 1)
    						)
    					) sp1 (sp_FirstName, sp_Surname, sp_Email)
    	CROSS APPLY ( VALUES (
    						IIF(sp1.sp_FirstName = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_FirstName + 1)),
    						IIF(sp1.sp_Surname = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_Surname + 1)),
    						IIF(sp1.sp_Email = 0, 0, CHARINDEX(':', ftr.FreeTextR, sp1.sp_Email + 1))
    						)
    					) sp2 (sp_FirstName, sp_Surname, sp_Email)
    	CROSS APPLY ( VALUES (
    						IIF(sp1.sp_FirstName = 0, 0,  CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_FirstName)),
    						IIF(sp1.sp_Surname = 0, 0, CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_Surname)),
    						IIF(sp1.sp_Email = 0, 0, CHARINDEX(CHAR(13), ftr.FreeTextR, sp2.sp_Email))
    						)
    					) ep (ep_FirstName, ep_Surname, ep_Email);

    Should have thought of this earlier, but, here we are...


    Jason Long


    Friday, May 26, 2017 3:19 PM
  • Hi Jason

    The last piece of code appears to mostly work - but we got the old error back with the invalid params being parsed to the function?

    I got 10 rows PERFECT results - then it bombs out, was expecting 34

    Error: 

    Msg 537, Level 16, State 3, Line 5
    Invalid length parameter passed to the LEFT or SUBSTRING function.

    But the 10 results i did get are perfect! i notice we dont check the start position or end position anymore, how can we add that error check into this one so the invalid values dont get passed in?

    This experience has blown my mind! thankyou so much


    J

    Monday, May 29, 2017 2:37 AM
  • Sorry... I didn't see your last response until now. Are you still having problems with the latest solution?

    I can explain the functional differences between the the latest code and the one we started with but they are similar in that they both follow the pattern of:

    1) remove all of the spaces from the text.

    2) find the locations of the 3 "key words".

    3) find the location of the semicolon that follows the key words.

    4) find the location of the carriage returns that follow the semicolons 

    5) select the text between the semicolons and the carriage returns.

    Because the supplied code does rely on some very specific text clues,  several ducks need to be in their rows, so to speak. Odds are you've got some data with some bad ducks.

    If you'd like, we can look at some of the failed values and see if we can adjust the rules to accommodate them.


    Jason Long

    Wednesday, May 31, 2017 3:22 PM