ผู้สอบถาม
Extract values from the String filed in SQL server

คำถาม
-
I have nvarchar field in a SQL server table that contains html string where value need be separated into two additional fields and data can be of varying lengths.
Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END
I need to output to broken in two additional columns as follows:
Order Number Values
1012,1013,1014,1015 4
1011,1012,1013,1014,1015,1016,1017,1018,1019,1020 10
Thanks!
- แก้ไขโดย Red987 10 สิงหาคม 2563 19:55
10 สิงหาคม 2563 19:47
ตอบทั้งหมด
-
Hi Red987,
It would be great if you could provide a minimal reproducible example:
(1) DDL and sample data population, i.e. CREATE table(s) plus INSERT, T-SQL statements.
(2) What you need to do, i.e. logic.
(3) Desired output based on the sample data in #1 above.
(4) Your SQL Server version (SELECT @@version;)10 สิงหาคม 2563 20:06 -
Try:
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max)) insert into @t (htmlString) values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END') ;with cte as ( select *, substring(htmlString, patindex('% Number="%">%', htmlString) + len(' Number="'), len(htmlString)) as OrderNumbers, substring(htmlString, patindex('%">%', htmlString) + 2, len(htmlString)) as NumberStart from @t) select *, substring(OrderNumbers, 1, PATINDEX('%">%', OrderNumbers)-1) as OrderNumbers, substring(NumberStart, 1, PATINDEX('%</a>%', NumberStart)-1) as Number from cte
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles10 สิงหาคม 2563 20:13 -
CREATE TABLE #temp
(
ID int,
Sales nvarchar (MAX)
)
INSERT INTO #temp VALUES (1,'Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'),
(2,'Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END')VERSION: SQL 2017
I need to output to look like:
ID Order Number Values
1 1012,1013,1014,1015 4
2 1011,1012,1013,1014,1015,1016,1017,1018,1019,1020 10
10 สิงหาคม 2563 20:20 -
Hi Red987,
While waiting for the ##1-4.
Re-using Naomi's DDL and sample data. Thanks Naomi.
The CTE transforms strings into a well-formed XML. It makes very simple to emit the desired output.
-- DDL and sample data population, start DECLARE @tbl TABLE (ID INT IDENTITY(1, 1) PRIMARY KEY, htmlString NVARCHAR(MAX)); INSERT INTO @tbl (htmlString) VALUES ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), ('Number of Sales: <Area="XBA" Number="">0</a><XX>END'); -- DDL and sample data population, end ;WITH rs AS ( SELECT * , TRY_CAST(REPLACE(LEFT(htmlString, PATINDEX('%</a>%', htmlString) + LEN('</a>')-1) , 'Number of Sales: <', '<a ') AS XML) AS xmldata FROM @tbl ) SELECT ID --, xmldata , xmldata.value('(/a/@Number)[1]', 'VARCHAR(1024)') AS [OrderNumbers] , xmldata.value('(/a/text())[1]', 'INT') AS [Values] FROM rs;
Output ID Order Numbers Values 1 1012,1013,1014,1015 4 2 1011,1012,1013,1014,1015,1016,1017,1018,1019,1020 10 3 0 - เสนอเป็นคำตอบโดย Naomi N 10 สิงหาคม 2563 20:57
- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 22:04
10 สิงหาคม 2563 20:46 -
Thanks for responding. I am getting values for Order numbers filed complete NULL's when I plug this logic into my code.10 สิงหาคม 2563 20:58
-
Thanks for responding. I am getting this error "Invalid length parameter passed to the LEFT or SUBSTRING function". I missed one more scenario when I asked for solution i think the error might be because of this not sure though:
Sample data for string: Number of Sales: <Area="XBA" Number="">0</a><XX>END
In this case the OrderNumbers are NULL's and the value is given as 0 for values field. Could please help me in this handling this issue?
Thanks in advance!
10 สิงหาคม 2563 21:04 -
CREATE TABLE #temp ( ID int, Sales nvarchar (MAX) ); INSERT INTO #temp VALUES (1,'Number of Sales: <Area="ABC" Number = "1012,1013,1014,1015">4</a><ZZ>END'), (2,'Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'); SELECT ID, SUBSTRING(Sales, PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales), PATINDEX('%">%', Sales) - PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales)) AS [Order Number], SUBSTRING(Sales, PATINDEX('%">%', Sales) + 2, PATINDEX('%</a>%', Sales) - PATINDEX('%">%', Sales) - 2) AS [Values] FROM #temp; DROP TABLE #temp;
A Fan of SSIS, SSRS and SSAS
10 สิงหาคม 2563 21:05 -
Thanks for responding. I am getting this error "Invalid length parameter passed to the LEFT or SUBSTRING function" for below set of data:
Sample data for string: Number of Sales: <Area="XBA" Number="">0</a><XX>END
In this case the OrderNumbers are NULL's and the value is given as 0 for values field.
10 สิงหาคม 2563 21:14 -
Thanks for responding. I am getting values for Order numbers filed complete NULL's when I plug this logic into my code.
Hi Red987,
It is not clear whom you are replying to. You need to mention person's name.
In any case, I added a third row to my solution based on your input.
And everything continues to work without any problem.
I provided you a minimal reproducible example.
Please copy it to SSMS as-is, and give it a shot.
- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 21:31
10 สิงหาคม 2563 21:19 -
I'll respond a bit later, doing some exercises now. The common solution is to use NULLIF function, I'll adjust my code in a bit.
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles10 สิงหาคม 2563 21:27 -
@Naomi, thanks for responding! I will look forward to hear from you on the updated code.10 สิงหาคม 2563 21:36
-
@Yitzhak, thanks for responding. I pluged your logic into my code but I am getting values for Order numbers column as complete NULL's.10 สิงหาคม 2563 21:39
-
@Yitzhak, thanks for responding. I pluged your logic into my code but I am getting values for Order numbers column as complete NULL's.
Hi Red987,
Again, I provided you a minimal reproducible example.
Did you try it as-is without plugging in?
- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 21:58
10 สิงหาคม 2563 21:44 -
CREATE TABLE #temp ( ID int, Sales nvarchar (MAX) ) INSERT INTO #temp VALUES (1,'Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), (2,'Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END') ;with mycte as ( select * ,stuff(Sales,1, charindex('number="',Sales),'') s1 from #temp ) select ID , replace(replace(stuff(s1,charindex('>',s1) ,len(s1),'') ,'umber=',''),'"','') OrderNumber ,reverse(stuff(reverse(stuff(s1,charindex('</a>',s1) ,len(s1),'')), charindex('>',reverse(stuff(s1,charindex('</a>',s1) ,len(s1),''))) ,len(s1),'')) Val from mycte drop TABLE if exists #temp
10 สิงหาคม 2563 21:46 -
I've adjusted the code, I'm getting null if the pattern doesn't match:
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max)) insert into @t (htmlString) values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), ('Number of Sales: <Area="XBA" Number="">0</a><XX>END'), ('No pattern at all') ;with cte as ( select *, substring(htmlString, NULLIF(patindex('% Number="%">%', htmlString),0) + len(' Number="'), len(htmlString)) as OrderNumbers, substring(htmlString, NULLIF(patindex('%">%', htmlString),0) + 2, len(htmlString)) as NumberStart from @t) select *, substring(OrderNumbers, 1, NULLIF(PATINDEX('%">%', OrderNumbers),0)-1) as OrderNumbers, substring(NumberStart, 1, NULLIF(PATINDEX('%</a>%', NumberStart),0)-1) as Number from cte
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles10 สิงหาคม 2563 22:07 -
@Naomi, Sorry the issue is not due to below sample data:
Number of Sales: <Area="XBA" Number="">0</a><XX>END
The issue is because of the this kind of data in the same field.
No Sales were happened: SPANISH<UV>END
Here is the DDL and sample data:
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max))
insert into @t (htmlString)
values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'),
('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'),
('Number of Sales: <Area="XBA" Number="">0</a><XX>END'),
('No Sales were happened: SPANISH<UV>END')
;with cte as (
select *,
substring(htmlString, patindex('% Number="%">%', htmlString) + len(' Number="'), len(htmlString)) as OrderNumbers,
substring(htmlString, patindex('%">%', htmlString) + 2, len(htmlString)) as NumberStart
from @t)
select *,
substring(OrderNumbers, 1, PATINDEX('%">%', OrderNumbers)-1) as OrderNumbers,
substring(NumberStart, 1, PATINDEX('%</a>%', NumberStart)-1) as Number
from cteID Order Numbers Values
1 1012,1013,1014,1015 4
2 1011,1012,1013,1014,1015,1016,1017,1018,1019,1020 10
3 NULL 0
4 N/A N/AThanks for your help on this.
- แก้ไขโดย Red987 10 สิงหาคม 2563 22:16
10 สิงหาคม 2563 22:15 -
Hi Naomi,
I added a 4th row ('No pattern at all');.
My T-SQL stays rock solid. Everything continues to work without any code change.
Please give it a shot.
- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 22:55
10 สิงหาคม 2563 22:18 -
Try this:
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max)) insert into @t (htmlString) values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), ('Number of Sales: <Area="XBA" Number="">0</a><XX>END'), ('No Sales were happened: SPANISH<UV>END') ;with cte as ( select *, substring(htmlString, NULLIF(patindex('% Number="%">%', htmlString),0) + len(' Number="'), len(htmlString)) as OrderNumbers, substring(htmlString, NULLIF(patindex('%">%', htmlString),0) + 2, len(htmlString)) as NumberStart from @t) select Id, htmlString, NULLIF(ISNULL(substring(OrderNumbers, 1, NULLIF(PATINDEX('%">%', OrderNumbers),0)-1), 'N/A'),'') as OrderNumbers, ISNULL(substring(NumberStart, 1, NULLIF(PATINDEX('%</a>%', NumberStart),0)-1), 'N/A') as [Values] from cte
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles10 สิงหาคม 2563 22:28 -
Hi Naomi,
I added a 5th row ('No Sales were happened: SPANISH<UV>END');
My T-SQL stays rock solid. Everything continues to work without any code change.
- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 22:54
10 สิงหาคม 2563 22:30 -
Tried both solutions in one batch (and displayed the actual execution plan just to see the winner):
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max)) insert into @t (htmlString) values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), ('Number of Sales: <Area="XBA" Number="">0</a><XX>END'), ('No Sales were happened: SPANISH<UV>END') ;with cte as ( select *, substring(htmlString, NULLIF(patindex('% Number="%">%', htmlString),0) + len(' Number="'), len(htmlString)) as OrderNumbers, substring(htmlString, NULLIF(patindex('%">%', htmlString),0) + 2, len(htmlString)) as NumberStart from @t) select Id, htmlString, NULLIF(ISNULL(substring(OrderNumbers, 1, NULLIF(PATINDEX('%">%', OrderNumbers),0)-1), 'N/A'),'') as OrderNumbers, ISNULL(substring(NumberStart, 1, NULLIF(PATINDEX('%</a>%', NumberStart),0)-1), 'N/A') as [Values] from cte ;WITH rs AS ( SELECT * , TRY_CAST(REPLACE(LEFT(htmlString, PATINDEX('%</a>%', htmlString) + LEN('</a>')-1) , 'Number of Sales: <', '<a ') AS XML) AS xmldata FROM @t ) SELECT ID --, xmldata , xmldata.value('(/a/@Number)[1]', 'VARCHAR(1024)') AS [OrderNumbers] , xmldata.value('(/a/text())[1]', 'INT') AS [Values] FROM rs;
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles10 สิงหาคม 2563 22:38 -
Hi Naomi,
>> "...Tried both solutions in one batch (and displayed the actual execution plan just to see the winner):..."
Is it taking into account maintainability?
Some interesting stats:
And the original poster is gone with the wind.- แก้ไขโดย Yitzhak Khabinsky 10 สิงหาคม 2563 22:49
10 สิงหาคม 2563 22:48 -
Hi Red987,
Please also find another method which has the same query cost with Naomi's query from below.
declare @t table (ID int identity(1,1) primary key, htmlString nvarchar(max)) insert into @t (htmlString) values ('Number of Sales: <Area="ABC" Number="1012,1013,1014,1015">4</a><ZZ>END'), ('Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), ('Number of Sales: <Area="XBA" Number="">0</a><XX>END'), ('No Sales were happened: SPANISH<UV>END') select case when CHARINDEX('Number="',htmlString)>0 then SUBSTRING(htmlString,CHARINDEX('Number="',htmlString)+len('Number="'),CHARINDEX('">',htmlString)-CHARINDEX('Number="',htmlString)-len('Number="')) else '' end [Order Numbers], case when CHARINDEX('Number="',htmlString)>0 then SUBSTRING(htmlString,CHARINDEX('">',htmlString)+len('">'),CHARINDEX('</a>',htmlString)-CHARINDEX('">',htmlString)-len('">')) else '' end [Value] from @t /* Order Numbers Value 1012,1013,1014,1015 4 1011,1012,1013,1014,1015,1016,1017,1018,1019,1020 10 0 */
Wishes
Melissa
""SQL Server related"" forum will be migrated to a new home on Microsoft Q&A SQL Server!
We invite you to post new questions in the "SQL Server related" forum’s new home on Microsoft Q&A SQL Server !
For more information, please refer to the sticky post.
- แก้ไขโดย Melissa MaMicrosoft 12 สิงหาคม 2563 7:14
11 สิงหาคม 2563 2:26 -
Thanks for responding. I am getting this error "Invalid length parameter passed to the LEFT or SUBSTRING function" for below set of data:
Sample data for string: Number of Sales: <Area="XBA" Number="">0</a><XX>END
In this case the OrderNumbers are NULL's and the value is given as 0 for values field.
CREATE TABLE #temp ( ID int, Sales nvarchar (MAX) ); INSERT INTO #temp VALUES (1,'Number of Sales: <Area="ABC" Number = "1012,1013,1014,1015">4</a><ZZ>END'), (2,'Number of Sales: <Area="CBA" Number="1011,1012,1013,1014,1015,1016,1017,1018,1019,1020">10</a><YY>END'), (3,'Number of Sales: <Area="XBA" Number="">0</a><XX>END'); SELECT ID, --PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales), CASE WHEN PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales) > 0 THEN SUBSTRING(Sales, PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales), PATINDEX('%">%', Sales) - PATINDEX('%[0-9][0-9][0-9][0-9]%', Sales)) ELSE NULL END AS [Order Number], SUBSTRING(Sales, PATINDEX('%">%', Sales) + 2, PATINDEX('%</a>%', Sales) - PATINDEX('%">%', Sales) - 2) AS [Values] FROM #temp; DROP TABLE #temp;
A Fan of SSIS, SSRS and SSAS
11 สิงหาคม 2563 15:37 -
Hi Red987,
Could you please provide any update ?
Please remember to mark the replies as answers if they helped. Your action would be helpful to other users who encounter the same issue and read this thread.
Thank you for understanding!
Best regards
Melissa""SQL Server related"" forum will be migrated to a new home on Microsoft Q&A SQL Server!
We invite you to post new questions in the "SQL Server related" forum’s new home on Microsoft Q&A SQL Server !
For more information, please refer to the sticky post.12 สิงหาคม 2563 7:14 -
@Melissa, I am seeing the performance issues with below the logic:
select case when CHARINDEX('Number="',htmlString)>0 then SUBSTRING(htmlString,CHARINDEX('Number="',htmlString)+len('Number="'),CHARINDEX('">',htmlString)-CHARINDEX('Number="',htmlString)-len('Number="')) else '' end [Order Numbers], case when CHARINDEX('Number="',htmlString)>0 then SUBSTRING(htmlString,CHARINDEX('">',htmlString)+len('">'),CHARINDEX('</a>',htmlString)-CHARINDEX('">',htmlString)-len('">')) else '' end [Value] from @t
I have large table with around 50million data that needs to be updated and inserted into then final table. My approach was to apply transformations and then load into temp table and finally loaded data from temp to permanent table. But this is taking more than 10minutes to complete. I tried with few other options that was recommended by other folks but nothing has been worked out.
Thanks!
- แก้ไขโดย Red987 14 สิงหาคม 2563 15:17
14 สิงหาคม 2563 15:13 -
Try using batches approach for loading into temp table and then back. It's not going to be quick for 50 mln rows, I think 10 min. is probably fine.
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles14 สิงหาคม 2563 15:23 -
@Naomi, thanks for responding! Could you please help me out in sending the sample query to load in batches? Even it takes longer time to load from temp to permanent as well?
Thanks in advance!
14 สิงหาคม 2563 15:31 -
Hi,
I believe I posted some samples in the past and you can also find samples in that forum, just do a search for 'INSERT BATCHES SQL SERVER' (for example).
Looking for new opportunities
For every expert, there is an equal and opposite expert. - Becker's Law
My blog
My TechNet articles14 สิงหาคม 2563 15:41 -
I have large table with around 50million data that needs to be updated and inserted into then final table. My approach was to apply transformations and then load into temp table and finally loaded data from temp to permanent table. But this is taking more than 10minutes to complete. I tried with few other options that was recommended by other folks but nothing has been worked out.
Thanks!
A Fan of SSIS, SSRS and SSAS
14 สิงหาคม 2563 16:18 -
@Guoxiong, The current source table doesn't have any primary key.14 สิงหาคม 2563 16:21
-
@Naomi, will do. Thanks!14 สิงหาคม 2563 16:21
-
@Guoxiong, The current source table doesn't have any primary key.
Then you should not try batching. Batching requires that there is an index you can do the batching over. Preferably a clustered index.
There is little reason to expect that the operation you want to do to be fast. SQL is not intended for that type of string massaging. You might get better performance if you push that into a CLR procedure or a Python external script on SQL 2017+. You might even get better performance by get all data client-side and to run the operation and send the data back.
However, if the format is consistent, so that is always "Number of Sales", never "number of sales" or any other variation of the labels, there is a possible go-faster button: cast your column to a binary collation. This speeds up the string searches considerably if you have a Windows collation, or the column is nvarchar.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
- เสนอเป็นคำตอบโดย Melissa MaMicrosoft 17 สิงหาคม 2563 1:03
14 สิงหาคม 2563 22:06 -
Hi Red987,
Agreed with other experts, it is better for you to create one index on your source table, preferably a clustered index.
Then the performance could be better during your query.
Or you could have a try with other methods mentioned by Erland.
Best regards
Melissa""SQL Server related"" forum will be migrated to a new home on Microsoft Q&A SQL Server!
We invite you to post new questions in the "SQL Server related" forum’s new home on Microsoft Q&A SQL Server !
For more information, please refer to the sticky post.17 สิงหาคม 2563 6:44