Performance problem: Converting rows to a comma separated column using STUFF() and self join

  • Question

  • Hi,

    This might be a really dumb one to ask, but I am currently working on a table that holds sequential data for the steps an invoice goes through in a particular system. Here is how it looks:

    ID      InvoiceID             InvoiceSteps     Timestamp
    283403  0000210121_0002_2013  Post FI Invoice  2013-07-01 19:07:00.0000000
    389871  0000210121_0002_2013  Clear Invoice    2013-08-25 14:02:00.0000000

    Here is my extremely slow query that converts multiple rows of an invoice into a single one with 'InvoiceSteps' listed according to their timestamps in a sequential manner separated by commas.

    SELECT [InvoiceID],
           [InvoiceSteps] = STUFF((
               SELECT ',' + ma.InvoiceSteps
               FROM invoices ma
               WHERE m.InvoiceID = ma.InvoiceID
               ORDER BY [Timestamp]
               FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '')
    FROM invoices m
    GROUP BY InvoiceID
    ORDER BY InvoiceID;

    Here is the end result:

    InvoiceID InvoiceSteps
    0000210121_0002_2013 Post FI Invoice,Clear Invoice

    My question: how can I improve the query so that it can process hundreds of thousands of rows as fast as possible (>600K in this case)?

    Thank you!

    Tuesday, December 16, 2014 11:56 PM

Answers

  • Did you try creating a nonclustered index on InvoiceID with InvoiceSteps as an included column?

    CREATE NONCLUSTERED INDEX IX_InvId ON Invoices (InvoiceId) INCLUDE (InvoiceSteps)
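
    If the correlated subquery is still slow after that, widening the index key to cover the sort column may help further. This is a sketch, varying the index above, with the column names assumed from the question:

    -- Keying on InvoiceId + Timestamp lets the subquery's
    -- ORDER BY [Timestamp] read the rows pre-sorted per invoice.
    CREATE NONCLUSTERED INDEX IX_InvId_Ts ON Invoices (InvoiceId, [Timestamp]) INCLUDE (InvoiceSteps)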


    Thanks and regards, Rishabh K

    • Marked as answer by syedaliammar Wednesday, December 17, 2014 12:27 PM
    Wednesday, December 17, 2014 4:28 AM

All replies

  • There are many methods for concatenating rows into a single column. Assuming you have the necessary indexes to support your query, as Rishabh suggested, if you still see performance issues you will need to look at other approaches as well. In some cases (very large data sets) I have seen a CLR aggregate outperform the T-SQL options. That said, each approach needs to be assessed before reaching a conclusion for your scenario.

    Refer to the link below for the various approaches (please also look at the comments section):

    https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
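
    One later addition to that list of approaches: on SQL Server 2017 and later (so not on 2014-era servers), STRING_AGG replaces the whole STUFF/FOR XML PATH construction. A sketch against the invoices table from the question:

    -- STRING_AGG builds the comma-separated list directly;
    -- WITHIN GROUP controls the order of the concatenated steps.
    SELECT InvoiceID,
           STRING_AGG(InvoiceSteps, ',') WITHIN GROUP (ORDER BY [Timestamp]) AS InvoiceSteps
    FROM invoices
    GROUP BY InvoiceID
    ORDER BY InvoiceID;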

    Wednesday, December 17, 2014 5:35 AM
  • Hi,

    I agreed on taking steps on looking at the index too. 

    You can also try code along these lines, e.g.:

    declare @invoicesteps nvarchar(max) = ''
    -- coalesce prevents a NULL step from wiping out the whole string
    select @invoicesteps = @invoicesteps + coalesce(invoicesteps, '') + ','
    from invoices
    where invoiceid = '0000210121_0002_2013'   -- one invoice at a time
    order by [timestamp]

    select left(@invoicesteps, len(@invoicesteps) - 1) -- drop the trailing comma


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, December 17, 2014 7:40 AM
  • Hi @Rishabh,

    Thanks for your response. But as you can see in the sample data I've given above, I cannot create an index on the 'InvoiceId' column because it is not UNIQUE. Is there another way to tackle this?


    • Edited by syedaliammar Wednesday, December 17, 2014 12:00 PM
    Wednesday, December 17, 2014 11:59 AM
  • >> I cannot make an index on the 'InvoiceId' column because it is not UNIQUE. Is there another way to tackle this then?


    Why not? You can still create a (non-unique) index.

    Satheesh
    My Blog | How to ask questions in technical forum

    Wednesday, December 17, 2014 12:05 PM
  • Hi Rishabh/Satheesh,

    Spot on! From 10 seconds for 100 rows to 6 seconds for >600K rows. Much appreciated!

    Wednesday, December 17, 2014 12:26 PM