Matching the records with logic

Answered Matching the records with logic

  • Thursday, August 09, 2012 4:09 PM
     
      Has Code

    I have some records with different tran_code 2501 and 2502. I need to match the 2502 tran_code record with 2501 tran_code record based on a logic. Below screenshot shows the records that need to be matched.

    Records

    The logic is, earliest 2502 record have to be matched with latest 2501 record and 2501 date_posted should be less than 2502 record date_posted. So in the above sample data, ID 4 should be matched with ID 3 and ID 5 should be matched with ID 2. Below is the output what I expect.

    Output

    I have tried to achieve this but I'm not able to get the exact output. Here I'm pasting my code that I have tried.

    DECLARE	@MyTable TABLE
    (
    	id				INT,
    	tran_code		INT,
    	date_posted		DATETIME,
    	amount			MONEY,
    	tran_type		VARCHAR(5)
    )
    
    INSERT INTO	@MyTable
    VALUES		
    			(1, 2501, '2010-2-12', '50', NULL),
    			(2, 2501, '2010-3-12', '50', NULL),
    			(3, 2501, '2010-4-12', '50', NULL),
    			(4, 2502, '2010-5-12', '-50', NULL),
    			(5, 2502, '2010-5-12', '-50', NULL),
    			(6, 2501, '2010-6-12', '50', NULL)
    			
    			
    SELECT	id, tran_type, date_posted, amount, tran_code,
    		CASE	WHEN tran_code IN ('2501')
    				THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC)
    				ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) 
    		END		AS RN
    FROM	@MyTable
    WHERE	tran_code	IN('2501', '2502')
    ORDER BY id
    
    ;WITH	CTE
    AS
    (		
    	SELECT	MYTAB1.tran_type
    	FROM	(
    				SELECT	tran_type, date_posted, amount, tran_code,
    				CASE	WHEN tran_code IN ('2501')
    						THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC)
    						ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) 
    				END		AS RN
    				FROM	@MyTable
    				WHERE	tran_code	IN('2501', '2502')
    			)	MYTAB1
    	JOIN	(
    				SELECT	tran_type, date_posted, amount, tran_code,
    				CASE	WHEN tran_code IN ('2501')
    						THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC)
    						ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) 
    				END		AS RN
    				FROM	@MyTable
    				WHERE	tran_code	IN('2501', '2502')
    			)	MYTAB2
    	ON			MYTAB1.amount			=	-(MYTAB2.amount)
    	AND			MYTAB1.RN				=	MYTAB2.RN
    	AND			(
    					(
    							MYTAB1.tran_code	=	'2501'
    						AND	MYTAB2.tran_code	=	'2502'	
    						AND	MYTAB1.date_posted	<	MYTAB2.date_posted
    					)
    					OR
    					(
    							MYTAB1.tran_code	=	'2502'
    						AND	MYTAB2.tran_code	=	'2501'
    						AND	MYTAB1.date_posted	>	MYTAB2.date_posted
    					)
    				)
    )
    
    --SELECT * FROM CTE
    
    UPDATE	CTE
    SET		tran_type	=	'Z'
    		
    SELECT		*
    FROM		@MyTable
    ORDER BY	date_posted

    Please help me to get the expected output by correcting this code.


    Thanks,
    Ram


    • Edited by Ram Dhilip.T Thursday, August 09, 2012 4:10 PM
    •  

All Replies

  • Thursday, August 09, 2012 10:10 PM
     
     

    Before I tackle this, I would need to know more. In this example, all amounts that the same. But what happens if you have:

    INSERT INTO   @MyTable
    VALUES     
             (1, 2501, '2010-2-12', 150, NULL),
             (2, 2501, '2010-3-12',  40, NULL),
             (3, 2501, '2010-4-12',  25, NULL),
             (4, 2502, '2010-5-12', -50, NULL),
             (5, 2502, '2010-5-12', -50, NULL),
             (6, 2501, '2010-6-12',  50, NULL)

    Or "this cannot happen"?

    Also, I don't understand the Match column in your screenshot? What does 1 and 2 mean? In your code, you set Match to 'Z'?

    Very important question: which version of SQL Server are you using?

    By the way, in your post you have your amounts in quotes. Quotes are for string literals, not for numbers. SQL Server has implicit conversion between strings and numbers so it works anyway, but since strings and numbers have very different characteristics, it's important to understand that they should normally not be interchanged. (And I think it's mistake that SQL Server has implicit conversion between the two.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Friday, August 10, 2012 9:33 AM
     
     

    Different amounts should not be matched. 2502 amount should be equal to 2501 amount. But 2501 should be negative. For example, if 2502 record has the amount 50 and 2501 has -50 then both can be matched.

    Also earliest 2502 record have to be matched with latest 2501 record and 2501 date_posted should be less than 2502 record date_posted.

    In my screenshot I mentioned 1 and 2 to mention which transaction is matched with which one. That means, ID 4 has been matched with ID 3 and ID 5 has been matched with ID 2. But the matched records will be updated with Z finally rather than updating with the numbers I mentioned. I mentioned the numbers 1 and 2 for your understanding that which 2502 record is matched with which 2501 record. Below is the exact output I expect.

    ExactOutput

    I'm using SQL Server 2008.


    Thanks,
    Ram


  • Friday, August 10, 2012 10:29 AM
     
     

    I don't have the time to try a query until I'm at home tonight, but only make sure that I have it right: if transaction 3 had had an amount of 100, 5 would be matched with 2 and 4 would be matched with 1, and 3 would be unmatched?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Friday, August 10, 2012 10:38 AM
     
     

    I don't have the time to try a query until I'm at home tonight, but only make sure that I have it right: if transaction 3 had had an amount of 100, 5 would be matched with 2 and 4 would be matched with 1, and 3 would be unmatched?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    If 3 has 100 then 4 should be matched with 2, 5 should be matched with 1 and 3 should be unmatched.

    The logic is, first 2502 record have to be matched with last 2501 record.


    Thanks,
    Ram

  • Friday, August 10, 2012 2:46 PM
     
      Has Code

    I can get the expected output by using the below query that has been written by loop. But I expect that has to be done without loop. I'm pasting the code for your reference that may help you.

    DECLARE	@ID			INT,
    		@DatePosted	DATETIME,
    		@Amount		MONEY,
    		@SubID		INT
    	
    DECLARE	@MyTable TABLE
    		(
    			id			INT,
    			tran_code	INT,
    			date_posted	DATETIME,
    			amount		MONEY,
    			tran_type	VARCHAR(5)
    		)
    
    INSERT INTO	@MyTable
    VALUES		
    			(1, 2501, '2010-2-12', '50', NULL),
    			(2, 2501, '2010-3-12', '50', NULL),
    			(3, 2501, '2010-4-12', '50', NULL),
    			(4, 2502, '2010-5-12', '-50', NULL),
    			(5, 2502, '2010-5-12', '-50', NULL),
    			(6, 2501, '2010-6-12', '50', NULL)
    
    SELECT	@ID			=	MIN(id)
    FROM	@MyTable
    WHERE	tran_code	=	'2502'
    
    WHILE @ID IS NOT NULL
    BEGIN
    
    	SELECT	@DatePosted	=	date_posted,
    			@Amount		=	amount
    	FROM	@MyTable
    	WHERE	id			=	@ID
    	
    	SELECT	@SubID		=	MAX(id)
    	FROM	@MyTable
    	WHERE	tran_code	=	'2501'
    	AND		amount		=	-(@Amount)
    	AND		date_posted	<	@DatePosted
    	AND		tran_type	IS NULL
    	
    	IF @SubID IS NOT NULL
    	BEGIN
    	
    		UPDATE	@MyTable
    		SET		tran_type	=	'Z'
    		WHERE	id			IN(@ID, @SubID)
    		
    	END
    
    	SELECT	@ID			=	MIN(id)
    	FROM	@MyTable
    	WHERE	tran_code	=	'2502'
    	AND		id			>	@ID
    
    END
    
    SELECT		*
    FROM		@MyTable
    ORDER BY	date_posted


    Thanks,
    Ram

  • Friday, August 10, 2012 8:57 PM
     
     Answered

    Maybe it can be done entirely without a loop, and maybe even in a single
    statement (that is, no intermediate temp tables). I am very much in doubt,
    though, that it would be an efficient solution.

    However, loops can be written in different ways, and I assume that in your
    table you have different amounts to match (or the amounts are related to
    different accounts or whatever). And in that case, there is no need loop one
    by one - you can match all the first 2502 in one go, all the second 2502 in
    one go etc.

    Below is a solution to show this, and I've added some more test data. In
    this solution I have materialised the row number in a separate table, since
    I assume that @MyTable in real life is a permanent table. But if @MyTable
    already is a temp table, you would of course added the numbers to that
    table.

    DECLARE   @MyTable TABLE
          (
             id         INT,
             tran_code   INT,
             date_posted   DATETIME,
             amount      MONEY,
             tran_type   VARCHAR(5)
          )

    INSERT INTO   @MyTable
    VALUES (1, 2501, '2010-2-12', 50, NULL),
             (2, 2501, '2010-3-12', 50, NULL),
             (3, 2501, '2010-4-12', 50, NULL),
             (4, 2502, '2010-5-12', -50, NULL),
             (5, 2502, '2010-5-12', -50, NULL),
             (6, 2501, '2010-6-12', 50, NULL),
             (11, 2501, '2010-2-12', 150, NULL),
             (12, 2501, '2010-3-12', 150, NULL),
             (13, 2502, '2010-4-12', -150, NULL),
             (14, 2502, '2010-5-12', -150, NULL),
             (15, 2502, '2010-5-12', -150, NULL),
             (16, 2501, '2010-6-12', 150, NULL)

    DECLARE @numbered TABLE (id         int     NOT NULL PRIMARY KEY,
                             date_posted date   NOT NULL,
                             amount      money  NOT NULL,
                             rowno       int    NOT NULL,
                             UNIQUE (rowno, amount)
    )

    DECLARE @matchedpairs TABLE (id1 int NOT NULL PRIMARY KEY,
                                 id2 int NOT NULL UNIQUE)

    INSERT @numbered (id, date_posted, amount,
                      rowno)
       SELECT id, date_posted, -amount,
              row_number () OVER(PARTITION BY amount ORDER BY date_posted, id)
       FROM   @MyTable
       WHERE  tran_code = 2502

    DECLARE @rowno int

    DECLARE cur CURSOR STATIC LOCAL FOR
       SELECT DISTINCT rowno FROM @numbered ORDER BY rowno

    OPEN cur

    WHILE 1 = 1
    BEGIN
       FETCH cur INTO @rowno
       IF @@fetch_status <> 0
          BREAK

       INSERT @matchedpairs (id1, id2)
          SELECT a.id, b.id
          FROM   @numbered a
          CROSS  APPLY (SELECT TOP 1 b.id
                         FROM   @MyTable b
                         WHERE  b.tran_code = 2501
                           AND  b.amount    = a.amount
                           AND  b.date_posted <= a.date_posted
                           AND  NOT EXISTS (SELECT *
                                            FROM   @matchedpairs mp
                                            WHERE  mp.id2 = b.id)
                         ORDER  BY b.date_posted DESC) AS b
       WHERE  a.rowno = @rowno

       -- If we did not match anything, we can quit.
       IF @@rowcount = 0
          BREAK
    END

    UPDATE @MyTable
    SET    tran_type = 'Z'
    FROM   @MyTable t
    WHERE  EXISTS (SELECT *
                   FROM  (SELECT id1 AS id
                          FROM   @matchedpairs
                          UNION  ALL
                          SELECT id2
                          FROM   @matchedpairs) AS u
                   WHERE  t.id = u.id)

    SELECT * FROM @MyTable ORDER BY id


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Wednesday, September 12, 2012 1:52 PM
     
     

    Thank you for the good solution. This is the exact solution what I expected.

    I have few doubts and I have listed them below.

      • Is the performance better when we use temp table instead of Cursor?
      • Another question is coming in my mind when I think of using temp tables. Is table variable better than temp tables in performance perspective?
      • One final question, here you have used cross apply. But its little confusing to understand it's clear functionality. It will be great if you let me know the part of the CROSS JOIN here and why we use it rather than other things like JOINs?

    Thanks,
    Ram

  • Tuesday, September 25, 2012 8:51 PM
     
     
      • I'm uncertain what you mean here. My solution uses both a table variable and a cursor. The reason I use a cursor is that I think that this is more efficient than a set-based solution, at least on SQL 2008. (And not the least easier to develop. :-). The reason I use the table @numbered is that I assume that you have a bunch of these for different amounts, accounts and whatnots. And since they are independent of each other, they can be handled in breadth, by first handling all rowno = 1, then all rowno = 2 and so on. I expect this to be more efficient than matching only one pair of 2501/2502 at a time. When I think of it, rather than running a cursor over the row numbers, the loop could delete all matched rows from @numbered, and then it would be a loop wth the condition "WHILE EXISTS (SELECT * FROM @numbered). For the logic of the loop, the temp table is not needed, but it's usually better to materialise the row number and index it.
      • Table variables vs. temp tables and performance is one of these "it depends" questions. Temp table has distribution statistics and therefore can cause recompilation which can give you better query plans. But if they don't, the recompilation just costs extra.  One should also be aware of that table variables precludes parallelism. In the long run, temp tables usually gives better performance. But table variables gives more checks at compile time, which is not to be ignored.
      • CROSS JOIN would not work here. APPLY is a special JOIN operator. In a JOIN, the two sides cannot refer to each other. But in APPLY the right side can refer to the left side. Note that the subquery refers columns in the "a" table. This is not possible with JOIN. The net effect is that all rows produced by the subquery are added to the result set. There are two APPLY operators, CROSS and OUTER. The difference is when the subquery does not return any rows. With CROSS the row from the left table is lost, with OUTER it is retained. In this case, I use CROSS, because if there is no match, there is nothing to update.

    Please keep in mind that when it comes to performance questions, the answer is often "it depends", and when in doubt, you need to benchmark with your data and tables.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se