Matching the records with logic
-
Thursday, August 09, 2012 4:09 PM
I have some records with different tran_code 2501 and 2502. I need to match the 2502 tran_code record with 2501 tran_code record based on a logic. Below screenshot shows the records that need to be matched.
The logic is, earliest 2502 record have to be matched with latest 2501 record and 2501 date_posted should be less than 2502 record date_posted. So in the above sample data, ID 4 should be matched with ID 3 and ID 5 should be matched with ID 2. Below is the output what I expect.
I have tried to achieve this but I'm not able to get the exact output. Here I'm pasting my code that I have tried.
DECLARE @MyTable TABLE ( id INT, tran_code INT, date_posted DATETIME, amount MONEY, tran_type VARCHAR(5) ) INSERT INTO @MyTable VALUES (1, 2501, '2010-2-12', '50', NULL), (2, 2501, '2010-3-12', '50', NULL), (3, 2501, '2010-4-12', '50', NULL), (4, 2502, '2010-5-12', '-50', NULL), (5, 2502, '2010-5-12', '-50', NULL), (6, 2501, '2010-6-12', '50', NULL) SELECT id, tran_type, date_posted, amount, tran_code, CASE WHEN tran_code IN ('2501') THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC) ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) END AS RN FROM @MyTable WHERE tran_code IN('2501', '2502') ORDER BY id ;WITH CTE AS ( SELECT MYTAB1.tran_type FROM ( SELECT tran_type, date_posted, amount, tran_code, CASE WHEN tran_code IN ('2501') THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC) ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) END AS RN FROM @MyTable WHERE tran_code IN('2501', '2502') ) MYTAB1 JOIN ( SELECT tran_type, date_posted, amount, tran_code, CASE WHEN tran_code IN ('2501') THEN Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id DESC) ELSE Row_Number() OVER(PARTITION BY tran_code, amount ORDER BY id) END AS RN FROM @MyTable WHERE tran_code IN('2501', '2502') ) MYTAB2 ON MYTAB1.amount = -(MYTAB2.amount) AND MYTAB1.RN = MYTAB2.RN AND ( ( MYTAB1.tran_code = '2501' AND MYTAB2.tran_code = '2502' AND MYTAB1.date_posted < MYTAB2.date_posted ) OR ( MYTAB1.tran_code = '2502' AND MYTAB2.tran_code = '2501' AND MYTAB1.date_posted > MYTAB2.date_posted ) ) ) --SELECT * FROM CTE UPDATE CTE SET tran_type = 'Z' SELECT * FROM @MyTable ORDER BY date_posted
Please help me to get the expected output by correcting this code.
Thanks,
Ram- Edited by Ram Dhilip.T Thursday, August 09, 2012 4:10 PM
All Replies
-
Thursday, August 09, 2012 10:10 PM
Before I tackle this, I would need to know more. In this example, all amounts that the same. But what happens if you have:
INSERT INTO @MyTable
VALUES
(1, 2501, '2010-2-12', 150, NULL),
(2, 2501, '2010-3-12', 40, NULL),
(3, 2501, '2010-4-12', 25, NULL),
(4, 2502, '2010-5-12', -50, NULL),
(5, 2502, '2010-5-12', -50, NULL),
(6, 2501, '2010-6-12', 50, NULL)Or "this cannot happen"?
Also, I don't understand the Match column in your screenshot? What does 1 and 2 mean? In your code, you set Match to 'Z'?
Very important question: which version of SQL Server are you using?
By the way, in your post you have your amounts in quotes. Quotes are for string literals, not for numbers. SQL Server has implicit conversion between strings and numbers so it works anyway, but since strings and numbers have very different characteristics, it's important to understand that they should normally not be interchanged. (And I think it's mistake that SQL Server has implicit conversion between the two.)
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se -
Friday, August 10, 2012 9:33 AM
Different amounts should not be matched. 2502 amount should be equal to 2501 amount. But 2501 should be negative. For example, if 2502 record has the amount 50 and 2501 has -50 then both can be matched.
Also earliest 2502 record have to be matched with latest 2501 record and 2501 date_posted should be less than 2502 record date_posted.
In my screenshot I mentioned 1 and 2 to mention which transaction is matched with which one. That means, ID 4 has been matched with ID 3 and ID 5 has been matched with ID 2. But the matched records will be updated with Z finally rather than updating with the numbers I mentioned. I mentioned the numbers 1 and 2 for your understanding that which 2502 record is matched with which 2501 record. Below is the exact output I expect.
I'm using SQL Server 2008.
Thanks,
Ram- Edited by Ram Dhilip.T Friday, August 10, 2012 9:40 AM
-
Friday, August 10, 2012 10:29 AM
-
Friday, August 10, 2012 10:38 AM
I don't have the time to try a query until I'm at home tonight, but only make sure that I have it right: if transaction 3 had had an amount of 100, 5 would be matched with 2 and 4 would be matched with 1, and 3 would be unmatched?
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seIf 3 has 100 then 4 should be matched with 2, 5 should be matched with 1 and 3 should be unmatched.
The logic is, first 2502 record have to be matched with last 2501 record.
Thanks,
Ram -
Friday, August 10, 2012 2:46 PM
I can get the expected output by using the below query that has been written by loop. But I expect that has to be done without loop. I'm pasting the code for your reference that may help you.
DECLARE @ID INT, @DatePosted DATETIME, @Amount MONEY, @SubID INT DECLARE @MyTable TABLE ( id INT, tran_code INT, date_posted DATETIME, amount MONEY, tran_type VARCHAR(5) ) INSERT INTO @MyTable VALUES (1, 2501, '2010-2-12', '50', NULL), (2, 2501, '2010-3-12', '50', NULL), (3, 2501, '2010-4-12', '50', NULL), (4, 2502, '2010-5-12', '-50', NULL), (5, 2502, '2010-5-12', '-50', NULL), (6, 2501, '2010-6-12', '50', NULL) SELECT @ID = MIN(id) FROM @MyTable WHERE tran_code = '2502' WHILE @ID IS NOT NULL BEGIN SELECT @DatePosted = date_posted, @Amount = amount FROM @MyTable WHERE id = @ID SELECT @SubID = MAX(id) FROM @MyTable WHERE tran_code = '2501' AND amount = -(@Amount) AND date_posted < @DatePosted AND tran_type IS NULL IF @SubID IS NOT NULL BEGIN UPDATE @MyTable SET tran_type = 'Z' WHERE id IN(@ID, @SubID) END SELECT @ID = MIN(id) FROM @MyTable WHERE tran_code = '2502' AND id > @ID END SELECT * FROM @MyTable ORDER BY date_posted
Thanks,
Ram -
Friday, August 10, 2012 8:57 PM
Maybe it can be done entirely without a loop, and maybe even in a single
statement (that is, no intermediate temp tables). I am very much in doubt,
though, that it would be an efficient solution.However, loops can be written in different ways, and I assume that in your
table you have different amounts to match (or the amounts are related to
different accounts or whatever). And in that case, there is no need loop one
by one - you can match all the first 2502 in one go, all the second 2502 in
one go etc.Below is a solution to show this, and I've added some more test data. In
this solution I have materialised the row number in a separate table, since
I assume that @MyTable in real life is a permanent table. But if @MyTable
already is a temp table, you would of course added the numbers to that
table.DECLARE @MyTable TABLE
(
id INT,
tran_code INT,
date_posted DATETIME,
amount MONEY,
tran_type VARCHAR(5)
)INSERT INTO @MyTable
VALUES (1, 2501, '2010-2-12', 50, NULL),
(2, 2501, '2010-3-12', 50, NULL),
(3, 2501, '2010-4-12', 50, NULL),
(4, 2502, '2010-5-12', -50, NULL),
(5, 2502, '2010-5-12', -50, NULL),
(6, 2501, '2010-6-12', 50, NULL),
(11, 2501, '2010-2-12', 150, NULL),
(12, 2501, '2010-3-12', 150, NULL),
(13, 2502, '2010-4-12', -150, NULL),
(14, 2502, '2010-5-12', -150, NULL),
(15, 2502, '2010-5-12', -150, NULL),
(16, 2501, '2010-6-12', 150, NULL)DECLARE @numbered TABLE (id int NOT NULL PRIMARY KEY,
date_posted date NOT NULL,
amount money NOT NULL,
rowno int NOT NULL,
UNIQUE (rowno, amount)
)DECLARE @matchedpairs TABLE (id1 int NOT NULL PRIMARY KEY,
id2 int NOT NULL UNIQUE)INSERT @numbered (id, date_posted, amount,
rowno)
SELECT id, date_posted, -amount,
row_number () OVER(PARTITION BY amount ORDER BY date_posted, id)
FROM @MyTable
WHERE tran_code = 2502DECLARE @rowno int
DECLARE cur CURSOR STATIC LOCAL FOR
SELECT DISTINCT rowno FROM @numbered ORDER BY rownoOPEN cur
WHILE 1 = 1
BEGIN
FETCH cur INTO @rowno
IF @@fetch_status <> 0
BREAKINSERT @matchedpairs (id1, id2)
SELECT a.id, b.id
FROM @numbered a
CROSS APPLY (SELECT TOP 1 b.id
FROM @MyTable b
WHERE b.tran_code = 2501
AND b.amount = a.amount
AND b.date_posted <= a.date_posted
AND NOT EXISTS (SELECT *
FROM @matchedpairs mp
WHERE mp.id2 = b.id)
ORDER BY b.date_posted DESC) AS b
WHERE a.rowno = @rowno-- If we did not match anything, we can quit.
IF @@rowcount = 0
BREAK
ENDUPDATE @MyTable
SET tran_type = 'Z'
FROM @MyTable t
WHERE EXISTS (SELECT *
FROM (SELECT id1 AS id
FROM @matchedpairs
UNION ALL
SELECT id2
FROM @matchedpairs) AS u
WHERE t.id = u.id)SELECT * FROM @MyTable ORDER BY id
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se- Proposed As Answer by Naomi NMicrosoft Community Contributor, Moderator Tuesday, August 14, 2012 1:30 PM
- Marked As Answer by Kalman TothMicrosoft Community Contributor, Moderator Friday, August 17, 2012 4:51 AM
-
Wednesday, September 12, 2012 1:52 PM
Thank you for the good solution. This is the exact solution what I expected.
I have few doubts and I have listed them below.
- Is the performance better when we use temp table instead of Cursor?
- Another question is coming in my mind when I think of using temp tables. Is table variable better than temp tables in performance perspective?
- One final question, here you have used cross apply. But its little confusing to understand it's clear functionality. It will be great if you let me know the part of the CROSS JOIN here and why we use it rather than other things like JOINs?
Thanks,
Ram -
Tuesday, September 25, 2012 8:51 PM
- I'm uncertain what you mean here. My solution uses both a table variable and a cursor. The reason I use a cursor is that I think that this is more efficient than a set-based solution, at least on SQL 2008. (And not the least easier to develop. :-). The reason I use the table @numbered is that I assume that you have a bunch of these for different amounts, accounts and whatnots. And since they are independent of each other, they can be handled in breadth, by first handling all rowno = 1, then all rowno = 2 and so on. I expect this to be more efficient than matching only one pair of 2501/2502 at a time. When I think of it, rather than running a cursor over the row numbers, the loop could delete all matched rows from @numbered, and then it would be a loop wth the condition "WHILE EXISTS (SELECT * FROM @numbered). For the logic of the loop, the temp table is not needed, but it's usually better to materialise the row number and index it.
- Table variables vs. temp tables and performance is one of these "it depends" questions. Temp table has distribution statistics and therefore can cause recompilation which can give you better query plans. But if they don't, the recompilation just costs extra. One should also be aware of that table variables precludes parallelism. In the long run, temp tables usually gives better performance. But table variables gives more checks at compile time, which is not to be ignored.
- CROSS JOIN would not work here. APPLY is a special JOIN operator. In a JOIN, the two sides cannot refer to each other. But in APPLY the right side can refer to the left side. Note that the subquery refers columns in the "a" table. This is not possible with JOIN. The net effect is that all rows produced by the subquery are added to the result set. There are two APPLY operators, CROSS and OUTER. The difference is when the subquery does not return any rows. With CROSS the row from the left table is lost, with OUTER it is retained. In this case, I use CROSS, because if there is no match, there is nothing to update.
Please keep in mind that when it comes to performance questions, the answer is often "it depends", and when in doubt, you need to benchmark with your data and tables.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

