SQL joins on Multiple columns showing results not expected
-
Thursday, February 07, 2013 4:42 PM
Table A contains Pat_id, admission date, Discharge date, Admission Type, SPecialty, Provider and other fields.
Table B also contains Pat_id, admission date, Discharge date, Admission Type, SPecialty, Provider, total Cost, tariff and other fields.
Table A is an extract for 2012/13 financial year and Table B is the original table with records from 2006 till date.
I want all the rows in Table A with the first matching row matching rows in Table B i e i can do a max on cost and tariff.
I tried using
Table A.* , MAX(Table b. Cost) as COst, MAX(Table b. Tariff) as Tariff Left Outer Join Table B
ON pat_id = pat_id AND
admission date = admission date AND
discharge date = admission date AND
Admission Type = Admission Type AND
SPecialty = SPecialty AND
Provider = Provider
Group By Table A fields
there are 40000 rows in Table A but i am getting only 38000 rows. Don't understand what is happening with it.
All Replies
-
Thursday, February 07, 2013 4:58 PM
Duplicate rows in Table A perhaps? Or, rather, rows that have duplicate values in all of the columns in your Group By clause.
It might be easier for us to see the answer if you posted the actual query.
Tom
- Proposed As Answer by SQL Novice 01 Thursday, February 07, 2013 5:01 PM
- Unproposed As Answer by Kalman TothMicrosoft Community Contributor, Moderator Friday, February 22, 2013 12:47 AM
-
Thursday, February 07, 2013 5:02 PM
The difference of 2000 records could be the reason of they being duplicate records in Table A (all fields having same values for more than 1 record) and as you are using Group By, you won't get duplicate copies of values (on grouping fields).
Thanks!
-
Tuesday, February 12, 2013 3:17 PM
Thank you Tom and Deepak. I think Deepak is right. But in my case the group by clause does not eliminate the duplicates. this is where i am confused.
Regards!
-
Tuesday, February 12, 2013 3:24 PM
Hi Guys
Here is the query as Tom Suggested, It is a very similar problem to the one mentioned above.
SELECT PatientKey, LocalProvider, AttendanceType, Month, AgeAtCDSActivityDate
into from OP1
FROM dbo.OPFinalWithCosts
WHERE (AgeBand <> 'Under 17') AND (Month IN ('Apr-12', 'May-12', 'Jun-12', 'Jul-12', 'Aug-12', 'Sep-12')) AND (LocalProvider = 'IMPERIAL') AND
(AttendanceType = 'First')
--114100 rows
SELECT PatientKey, LocalProvider, Month, AgeAtCDSActivityDate, AttendanceType
into OP2
FROM dbo.OPBaseTableFinal
WHERE (AgeAtCDSActivityDate >= 17) AND (AttendanceType = 'First') AND (LocalProvider = 'IMPERIAL')
AND (Month IN ('Apr-12', 'May-12', 'Jun-12', 'Jul-12', 'Aug-12', 'Sep-12'))
--114080 rows
SELECT dbo.OP1.PatientKey, dbo.OP1.LocalProvider, dbo.OP1.AttendanceType, dbo.OP1.Month, dbo.OP1.AgeAtCDSActivityDate,
dbo.OP2.PatientKey AS PatientKey2, dbo.OP2.Month AS Month2, dbo.OP2.AgeAtCDSActivityDate AS AgeAtCDSActivityDate2
FROM dbo.OP1 INNER JOIN
dbo.OP2 ON dbo.OP1.PatientKey = dbo.OP2.PatientKey AND dbo.OP1.LocalProvider = dbo.OP2.LocalProvider AND
dbo.OP1.Month = dbo.OP2.Month AND dbo.OP1.AttendanceType = dbo.OP2.AttendanceType AND
dbo.OP1.AgeAtCDSActivityDate = dbo.OP2.AgeAtCDSActivityDate
GROUP BY dbo.OP1.PatientKey, dbo.OP1.LocalProvider, dbo.OP1.AttendanceType, dbo.OP1.Month, dbo.OP1.AgeAtCDSActivityDate, dbo.OP2.PatientKey,
dbo.OP2.Month, dbo.OP2.AgeAtCDSActivityDate
--101315 rowsI am expecting atleast 114080 rows as minimum but i don't understand why am i getting less records this time.
-
Friday, February 22, 2013 12:50 AMModerator
>I am expecting atleast 114080 rows as minimum but i don't understand why am i getting less records this time.
I don't know how can we help you. Your issue is data dependent and we don't have access to the data.
Perhaps you can use the EXCEPT set operator to find the difference: http://msdn.microsoft.com/en-us/library/ms188055.aspx .
Also a 3rd party tool from red gate:
Kalman Toth Database & OLAP Architect
Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012 -
Wednesday, February 27, 2013 10:55 AM
Thank you very much for your reply guys. Really appreciate your input. A quick technique i used was to do CTE with windowing function. Using Partition by PatientKey, Attendance date etc and use the row number() to get the row number. Finally use row number=1 with the CTE would pull the unique records. I don't know whether this is the best method. But certainly it works!!!
Regards
Siva
- Marked As Answer by Iric WenModerator Tuesday, March 05, 2013 1:41 AM

