SQL joins on Multiple columns showing results not expected

Respondido SQL joins on Multiple columns showing results not expected

  • Thursday, February 07, 2013 4:42 PM
     
     


    Table A contains Pat_id, admission date, Discharge date, Admission Type, SPecialty, Provider and other fields.

    Table B also contains Pat_id, admission date, Discharge date, Admission Type, SPecialty, Provider, total Cost, tariff and other fields.

    Table A is an extract for 2012/13 financial year and Table B is the original table with records from 2006 till date.

    I want all the rows in Table A with the first matching row matching rows in Table B i e i can do a max on cost and tariff. 

    I tried using 

    Table A.* , MAX(Table b. Cost) as COst, MAX(Table b. Tariff) as Tariff Left Outer Join Table B
         ON pat_id = pat_id AND
            admission date = admission date AND
            discharge date = admission date AND
            Admission Type = Admission Type AND
            SPecialty = SPecialty AND
            Provider = Provider 
    Group By Table A fields

    there are 40000 rows in Table A but i am getting only 38000 rows. Don't understand what is happening with it.

All Replies

  • Thursday, February 07, 2013 4:58 PM
     
     

    Duplicate rows in Table A perhaps?  Or, rather, rows that have duplicate values in all of the columns in your Group By clause.

    It might be easier for us to see the answer if you posted the actual query.

    Tom

  • Thursday, February 07, 2013 5:02 PM
     
     

    The difference of 2000 records could be the reason of they being duplicate records in Table A (all fields having same values for more than 1 record) and as you are using Group By, you won't get duplicate copies of values (on grouping fields).

    Thanks!

  • Tuesday, February 12, 2013 3:17 PM
     
     

    Thank you Tom and Deepak. I think Deepak is right. But in my case the group by clause does not eliminate the duplicates. this is where i am confused.

    Regards!

  • Tuesday, February 12, 2013 3:24 PM
     
     

    Hi Guys

    Here is the query as Tom Suggested, It is a very similar problem to the one mentioned above.

                          

    SELECT     PatientKey, LocalProvider, AttendanceType, Month, AgeAtCDSActivityDate
    into from OP1
    FROM         dbo.OPFinalWithCosts
    WHERE     (AgeBand <> 'Under 17') AND (Month IN ('Apr-12', 'May-12', 'Jun-12', 'Jul-12', 'Aug-12', 'Sep-12')) AND (LocalProvider = 'IMPERIAL') AND 
                          (AttendanceType = 'First')

     --114100 rows

    SELECT     PatientKey, LocalProvider, Month, AgeAtCDSActivityDate, AttendanceType
    into OP2
    FROM         dbo.OPBaseTableFinal
    WHERE     (AgeAtCDSActivityDate >= 17) AND (AttendanceType = 'First') AND (LocalProvider = 'IMPERIAL')
    AND (Month IN ('Apr-12', 'May-12', 'Jun-12', 'Jul-12', 'Aug-12', 'Sep-12'))

    --114080 rows

    SELECT     dbo.OP1.PatientKey, dbo.OP1.LocalProvider, dbo.OP1.AttendanceType, dbo.OP1.Month, dbo.OP1.AgeAtCDSActivityDate, 
                          dbo.OP2.PatientKey AS PatientKey2, dbo.OP2.Month AS Month2, dbo.OP2.AgeAtCDSActivityDate AS AgeAtCDSActivityDate2
    FROM         dbo.OP1 INNER JOIN
                          dbo.OP2 ON dbo.OP1.PatientKey = dbo.OP2.PatientKey AND dbo.OP1.LocalProvider = dbo.OP2.LocalProvider AND 
                          dbo.OP1.Month = dbo.OP2.Month AND dbo.OP1.AttendanceType = dbo.OP2.AttendanceType AND 
                          dbo.OP1.AgeAtCDSActivityDate = dbo.OP2.AgeAtCDSActivityDate
    GROUP BY dbo.OP1.PatientKey, dbo.OP1.LocalProvider, dbo.OP1.AttendanceType, dbo.OP1.Month, dbo.OP1.AgeAtCDSActivityDate, dbo.OP2.PatientKey, 
                          dbo.OP2.Month, dbo.OP2.AgeAtCDSActivityDate
    --101315 rows

    I am expecting atleast 114080 rows as minimum but i don't understand why am i getting less records this time.

  • Friday, February 22, 2013 12:50 AM
    Moderator
     
     

    >I am expecting atleast 114080 rows as minimum but i don't understand why am i getting less records this time.

    I don't know how can we help you. Your issue is data dependent and we don't have access to the data.

    Perhaps you can use the EXCEPT set operator to find the difference: http://msdn.microsoft.com/en-us/library/ms188055.aspx .

    Also a 3rd party tool from red gate:

    SQL Data Compare


    Kalman Toth Database & OLAP Architect
    Paperback / Kindle: Pass SQL Exam 70-461 & Job Interview: Programming SQL Server 2012

  • Wednesday, February 27, 2013 10:55 AM
     
     Answered

    Thank you very much for your reply guys. Really appreciate your input.  A quick technique i used was to do CTE with windowing function. Using Partition by PatientKey, Attendance date etc and use the row number()  to get the row number. Finally use row number=1 with the CTE would pull the unique records. I don't know whether this is the best method. But certainly it works!!!

    Regards

    Siva