locked
SQL Averages RRS feed

  • Question

  • Is there any way in T-SQL to calculate averages based on distinct values from another field.

    I have some data in an OLAP style database in SQL-Server 2005 that looks something like this when de-normalised :-

    Patient ID   Rx ID     Product    Rx Action   Diagnosis                   Doctor Specialty    Exit Dose
    ----------   -----     -------    ---------   ---------                   ----------------    ---------
    690529       1317203   acarbose   Repeat      Type II diabetes mellitus   Diabetes            NULL
    690529       1317203   acarbose   Repeat      Type II diabetes mellitus   Endocrinology       NULL
    703972       1347083   acarbose   Repeat      Type II diabetes mellitus   NULL                100
    707196       1353727   acarbose   Repeat      Type II diabetes mellitus   NULL                200
    714561       1370887   acarbose   Repeat      Diabetes                    Diabetes            100
    714561       1370887   acarbose   Repeat      Diabetes                    Elderly medicine    100
    714561       1370887   acarbose   Repeat      Diabetes                    Stroke              100
    714561       1370887   acarbose   Repeat      Obesity                     Diabetes            100
    714561       1370887   acarbose   Repeat      Obesity                     Elderly medicine    100
    714561       1370887   acarbose   Repeat      Obesity                     Stroke              100
    714561       1370887   acarbose   Repeat      Hypertension                Diabetes            100
    714561       1370887   acarbose   Repeat      Hypertension                Elderly medicine    100
    714561       1370887   acarbose   Repeat      Hypertension                Stroke              100
    714561       1370887   acarbose   Repeat      Ischaemic heart disease     Diabetes            100
    714561       1370887   acarbose   Repeat      Ischaemic heart disease     Elderly medicine    100
    714561       1370887   acarbose   Repeat      Ischaemic heart disease     Stroke              100
    714561       1370887   acarbose   Repeat      Stroke                      Diabetes            100
    714561       1370887   acarbose   Repeat      Stroke                      Elderly medicine    100
    714561       1370887   acarbose   Repeat      Stroke                      Stroke              100
    714561       1370887   acarbose   Repeat      Hyperlipidaemia             Diabetes            100
    714561       1370887   acarbose   Repeat      Hyperlipidaemia             Elderly medicine    100
    714561       1370887   acarbose   Repeat      Hyperlipidaemia             Stroke              100
    ...

    My application is generating dynamic T-SQL to analyse this data with the required groupings and sub-totals
    e.g. given a table T with the above data

    select 
      Product, 
      'Total Specialty' as Specialty,
      'Total Diagnosis' as Diagnosis,
      count(distinct PatientID ) as [Patient Count], 
      count( distinct RxID ) as [Rx Count], 
      Avg( ExitDose ) as [Avg. Dose]
    from T
    group by Product
    union all
    select 
      Product,
      '  ' + Coalesce( DoctorSpecialty, '-' ), 
      'Total Diagnosis' as Diagnosis, 
      count(distinct PatientID ), 
      count( distinct RxID ), 
      Avg( ExitDose ) 
    from T
    group by Product, Coalesce( DoctorSpecialty, '-' )

    which returns

    Product    Specialty            Diagnosis         Patient Count   Rx Count   Avg. Dose
    -------    ---------            ---------         -------------   --------   ---------
    acarbose   Total Specialty      Total Diagnosis   4               4          105
    acarbose     -                  Total Diagnosis   2               2          150
    acarbose     Diabetes           Total Diagnosis   2               2          100
    acarbose     Elderly medicine   Total Diagnosis   1               1          100
    acarbose     Endocrinology      Total Diagnosis   1               1          NULL
    acarbose     Stroke             Total Diagnosis   1               1          100

    All looks good? Actually no.

    The patient count and Rx count figures are correct, but the average dose figure is wrong (especially at the total speciality level - where the value should be 133.33). This is because the last 18 records are not actually different values at all, but are actually for the same prescription that is just duplicated in the data for different diagnosis / doctor speciality values - which is skewing the average calculation.

    The 133.33 figure comes from the fact that there are 4 distinct Rx ID values in the data
    1317203 - dose = null
    1347083 - dose = 100
    1353727 - dose = 200
    1370887 - dose = 100
    So what I actually need is the average of ( null, 100, 200, 100 ) = 400 / 3 = 133.33

    I know this won't be produced by the SQL query I provided - I just need to know if there is any way to do this - calculate an average value from all this data by taking only 1 row for each distinct Rx ID value.

    Friday, March 26, 2010 8:22 AM

All replies