Asked by:
SQL Averages

Question
-
Is there any way in T-SQL to calculate averages based on distinct values from another field.
I have some data in an OLAP style database in SQL-Server 2005 that looks something like this when de-normalised :-
Patient ID Rx ID Product Rx Action Diagnosis Doctor Specialty Exit Dose ---------- ----- ------- --------- --------- ---------------- --------- 690529 1317203 acarbose Repeat Type II diabetes mellitus Diabetes NULL 690529 1317203 acarbose Repeat Type II diabetes mellitus Endocrinology NULL 703972 1347083 acarbose Repeat Type II diabetes mellitus NULL 100 707196 1353727 acarbose Repeat Type II diabetes mellitus NULL 200 714561 1370887 acarbose Repeat Diabetes Diabetes 100 714561 1370887 acarbose Repeat Diabetes Elderly medicine 100 714561 1370887 acarbose Repeat Diabetes Stroke 100 714561 1370887 acarbose Repeat Obesity Diabetes 100 714561 1370887 acarbose Repeat Obesity Elderly medicine 100 714561 1370887 acarbose Repeat Obesity Stroke 100 714561 1370887 acarbose Repeat Hypertension Diabetes 100 714561 1370887 acarbose Repeat Hypertension Elderly medicine 100 714561 1370887 acarbose Repeat Hypertension Stroke 100 714561 1370887 acarbose Repeat Ischaemic heart disease Diabetes 100 714561 1370887 acarbose Repeat Ischaemic heart disease Elderly medicine 100 714561 1370887 acarbose Repeat Ischaemic heart disease Stroke 100 714561 1370887 acarbose Repeat Stroke Diabetes 100 714561 1370887 acarbose Repeat Stroke Elderly medicine 100 714561 1370887 acarbose Repeat Stroke Stroke 100 714561 1370887 acarbose Repeat Hyperlipidaemia Diabetes 100 714561 1370887 acarbose Repeat Hyperlipidaemia Elderly medicine 100 714561 1370887 acarbose Repeat Hyperlipidaemia Stroke 100 ...
My application is generating dynamic T-SQL to analyse this data with the required groupings and sub-totals
e.g. given a table T with the above dataselect Product, 'Total Specialty' as Specialty, 'Total Diagnosis' as Diagnosis, count(distinct PatientID ) as [Patient Count], count( distinct RxID ) as [Rx Count], Avg( ExitDose ) as [Avg. Dose] from T group by Product union all select Product, ' ' + Coalesce( DoctorSpecialty, '-' ), 'Total Diagnosis' as Diagnosis, count(distinct PatientID ), count( distinct RxID ), Avg( ExitDose ) from T group by Product, Coalesce( DoctorSpecialty, '-' )
which returns
Product Specialty Diagnosis Patient Count Rx Count Avg. Dose ------- --------- --------- ------------- -------- --------- acarbose Total Specialty Total Diagnosis 4 4 105 acarbose - Total Diagnosis 2 2 150 acarbose Diabetes Total Diagnosis 2 2 100 acarbose Elderly medicine Total Diagnosis 1 1 100 acarbose Endocrinology Total Diagnosis 1 1 NULL acarbose Stroke Total Diagnosis 1 1 100
All looks good? Actually no.
The patient count and Rx count figures are correct, but the average dose figure is wrong (especially at the total speciality level - where the value should be 133.33). This is because the last 18 records are not actually different values at all, but are actually for the same prescription that is just duplicated in the data for different diagnosis / doctor speciality values - which is skewing the average calculation.
The 133.33 figure comes from the fact that there are 4 distinct Rx ID values in the data
1317203 - dose = null
1347083 - dose = 100
1353727 - dose = 200
1370887 - dose = 100
So what I actually need is the average of ( null, 100, 200, 100 ) = 400 / 3 = 133.33I know this won't be produced by the SQL query I provided - I just need to know if there is any way to do this - calculate an average value from all this data by taking only 1 row for each distinct Rx ID value.
Friday, March 26, 2010 8:22 AM
All replies
-
try something like this select aver=avg(columname)over(partition by rxid)Friday, March 26, 2010 8:25 AM
-
Can someone assist?
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAMWednesday, March 31, 2010 5:25 PM