locked
DAX: Sum of Averages RRS feed

  • Question

  • I have a column "scheduled hours" which is basically the number of hours an employee is scheduled to work each day. This data is repeated for each record in the table for the employee and day as shown below. 

    What's the most optimal way to do a SUM of AVERAGES in this case?

    The table contains close 30 million records and will grow. I tried SUMX but it does seem to be a little slow. 

    Scheduled Hrs:=SUMX(VALUES('Time'[Employee Day ID]),CALCULATE(AVERAGE(Time[Scheduled Hours])

    Here's a simple example of what the data looks like and what is needed. 

    Friday, September 15, 2017 8:47 PM

All replies

  • Hi VivDev,

    Thanks for your question.

    If I understand you correctly, you can just try below simple DAX formula:
    Scheduled Hrs:= SUM(Time[Scheduled Hours])


    Best Regards
    Willson Yuan
    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Monday, September 18, 2017 5:40 AM
  • This is actually a design/granularity problem. Your scheduled hours data is keyed by (EmpID,Date), while your first table is keyed by (EmpID,Date,Record#), which leads to scheduled hours explosion over the extra rows.

    So I suggest you first investigate whether it is possible to move scheduled hours to a table with the appropriate granularity (your step 2). Then you'd just do a plain sum over it.

    If not, AVERAGE is probably not the best choice here, because it forces the engine to first sum N equal values, calculate that N, and finally divide the sum by N. I'd try firstnonblank or max or min instead of average.


    Expect me to help you solve your problems, not to solve your problems for you.

    Monday, September 18, 2017 9:11 AM
  • This is actually a design/granularity problem. Your scheduled hours data is keyed by (EmpID,Date), while your first table is keyed by (EmpID,Date,Record#), which leads to scheduled hours explosion over the extra rows.

    So I suggest you first investigate whether it is possible to move scheduled hours to a table with the appropriate granularity (your step 2). Then you'd just do a plain sum over it.

    If not, AVERAGE is probably not the best choice here, because it forces the engine to first sum N equal values, calculate that N, and finally divide the sum by N. I'd try firstnonblank or max or min instead of average.


    Expect me to help you solve your problems, not to solve your problems for you.

    Yes. Agreed. I figured I should probably be breaking them out but wanted to see if it was easier to have the data in the same table and handle it via the measure. 

    I have now created a separate table for the scheduled hours but confused as to how to build the relationships because it's like now I have 2 fact tables. One with all the hours entered and one with hours scheduled.

    I have setup the relationships as shown in the model below and here are a couple of measures 

    Scheduled Hrs:=CALCULATE(SUM('Employee Schedule'[Scheduled Hours]),'Time') 

    Billable Hrs:=CALCULATE(SUM([Hrs Entered],Time[Time Bucket]="Billable")

    Billable Hrs is super fast but Scheduled Hrs is relatively much slower. 

    The Time table does have 25M+ records and growing but would love to have Scheduled Hours perform the same as Billable hours because it's used in pretty much every other measure making all of them slower.

    Wondering if I need almost a complete overhaul of design here. 

    Monday, September 18, 2017 8:03 PM
  • The relationship between Employee Schedule and Time is inactive, so why do you use Time as a filter in the Scheduled Hrs measure? Actually, why create this relationship at all? You have your calendar and employee (dimension) tables linked to both of these (fact) tables, and this certainly looks as much as is needed for this data model.

    Expect me to help you solve your problems, not to solve your problems for you.

    Tuesday, September 19, 2017 7:25 AM
  • The relationship between Employee Schedule and Time is inactive, so why do you use Time as a filter in the Scheduled Hrs measure? Actually, why create this relationship at all? You have your calendar and employee (dimension) tables linked to both of these (fact) tables, and this certainly looks as much as is needed for this data model.

    Expect me to help you solve your problems, not to solve your problems for you.

    The inactive relationship can be ignored. I was just lazy not to remove that lest I need it again. 

    I have created an updated/simplified power pivot model with some test data. Sample Model

    The problem I'm facing now is I need Cost center to propagate to the Time table or the measures based on the Time table.

    e.g. If I filter by a Cost Center then I need the scheduled Hours of all employees that were in that cost center along with the number of hours they entered. 

    If I convert my measure from 

    Hours Entered:=SUM('Time'[Hours]) to Hours Entered:=CALCULATE(SUM('Time'[Hours]), Schedule) then it filters the Time table by the cost center but it does not show any hours entered on days when there was no schedule. 

    Right now I'm struggling to figure out how to solve for both the cases. 


    Wednesday, September 20, 2017 5:32 PM
  • One thing immediately apparent is that there's a Cost Center ID in the Time table, so why not just add a relationship to the CostCenter table?

    One less apparent thing is that there's a Cost Center field in the Employee table, hinting at a possible Facts->Employee->Cost Center relationship.

    And, I still don't understand why you have originally used Time as a filter in your measure which you complained was slow.


    Expect me to help you solve your problems, not to solve your problems for you.

    Thursday, September 21, 2017 7:13 AM
  • One thing immediately apparent is that there's a Cost Center ID in the Time table, so why not just add a relationship to the CostCenter table?

    One less apparent thing is that there's a Cost Center field in the Employee table, hinting at a possible Facts->Employee->Cost Center relationship.

    And, I still don't understand why you have originally used Time as a filter in your measure which you complained was slow.


    Expect me to help you solve your problems, not to solve your problems for you.

    Ignore the Cost Center ID in the Time table, it was supposed to be Project ID. 

    Employees Table -> List of all Employees with their current cost center , country , employee type information and Schedule will provide 

    Schedule -> Number of hours Employees is expected to work on any given day with their point in time cost center, country, employee type etc. 

    Time -> Time entered on any given day on any project. A person may enter time on a day he/she is not scheduled to work as well. 

    Cost Center -> Cost Center Master table (actually has other fields too but I've kept it simple in the example)

    When I say create a pivot and filter it but a cost center and say by a week, I want to see a list of all employees that belonged to that Cost Center in that week (based on schedule data) and get me all the time these employees entered in that week (including any time on non-scheduled days)

    Another possibility is, I may want to filter by Employee Cost center in which case I expect to see time entered by all employees that belong to that cost center currently

    Hope my explanation is clear enough. 



    Friday, September 22, 2017 3:56 PM
  • Hi,

    Design questions,please ring microsoft or buy consultant services.

    Sunday, September 24, 2017 3:55 AM
  • So conceptually Cost Center is a slowly changing attribute of the Employee dimension, and you need reports for both current and historical cost centers. I think the easiest way for you to achieve this is to take the logic that saves the current cost center to the Schedule table and apply it to the Time table. And, some reading on implementing SCDs with DAX seems in order.

    Expect me to help you solve your problems, not to solve your problems for you.

    Monday, September 25, 2017 9:30 AM
  • So conceptually Cost Center is a slowly changing attribute of the Employee dimension, and you need reports for both current and historical cost centers. I think the easiest way for you to achieve this is to take the logic that saves the current cost center to the Schedule table and apply it to the Time table. And, some reading on implementing SCDs with DAX seems in order.

    Expect me to help you solve your problems, not to solve your problems for you.

    Thanks for your inputs and guidance. Will read about SCDs and figure it out. 
    Tuesday, September 26, 2017 12:24 AM
  • I think I found what you need, please refer to the following post:

    http://sqlkover.com/calculate-a-semi-additive-average-in-dax/

    Regards,

    Oscar


    • Edited by Valerockero Wednesday, December 27, 2017 2:06 PM
    Wednesday, December 27, 2017 2:06 PM