locked
Looping percentages RRS feed

  • Question

  • CREATE TABLE projecthours([Resource_id] [varchar](25) NOT NULL, [Project_Id] varchar null,

    [ResourceName] [varchar](101) NULL, [Calendar_month] [int] NULL, [Calendar_Year] [int] NULL, [Project Hours] [decimal] NULL, [ProjectPercent] [decimal](38, 6) NULL) INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent]) VALUES (1,1,'Test',6,2014,15,30) INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent]) VALUES (1,2,'Test',6,2014,18,31) INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent]) VALUES (1,3,'Test',6,2014,17,34) INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent]) VALUES (1,4,'Test',6,2014,0,0)

    Hi All, I was hoping for some assistance. I am dealing with a legacy system where the column type for a percentage is set to an integer which is causing headache rounding issues.

    I have simplified the example insert script table above which shows ResourceID 1, split his time in June 2014 over 3 projects. 4 projects were assigned but he did no work on project 4 which is why the hours worked and percentage are 0

    My problem is that the percentage in the system must always add up to 100% so here is what i'm trying to do:

    Per resource ID - per month, add 1% on a loop to any project which has hours lodged against it.

    So in this case, there are 3 projects with a percentage against them, 30,31,34 - totalling 94% for the month

    And of course there is 6% missing

    I need a way of updating the project percent column on a loop, so it says right..3 lines are greater than 0, or 20 - and 6% is missing -

    I need it to loop through the 3 lines, add 1% to each until the total reaches 100% for the month - in this case the loop would go around the rows twice adding 2% total to each

    .. if there were 20 projects worked on, and 6% missing - the loop would sort them into order descending and add 1% the same way until it reached the 6th row - the total would equal 100% and it would stop

    Hope someone can assist me this ones really got me scratching my head :)

    Thanks - J


    Friday, July 11, 2014 2:14 PM

Answers

  • Your example is funky since the percentage entered for 17 hours is higher than for 18 hours. And in your example, both percent and hours are decimal....

    Anyway, here is something for you. The idea is that we compute the percentage with both integer division and float division and then we rank them on the difference, and we add 1 to the ones with the highest diffs for as many as we have to.

    CREATE TABLE projecthours([Resource_ID] [varchar](25) NOT NULL, [Project_ID] varchar null,
    
    [ResourceName] [varchar](101) NULL, [Calendar_month] [int] NULL, [Calendar_year] [int] NULL, [Project Hours] int NULL,
    [Project Percent] [decimal](38, 6) NULL)
    
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES (1,1,'Test',6,2014,15,30),
         (1,2,'Test',6,2014,18,31),
         (1,3,'Test',6,2014,17,34),
         (1,5,'Test',6,2014,71,34),
         (1,4,'Test',6,2014,0,0),
         (2,1,'Test',6,2014,20,30),
         (2,2,'Test',6,2014,20,31),
         (2,3,'Test',6,2014,20,34),
         (2,4,'Test',6,2014,20,0)
    
    ; WITH percs AS    (
         SELECT *, SUM(convert(int, [Project Hours])) OVER(PARTITION BY Resource_ID) AS totalhours,
         convert(int, 100* [Project Hours]) / SUM(convert(int, [Project Hours]))
                OVER(PARTITION BY Resource_ID) AS intperc,
         convert(float, 100* [Project Hours]) / SUM(convert(float, [Project Hours]))
                OVER(PARTITION BY Resource_ID) AS floatperc
         FROM     projecthours
         WHERE    [Project Hours] > 0
    ), diffranks AS (
         SELECT *, SUM(intperc) OVER (PARTITION BY Resource_ID) AS totalintperc,
             row_number() OVER(PARTITION BY Resource_ID ORDER BY floatperc - intperc DESC) AS diffrank
         FROM     percs
    )
    UPDATE diffranks
    SET        [Project Percent] = intperc + CASE WHEN diffrank <= 100 - totalintperc THEN 1 ELSE 0 END
    
    SELECT * FROM projecthours
    go
    DROP TABLE projecthours

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by James OHara Sunday, July 13, 2014 12:29 PM
    Saturday, July 12, 2014 9:26 AM
  • Please check simple solution, If I understand your need then this is what you are looking:

    First I just used select for the example, and understanding. Please go over all the columns in this result and understand what i get there and what is the logic for fixing the [Project Percent]! If you need help in understand the logic please ank and don't ue it before :-)

    select *, [Project Percent] + CASE when [Project Percent] > 0 then FLOOR (Fix / TotalRowNumber) else 0 end + CASE when RN <= (Fix % TotalRowNumber) then 1 else 0 end
    from (
    	SELECT 
    		Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours],[Project Percent]
    		, ROW_NUMBER() over (partition by Resource_ID order by [Project Hours] desc) RN
    		, SUM([Project Percent]) over (partition by Resource_ID) TotalPercent
    		, COUNT(case when [Project Percent] > 0 then 1 end)  over (partition by Resource_ID) TotalRowNumber
    		, 100 - SUM([Project Percent]) over (partition by Resource_ID) as Fix
    		, 0 as FixLevel
    	FROM projecthours
    ) T

    Now we can use UPDATE using "merge" or using "update...from" for example:

    ;With MyCTE as(
    	select Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours], [Project Percent] = [Project Percent] + CASE when [Project Percent] > 0 then FLOOR (Fix / TotalRowNumber) else 0 end + CASE when RN <= (Fix % TotalRowNumber) then 1 else 0 end
    	from (
    		SELECT 
    			Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours],[Project Percent]
    			, ROW_NUMBER() over (partition by Resource_ID order by [Project Hours] desc) RN
    			, SUM([Project Percent]) over (partition by Resource_ID) TotalPercent
    			, COUNT(case when [Project Percent] > 0 then 1 end)  over (partition by Resource_ID) TotalRowNumber
    			, 100 - SUM([Project Percent]) over (partition by Resource_ID) as Fix
    			, 0 as FixLevel
    		FROM projecthours
    	) T
    )
    -- select * from MyCTE
    UPDATE projecthours set projecthours.[Project Percent] = MyCTE.[Project Percent]
    from projecthours
    inner join MyCTE
    -- I dont see any unique column so i need to use all column for compare! u need unique column like ID
    __ and then this query will be fast and simpler on projecthours.Resource_ID = MyCTE.Resource_ID and projecthours.Project_ID = MyCTE.Project_ID and projecthours.ResourceName = MyCTE.ResourceName and projecthours.Calendar_month = MyCTE.Calendar_month and projecthours.Calendar_year = MyCTE.Calendar_year and projecthours.[Project Hours] = MyCTE.[Project Hours] GO

    I hope this is what you are looking for :-)


    [Personal Site] [Blog] [Facebook]signature

    • Edited by pituachMVP Sunday, July 13, 2014 8:31 AM
    • Marked as answer by James OHara Sunday, July 13, 2014 12:28 PM
    Sunday, July 13, 2014 8:28 AM
  • This is the DDL+DML that I used for my testing (this is your DDL+DML with some more sample data):

    CREATE TABLE projecthours(
    	[Resource_ID] [varchar](25) NOT NULL, 
    	[Project_ID] varchar null,
    	[ResourceName] [varchar](101) NULL, 
    	[Calendar_month] [int] NULL, 
    	[Calendar_year] [int] NULL, 
    	[Project Hours] int NULL,
    	[Project Percent] [decimal](38, 6) NULL
    )
    
    truncate table projecthours
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES 
    	(1,1,'Test',6,2014,15,30),
    	(1,2,'Test',6,2014,18,31),
    	(1,3,'Test',6,2014,17,34),
    	(1,4,'Test',6,2014,0,0)
    GO
    
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES 
    	(2,1,'Test',6,2014,15,30),
    	(2,2,'Test',6,2014,18,31),
    	(2,3,'Test',6,2014,17,34),
    	(2,4,'Test',6,2014,0,0)
    GO



    [Personal Site] [Blog] [Facebook]signature

    • Marked as answer by James OHara Sunday, July 13, 2014 12:29 PM
    Sunday, July 13, 2014 8:33 AM
  • just to clarify the "fix" column always seems to calculate the right value, but doesn't seem to apply it across the rows in the update

    I'm not sure that this clarification helps me at all. Or rather I am sure that it does not help me, because I don't understand what you are talking about.

    If it is not as simple as changing the PARTITION BY clause as I guess in my previous post, please supply CREATE TABLE + INSERT statements with enough sample data to demonstrate the problem, and of course the expected result given the sample.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by James OHara Sunday, July 13, 2014 12:28 PM
    Sunday, July 13, 2014 12:03 PM

All replies

  • Hi James,

    I will avoid giving a technical solution as someone more adept than I on this forum would provide a better solution than I can (I will give it a crack if no-one responds, although this forum is probably one of the more active ones).

    However, can I ask something about the numbers you mentioned: you say a rounding issue has caused this problem but for 3 projects (or even if we considered the fourth), the sum of any values left out by rounding would be limited by 2% (4 projects multiplied by 0.5%). So I cannot see how 6% would be missing without time being reported elsewhere?

    Friday, July 11, 2014 3:34 PM
  • >I need it to loop through the 3 lines, add 1% to each until the total reaches 100% for the month - in this case the loop would go around the rows twice adding 2% total to each

    You don't need a loop for that. Just do an update: add 6/3=2 to current values.

    In case you are still off with one percent, update any one of the rows the same way.

    You can also calculate the figures in the query without changing the data.

    Percent on base calculation: http://www.sqlusa.com/bestpractices/percentonbase/


    Kalman Toth Database & OLAP Architect SQL Server 2014 Design & Programming
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012











    Friday, July 11, 2014 3:40 PM
  • This sounds wrong on so many levels...  I can't imagine people will be happy at changing numbers around...  But just a thought, since you asked:  Instead of a loop, something along the lines of (pseudosql)...

    1. Select sum(percents) from PEOPLE... group by ... Having SUm(Percents) < 100

    2. Then "Update TABLE set percent = percent + 1 where Person in
                   (Select top (100 - the number from step 1) from TABLE) order by percent "

    With a CTE or a few nested queries, you could do all that in one somewhat complex statement.  I still doubt that you should, but you could... :-)


    Friday, July 11, 2014 3:43 PM
  • Hi guys

    Thought i'd come on and clarify a few things,

    The example I used was simplified, the reason it can be 6% out,is that some userID's have worked on 40 or 50 projects and when a significant number of those rounds down or up the wrong way you end up with the wrong number

    The second problem is thaty this is a crazy legacy system which is 20 years old and was built on MS Access then migrated to SQL 2000, then to 2005, - Basically 1000's of people use the database and changing the value to a decimal is not possible, as many apps and things hang off it that use the metadata, apps that the company doesn't have the source code for - so its a bit of a pickle -

    As for being wrong, trust me we know! every so often we must do something we're asked without questioning :) hey it sucks but that's where we are lol

    If anyone knows how to do this loop i'd be really appreciative - you can't do it as a calculation like 6/3=2 each because the number of rows might not divide properly, it needs to sort the rows into order then keep adding 1 to each until it reaches its target. the idea is to have all of them slightly out and equitably split the 'pain' between each project if that makes sense

    like we can't just add the 6 to one project either because it would put the resource massively out, we'd rather have each project within 1 or 2 %


    • Edited by James OHara Saturday, July 12, 2014 8:25 AM
    Saturday, July 12, 2014 8:24 AM
  • Your example is funky since the percentage entered for 17 hours is higher than for 18 hours. And in your example, both percent and hours are decimal....

    Anyway, here is something for you. The idea is that we compute the percentage with both integer division and float division and then we rank them on the difference, and we add 1 to the ones with the highest diffs for as many as we have to.

    CREATE TABLE projecthours([Resource_ID] [varchar](25) NOT NULL, [Project_ID] varchar null,
    
    [ResourceName] [varchar](101) NULL, [Calendar_month] [int] NULL, [Calendar_year] [int] NULL, [Project Hours] int NULL,
    [Project Percent] [decimal](38, 6) NULL)
    
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES (1,1,'Test',6,2014,15,30),
         (1,2,'Test',6,2014,18,31),
         (1,3,'Test',6,2014,17,34),
         (1,5,'Test',6,2014,71,34),
         (1,4,'Test',6,2014,0,0),
         (2,1,'Test',6,2014,20,30),
         (2,2,'Test',6,2014,20,31),
         (2,3,'Test',6,2014,20,34),
         (2,4,'Test',6,2014,20,0)
    
    ; WITH percs AS    (
         SELECT *, SUM(convert(int, [Project Hours])) OVER(PARTITION BY Resource_ID) AS totalhours,
         convert(int, 100* [Project Hours]) / SUM(convert(int, [Project Hours]))
                OVER(PARTITION BY Resource_ID) AS intperc,
         convert(float, 100* [Project Hours]) / SUM(convert(float, [Project Hours]))
                OVER(PARTITION BY Resource_ID) AS floatperc
         FROM     projecthours
         WHERE    [Project Hours] > 0
    ), diffranks AS (
         SELECT *, SUM(intperc) OVER (PARTITION BY Resource_ID) AS totalintperc,
             row_number() OVER(PARTITION BY Resource_ID ORDER BY floatperc - intperc DESC) AS diffrank
         FROM     percs
    )
    UPDATE diffranks
    SET        [Project Percent] = intperc + CASE WHEN diffrank <= 100 - totalintperc THEN 1 ELSE 0 END
    
    SELECT * FROM projecthours
    go
    DROP TABLE projecthours

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by James OHara Sunday, July 13, 2014 12:29 PM
    Saturday, July 12, 2014 9:26 AM
  • Hi Erland

    I really appreciate your detailed reply, I gaveit a shot and my apologies for the skewy results, I just didn't want

    I cant believe the quality of this answer, I fully understand the code written and how you figured this out - amazing answer

    ive got one tiny problem remaining, if I restrict the query to one month only all the projects come out to 100%, but if the same resource has an entry over 2 months, it seems to do say 49 and 51 or 62 and 38 over the 2 months.

    How do I add that final split when it does the update so the months get separated?

    thanks a lot - and wow what a brilliant answer you gave

    example results are:

    Resource_ID, Month, Year, Percent

    9090|6|2014|100
    9121|5|2014|49
    9121|6|2014|51

    also I have a few which were rounded up too far -

    9555|5|2014|103

    Wouldthis round downward the same way if the total was over 100?

    So resource 9090 only did say June, but 9121 did stuff in may and june and the result got split - 49 and 51 instead of being 100 for each, each row appears to have been incorrectly manipulated as this is the aggregated total

    J



    Sunday, July 13, 2014 1:27 AM
  • Please check simple solution, If I understand your need then this is what you are looking:

    First I just used select for the example, and understanding. Please go over all the columns in this result and understand what i get there and what is the logic for fixing the [Project Percent]! If you need help in understand the logic please ank and don't ue it before :-)

    select *, [Project Percent] + CASE when [Project Percent] > 0 then FLOOR (Fix / TotalRowNumber) else 0 end + CASE when RN <= (Fix % TotalRowNumber) then 1 else 0 end
    from (
    	SELECT 
    		Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours],[Project Percent]
    		, ROW_NUMBER() over (partition by Resource_ID order by [Project Hours] desc) RN
    		, SUM([Project Percent]) over (partition by Resource_ID) TotalPercent
    		, COUNT(case when [Project Percent] > 0 then 1 end)  over (partition by Resource_ID) TotalRowNumber
    		, 100 - SUM([Project Percent]) over (partition by Resource_ID) as Fix
    		, 0 as FixLevel
    	FROM projecthours
    ) T

    Now we can use UPDATE using "merge" or using "update...from" for example:

    ;With MyCTE as(
    	select Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours], [Project Percent] = [Project Percent] + CASE when [Project Percent] > 0 then FLOOR (Fix / TotalRowNumber) else 0 end + CASE when RN <= (Fix % TotalRowNumber) then 1 else 0 end
    	from (
    		SELECT 
    			Resource_ID,Project_ID,ResourceName,Calendar_month,Calendar_year,[Project Hours],[Project Percent]
    			, ROW_NUMBER() over (partition by Resource_ID order by [Project Hours] desc) RN
    			, SUM([Project Percent]) over (partition by Resource_ID) TotalPercent
    			, COUNT(case when [Project Percent] > 0 then 1 end)  over (partition by Resource_ID) TotalRowNumber
    			, 100 - SUM([Project Percent]) over (partition by Resource_ID) as Fix
    			, 0 as FixLevel
    		FROM projecthours
    	) T
    )
    -- select * from MyCTE
    UPDATE projecthours set projecthours.[Project Percent] = MyCTE.[Project Percent]
    from projecthours
    inner join MyCTE
    -- I dont see any unique column so i need to use all column for compare! u need unique column like ID
    __ and then this query will be fast and simpler on projecthours.Resource_ID = MyCTE.Resource_ID and projecthours.Project_ID = MyCTE.Project_ID and projecthours.ResourceName = MyCTE.ResourceName and projecthours.Calendar_month = MyCTE.Calendar_month and projecthours.Calendar_year = MyCTE.Calendar_year and projecthours.[Project Hours] = MyCTE.[Project Hours] GO

    I hope this is what you are looking for :-)


    [Personal Site] [Blog] [Facebook]signature

    • Edited by pituachMVP Sunday, July 13, 2014 8:31 AM
    • Marked as answer by James OHara Sunday, July 13, 2014 12:28 PM
    Sunday, July 13, 2014 8:28 AM
  • This is the DDL+DML that I used for my testing (this is your DDL+DML with some more sample data):

    CREATE TABLE projecthours(
    	[Resource_ID] [varchar](25) NOT NULL, 
    	[Project_ID] varchar null,
    	[ResourceName] [varchar](101) NULL, 
    	[Calendar_month] [int] NULL, 
    	[Calendar_year] [int] NULL, 
    	[Project Hours] int NULL,
    	[Project Percent] [decimal](38, 6) NULL
    )
    
    truncate table projecthours
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES 
    	(1,1,'Test',6,2014,15,30),
    	(1,2,'Test',6,2014,18,31),
    	(1,3,'Test',6,2014,17,34),
    	(1,4,'Test',6,2014,0,0)
    GO
    
    INSERT INTO projecthours([Resource_ID],[Project_ID],[ResourceName],[Calendar_month],[Calendar_year],[Project Hours],[Project Percent])
    VALUES 
    	(2,1,'Test',6,2014,15,30),
    	(2,2,'Test',6,2014,18,31),
    	(2,3,'Test',6,2014,17,34),
    	(2,4,'Test',6,2014,0,0)
    GO



    [Personal Site] [Blog] [Facebook]signature

    • Marked as answer by James OHara Sunday, July 13, 2014 12:29 PM
    Sunday, July 13, 2014 8:33 AM
  • ive got one tiny problem remaining, if I restrict the query to one month only all the projects come out to 100%, but if the same resource has an entry over 2 months, it seems to do say 49 and 51 or 62 and 38 over the 2 months.

    How do I add that final split when it does the update so the months get separated?

    I'm not sure that I understand, but if I am to guess you want a total 100% per month (that would make sense). In such case, replace all occurrences of

    PARTITION BY Resource_ID

    With

    PARTITION BY Resource_ID, Calendar_year, Calendar_month

    Generally, when you have a problem of this kind, you should include enough sample data to illustrate the problem. I threw in a PARTITION BY Resource_ID as a hint, but it was partially a guess. Yeah, maybe I should have figured out the part about year and month myself, but I did not look too closely at the column names.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, July 13, 2014 10:22 AM
  • just to clarify the "fix" column always seems to calculate the right value, but doesn't seem to apply it across the rows in the update
    Sunday, July 13, 2014 11:20 AM
  • just to clarify the "fix" column always seems to calculate the right value, but doesn't seem to apply it across the rows in the update

    I'm not sure that this clarification helps me at all. Or rather I am sure that it does not help me, because I don't understand what you are talking about.

    If it is not as simple as changing the PARTITION BY clause as I guess in my previous post, please supply CREATE TABLE + INSERT statements with enough sample data to demonstrate the problem, and of course the expected result given the sample.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by James OHara Sunday, July 13, 2014 12:28 PM
    Sunday, July 13, 2014 12:03 PM
  • that fixed it! thankyou guys -- I really appreciate your help!

    it works it works :D :D :D :D I love you guys

    TY

    Sunday, July 13, 2014 12:30 PM
  • that fixed it! thankyou guys -- I really appreciate your help!

    it works it works :D :D :D :D I love you guys

    TY

    You are most welcome :-)

    We are here just for thanks,
    It is the fuel that feed the support's engine :-)


    [Personal Site] [Blog] [Facebook]signature

    Sunday, July 13, 2014 1:34 PM