locked
SQL Update column based on count RRS feed

  • Question

  • User444756084 posted

    I have a SQL Server database that I am utilizing in my .NET MVC application. 

    I need to loop through a table to get the count of each value in the column 'RevID'. Each value needs to end up in the table the same number of times (ok if they are a few off). So, if I have 100 records and value 'A' is in the table 30 times, value 'B' is in the table '15' times, and value 'C' is in the table '40' times - they all need to be in the table 33 times (100 records / 3 different values).

    I do not know how to approach this. I am thinking a while loop with an update statement. Would it be productive to do the updating in C# or stored procedure. Any ideas of how to set this up? 

    I was working with this but I know it is not correct:. Any input on how this should be addressed is GREATLY appreciated. I am NOT an SQL developer so please be nice ;)

    DECLARE @i INT = 1,
            @RevCnt INT = SELECT RevId, COUNT(RevId) FROM MyTable group by RevId
    
    WHILE(@RevCnt >= 50)
    BEGIN
        UPDATE MyTable 
        SET RevID= (SELECT COUNT(RevID) FROM MyTable) 
        WHERE RevID < 50)
    
        @i = @i + 1       
    END

    Thursday, August 18, 2016 7:52 PM

Answers

  • User-595703101 posted

    Hello Gordon,

    Your requirement is very interesting for me.

    I have not worked on such a case, I will provide a solution for this. I'm not sure if it is the best one but the solution seems to be working quite well according to me.

    I'll soon try to enhance the solution at Update Table Data for Uniform Distribution in SQL of course if the uniform distribution is the correct name on this case

    Please check below SQL Update CTE command

    ;with summary as (
    	SELECT distinct 
    		COUNT(*) Over (Partition By 1) Cnt, 
    		RevId, 
    		COUNT(RevId) Over (Partition By RevId) RevCnt
    	FROM MyTable
    ), todo as (
    	select
    		Cnt, 
    		Cnt / (count(*) over (partition by 1)) Average,
    		RevId, RevCnt
    	from summary
    ), joint as (
    select
    	id,
    	MyTable.RevId,
    	orderno = ROW_NUMBER() over (partition by MyTable.RevId order by id),
    	Cnt,
    	Average,
    	RevCnt,
    	RevCnt - Average as forupdate
    from MyTable
    inner join todo on MyTable.RevId = todo.RevId
    ), upd as (
    select * 
    from joint
    where RevCnt < Average and orderno <= (forupdate * -1)
    union all
    select * 
    from joint
    where RevCnt > Average and orderno <= forupdate
    ), final as (
    select 
    	id, RevId,
    	case when forupdate > 0 then 1 else -1 end as overAvg,
    	rn = ROW_NUMBER() over (partition by (case when forupdate > 0 then 1 else -1 end) order by id)
    from upd
    )
    update MyTable
    set
    	RevId = ISNULL(n.RevId,p.RevId)
    from MyTable t
    inner join final p on p.id = t.id
    left join final n on p.rn = n.rn and n.overAvg = -1
    where p.overAvg = 1
    

    I strongly suggest you to test the script on different types of data distribution before you use it.

    I used multiple CTE expressions before update statement is formed. Also many SQL functions like Row_Number() with Partition By clause and SQL Aggregate functions like Count() with Partition By clause, etc.

    I hope it helps you for solution,

    Hi Gordon,

    I tested the above code block with different data sets, in some cases you need to execute the code twice. First execution minimizes the difference between counts of different categorized values, second will make them near to equal

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Friday, August 19, 2016 7:58 AM