none
Query advice to avoid duplication RRS feed

  • Question



  • Runs
     
    Run status company
    1 C 10
    2 C 11
    3 P 10
    4 P 11


    Files
     
    empid run Category val
    123 1 A 100
    123 1 C 50
    456 2 A 43
    456 2 C 22
    123 3 A 23
    123 3 C 86
    456 4 A 64
    456 4 C 23
    234 1 A 63
    234 1 C 24
    543 2 A 99
    543 2 C 67
    234 3 A 11
    234 3 C 34
    543 4 A 56
    543 4 C 98


    Employees

     
    empid name company
    123 Tony 10
    456 Dave 11
    234 John 10
    543 Eric 11


    What is the best way to return all employees but along with the employee fields I also want to return two more columns.

    1. The first needs to contain the [val] from files where category = A but only for the latest complete run, deemed by status = C which is also got by matching on the company and id of the current employee.
    2. The second column then contains that of the first column + the same except now where category = C in the files.


    The result I'm after is:


    Empid name company Val_A Val_A + Val_C
    123 Tony 10 100 100 + 50 = 150
    456 Dave 11 43 43 + 22 = 65
    234 John 10 63 63 + 24 = 87
    543 Eric 11 99 99 + 67 = 166



    My attempt duplicated that of Val_A in both the columns, querying twice. I couldn't just reference the first and just addition the second column. Surely I'm adding extra overhead here.


    Also I didn't master unfortunately that of using the company code against an employee to do the lookup into the run table. I guess this would have a negative impact any way in that the query would happen for each and every employee. i.e. I'm guessing performance wise only querying once per unique company would be better to get the recent run and then to use that value over and over again.


    Any advice much appreciated. As I say I have something working which doesn't take much time but I'm trying to make it better and a little cleaner and if possible avoid querying the same thing twice which feels wrong.


    Thanks
    • Edited by Tezler Tuesday, January 25, 2011 10:14 PM formatting
    Tuesday, January 25, 2011 10:04 PM

Answers

  • Assuming that a status of 'C' means completed, and assuming that you want the maximum "run" that a particular employee was involved in, this query can do what you want, but I feel there's something missing in your "Files" table... like a date or some kind of column that designates which entry is the "latest" for an employee besides a "run number".

    declare @runs table (run int, stat char(1), company int)
    insert @runs select 1,'C',10
    union all  select 2,'C',11
    union all  select 3,'P',10
    union all  select 4,'P',11
    
    declare @files table (empid int, run int, category char(1), val int)
    insert @files select 123,1,'A',100
    union all   select 123,1,'C',50
    union all   select 456,2,'A',43
    union all   select 456,2,'C',22
    union all   select 123,3,'A',23
    union all   select 123,3,'C',86
    union all   select 456,4,'A',64
    union all   select 456,4,'C',23
    union all   select 234,1,'A',63
    union all   select 234,1,'C',24
    union all   select 543,2,'A',99
    union all   select 543,2,'C',67
    union all   select 234,3,'A',11
    union all   select 234,3,'C',34
    union all   select 543,4,'A',56
    union all   select 543,4,'C',98
    
    declare @emps table (empid int, empname varchar(10), company int)
    insert @emps select 123,'Tony',10
    union all  select 456,'Dave',11
    union all  select 234,'John',10
    union all  select 543,'Eric',11
    
    ;with CompletedFiles as
    (
     select f.empid
        ,f.run
        ,Val_A=sum(case when category='A' then val end)
        ,Val_C=sum(case when category='C' then val end)
        ,rnum=row_number() over (partition by f.empid order by f.run)
     from @files f
     join @runs r on f.run=r.run
     where r.stat='C'
     group by f.empid
         ,f.run
    )
    select e.empid
       ,e.empname
       ,e.company
       ,c.Val_A
       ,[Val_A+Val_C]=c.Val_A+c.Val_C
    from @emps e
    join CompletedFiles c on e.empid=c.empid
    where c.rnum=1
    /*
       empid empname    company    Val_A Val_A+Val_C
    ----------- ---------- ----------- ----------- -----------
        123 Tony        10     100     150
        456 Dave        11     43     65
        234 John        10     63     87
        543 Eric        11     99     166
    */
    
    

     


    --Brad (My Blog)
    • Marked as answer by Tezler Tuesday, May 17, 2011 11:09 AM
    Tuesday, January 25, 2011 11:19 PM
    Moderator

All replies

  • Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. Your narrative is loaded with insanely vague data elements -- name, company, status, category -- and unknown constraints and data types. 

    Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. People hate tyoping INSERT INTO statements for posters. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL
    Tuesday, January 25, 2011 11:18 PM
  • Assuming that a status of 'C' means completed, and assuming that you want the maximum "run" that a particular employee was involved in, this query can do what you want, but I feel there's something missing in your "Files" table... like a date or some kind of column that designates which entry is the "latest" for an employee besides a "run number".

    declare @runs table (run int, stat char(1), company int)
    insert @runs select 1,'C',10
    union all  select 2,'C',11
    union all  select 3,'P',10
    union all  select 4,'P',11
    
    declare @files table (empid int, run int, category char(1), val int)
    insert @files select 123,1,'A',100
    union all   select 123,1,'C',50
    union all   select 456,2,'A',43
    union all   select 456,2,'C',22
    union all   select 123,3,'A',23
    union all   select 123,3,'C',86
    union all   select 456,4,'A',64
    union all   select 456,4,'C',23
    union all   select 234,1,'A',63
    union all   select 234,1,'C',24
    union all   select 543,2,'A',99
    union all   select 543,2,'C',67
    union all   select 234,3,'A',11
    union all   select 234,3,'C',34
    union all   select 543,4,'A',56
    union all   select 543,4,'C',98
    
    declare @emps table (empid int, empname varchar(10), company int)
    insert @emps select 123,'Tony',10
    union all  select 456,'Dave',11
    union all  select 234,'John',10
    union all  select 543,'Eric',11
    
    ;with CompletedFiles as
    (
     select f.empid
        ,f.run
        ,Val_A=sum(case when category='A' then val end)
        ,Val_C=sum(case when category='C' then val end)
        ,rnum=row_number() over (partition by f.empid order by f.run)
     from @files f
     join @runs r on f.run=r.run
     where r.stat='C'
     group by f.empid
         ,f.run
    )
    select e.empid
       ,e.empname
       ,e.company
       ,c.Val_A
       ,[Val_A+Val_C]=c.Val_A+c.Val_C
    from @emps e
    join CompletedFiles c on e.empid=c.empid
    where c.rnum=1
    /*
       empid empname    company    Val_A Val_A+Val_C
    ----------- ---------- ----------- ----------- -----------
        123 Tony        10     100     150
        456 Dave        11     43     65
        234 John        10     63     87
        543 Eric        11     99     166
    */
    
    

     


    --Brad (My Blog)
    • Marked as answer by Tezler Tuesday, May 17, 2011 11:09 AM
    Tuesday, January 25, 2011 11:19 PM
    Moderator
  • First this is a proprietary database. I'm not in charge of any of the table structure and it can not be changed. Also I'm using sample data for many reasons no less than privacy. I also simplified what exactly it is I do want in the live scenario so instead of saying where just C, its also got 3 other clauses.

    Thanks Brad, this looks really interesting.

    There possibly is a date in the Files table but although we could get the latest file using it, we still need to only make sure we are getting the files for the latest completed run. In Progress runs will still have files and these dates will be more recent. Am I missing something?

    Once again, Thanks.

    Wednesday, January 26, 2011 9:26 AM
  • I think after reading your query I may face one issue, only because I didn't explain the final result fully.

    Leavers will also be included. However for a leaver, it's not to look up the last run the leaver was included in, instead it should still get the latest completed run for the company and then if the user is not in the run both value columns would be NULL.

    With the above in mind would you query differently? or would you just detect if the user is a leaver i.e. leaving_date <> NULL and then ouput null for both columns?

    Wednesday, January 26, 2011 9:41 AM