locked
Need help for design Data Flow Task RRS feed

  • Question



  • Hi frns,

         I am new to SSIS. I need some help in designing the below dataflow task.


    -- Teacher creates several tasks and each task is assigned to multiple students
    -- The teacher table contains contains all the tasks created a every teacher
    use ods
    go
    create table teacher
    (
       yr int,
       tid int,
       tname varchar(20),
       taskid int
     
    )

    insert into teacher values(2007,101,'suraj','task1')
    insert into teacher values(2007,101,'suraj','task2')
    insert into teacher values(2007,102,'bharat','task3')

    insert into teacher values(2007,103,'paul','task4')
    insert into teacher values(2007,103,'paul','task5')
    insert into teacher values(2007,103,'paul','task6')


    -- Teacher "suraj" has created 2 tasks
    -- Teacher "bharat" has created 1 task

    select * from ods..teacher
    yr        tid tname    taskid
    ============================
    2007    101    suraj    1111
    2007    101    suraj    1122
    2007    102    bharat    2222

    -- Students table contains  studentid(sid),teacherid(i,e tid ) & taskid
    drop table students

    create table students
    (
      yr int,
      sid varchar(10),
      tid int,
      taskid varchar(10)
    )

    truncate table students

    insert into students values(2007,'stud1',101,'task1')
    insert into students values(2007,'stud1',101,'task2')

    insert into students values(2007,'stud2',101,'task1')
    insert into students values(2007,'stud2',101,'task2')

    --Note : stud1,stud2 comes under teacher with tid "101"



    insert into students values(2007,'stud3',102,'task3')

    -- Note : stud3 and stud4 comes under teacher with tid "102"

    insert into students values(2007,'stud4',103,'task4')
    insert into students values(2007,'stud4',103,'task5')
    insert into students values(2007,'stud4',103,'task6')

    insert into students values(2007,'stud5',103,'task4')

    select * from students
    yr        sid        tid    taskid
    ----------------------------
    2007    stud1    101    task1
    2007    stud1    101    task2

    2007    stud2    101    task1
    2007    stud2    101    task2

    2007    stud3    102    task3
    2007    stud4    103    task4
    2007    stud4    103    task5
    2007    stud4    103    task6
    2007    stud5    103    task4


    Now in my target table i need to load the data in a such a way that

    use targetdb
    go
    drop table trg
    go

    create table trg
    (
      yr int,   -- data should load from teacher.yr
      tid int, 
      taskid int(20),
      cnt  int
     
    )

    Mapping in target column and value to be loaded
    ==================================================
    yr   -- teacher.yr
    tid  -- teacher.id
    taskid -- this need to start a new sequence of numbers starting from 1 for each teacher and dont want the task id to be copied as it is.
    cntofstudents  -- need to count no of students from  "students" table for a given teacher and for his assignment

    For example for teacherid "101" and taskid "task1" there are 2 students
    again for the same teacher "101" and taskid "task2" there are 2 students


    For teacher "102" and taskid "task3"  there is only 1 student

    Similary for teacher "103"


    Relation
    ========

    Teacher table | Students Table
    yr              |  yr
    tid              |  tid


    After i run the ETL the data should look as follows :

    insert into trg values(2007,101,1,2)  
    insert into trg values(2007,101,2,2)

    insert into trg values(2007,102,1,1)

    insert into trg values(2007,103,1,2)    -- task4 is created by teacher "103" and assigned to 2 students stud4 and stud5
    insert into trg values(2007,103,2,1)    -- task5 is created by teacher "103" and assigned to 1 student i.e stud4
    insert into trg values(2007,103,3,1)    -- task6 is created by teacher "103" and assigned to 1 student i.e stud5

    Note : If u observer the values in 3rd column of the trg table, instead of directly mapping the taskid we need to generate a separate sequence for every teacher.

    BottomLine : for each and every task created by each teacher there should be a unique record along with the count of students in "STUDENTS" table


    Can anyone help me out in designing the Data Flow task for this Functionality.



    Thanks,
    Manu


    Wednesday, November 14, 2007 5:17 PM

Answers

All replies

  • Perhaps I'm missing something, but this looks to me like a job for a SELECT statement or view, with a GROUP BY and COUNT, not a job for SSIS.

     

    What am I missing here? Why are you looking at SSIS as a tool for basic aggregation?

    Wednesday, November 14, 2007 5:26 PM
  • I tend to agree with Matthew on this (using a SELECT with a group by is simpler), but if you really want to do it in the data flow, you'd use an Aggregate Transform to group on teacher and task, and count students.

     

    Wednesday, November 14, 2007 6:25 PM
  •  jwelch wrote:
    I tend to agree with Matthew on this (using a SELECT with a group by is simpler), but if you really want to do it in the data flow, you'd use an Aggregate Transform to group on teacher and task, and count students.

     

     

    But be aware that although you can do it this way, you'll be significantly increasing the complexity of your solution and significantly reducing performance if you have more than a handful of records.

     

    SSIS is a great tool, but unless I'm missing something here it is not the right tool for this job.

    Wednesday, November 14, 2007 6:51 PM
  • I might be misreading your suggestion, but it sounds like you are saying don't use SSIS at all for this.

     

    Since the OP is going between two databases, I'd still recommend the use of SSIS, and a dataflow. I'm not a big fan of linked servers, or even cross-database queries if there is a possiblity in the future that the databases might end up on seperate servers. Thus, I would suggest a data flow with an OLE DB Source with a SELECT..GROUP BY to get the aggregated counts, and an OLE DB Destination to write the results. Not a complex package at all.

     

    If I was misinterpreting you, my apologies. Chalk up it to healthy debate Smile

     

     

    Wednesday, November 14, 2007 7:08 PM


  • Thanks for the suggestions.

    One thing i want i know, if it is by using an SELECT statement can we reset the sequence number for a new teacher as per the target table data?

    Regards,
    Mahesh
    Wednesday, November 14, 2007 7:12 PM
  •  Manuv wrote:


    Thanks for the suggestions.

    One thing i want i know, if it is by using an SELECT statement can we reset the sequence number for a new teacher as per the target table data?

    Regards,
    Mahesh

     

    Smile Good point - I'd completely missed that. You can use the ROW_NUMBER() function in a SELECT statement to get that type of incrementing number. You could also use a script component in the data flow, but again, that adds to the complexity.

    Wednesday, November 14, 2007 7:20 PM
  • Yes, you can use the ranking functions (http://msdn2.microsoft.com/en-us/library/ms189798.aspx) in T-SQL to implement this functionality.

    Wednesday, November 14, 2007 7:21 PM

  • Thanks Mathew.

    Regards,
    Mahesh
    Friday, November 16, 2007 6:48 PM
  • There is also a Rank Transform available here that can be usefull for applying such numbering schemes inside SSIS itself - http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Rank-Transform.aspx

    Saturday, November 17, 2007 9:41 AM