Data Flow Task Interaction Confusion

  • Question


  • Hi everyone,
    I have a question about data flow task blocks. Let's say I have a problem to solve that consists of 5 major transformation steps, and each transformation step ends with a result set: the result set of Step 1 is required by Step 2, the result set of Step 2 is required by Step 3, and so on.

    Now, I could solve the problem inside a single "data flow task" block, but this is not my aim, since it is not a clean approach, especially if the calculations and transformations require a lot of tasks. I would like to split the problem into different blocks depending on the complexity and logic of each part.

    My question is: how can I pass the result set of the "data flow task" block of Step A to the "data flow task" block of Step B in the control flow view (tab)? I tried connecting both blocks as in the data flow view (tab), but the connections do not show any metadata information, which means, as far as I know, that they do not carry any.



    Thanks in advance
    e. Yassine
    Thursday, October 11, 2007 11:37 AM

Answers

  • There is really no reason for you to separate out the data flow "blocks". However, if you would still like to do this, you should use raw files to pass information back and forth. This will be the quickest way, since raw files are in the pipeline's native format, and passing data between data flows is exactly the task the format was designed for.
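    The hand-off can be sketched conceptually in Python (this is not SSIS code; pickle and the made-up file name stand in for the native raw file format):

```python
import os
import pickle
import tempfile

# Conceptual sketch (not SSIS): stage 1 writes its result set to a file in a
# compact binary format, and stage 2 reads it back, mimicking how SSIS raw
# files hand a pipeline's rows from one Data Flow Task to the next.

def stage_one(path):
    # Produce a result set and persist it for the next data flow.
    rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
    with open(path, "wb") as f:
        pickle.dump(rows, f)

def stage_two(path):
    # Pick up exactly where stage one left off; no shared memory needed.
    with open(path, "rb") as f:
        rows = pickle.load(f)
    return [r["name"].upper() for r in rows]

path = os.path.join(tempfile.gettempdir(), "stage1_result.raw")
stage_one(path)
print(stage_two(path))  # ['ALPHA', 'BETA']
```

    Each "stage" only needs to agree on the file's location and layout, which is essentially the contract two Data Flow Tasks share when they exchange a raw file.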

     

    Thursday, October 11, 2007 12:07 PM
    Moderator
  • You could write a custom transform to execute your block, although it won't be using the stock transforms. 

     

    About the only ways to get a reusable piece using the stock tasks that I can think of (and I am sure there are more) would be to put each block into its own package and have a master package call them in sequence. You could write the results to a temp table, or you could use parent package variables to get the result set to the child, and use the following article to pass the recordset up to the parent (http://vsteamsystemcentral.com/cs21/blogs/steve_fibich/archive/2007/09/29/passing-a-value-back-and-forth-from-a-parent-package-to-a-child-package-in-ssis.aspx). Of course, reading through a recordset is no easy task (it usually involves a Foreach Loop or Script Task, which is not cost effective compared to the data flow task).
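    Conceptually, the parent/child variable hand-off works like this sketch (plain Python, not SSIS; the variable names follow the linked article):

```python
# Conceptual sketch: a "parent package" declares a variable, an "execute
# package" step runs the child, and the child's final step copies its local
# result set into the parent's variable -- the pattern the linked article
# implements with a Script Task and parent package variables.

def child_package(parent_variables):
    # Local (child-scoped) variable holding the result set, e.g. the rows
    # returned by SELECT TOP 10 name FROM SYS.TABLES in the article.
    tested_object = ["tableA", "tableB", "tableC"]
    # Post-execute step: hand the recordset back to the parent.
    parent_variables["TestObject"] = tested_object

def parent_package():
    variables = {"TestObject": None}   # declared before the child runs
    child_package(variables)           # stands in for the Execute Package Task
    return variables["TestObject"]     # now holds the child's result set

print(parent_package())  # ['tableA', 'tableB', 'tableC']
```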

    Thursday, October 11, 2007 1:04 PM
    Moderator
  • Reusing sections of a pipeline was a planned feature for 2008, but it didn't make the cut for the release, last I heard. Eric's suggestions are the best workarounds available right now.

     

    Friday, October 12, 2007 3:09 AM
    Moderator

All replies

  • First, thanks for your reply.

    My main goal is separation of concerns and clarity in the data flow, which is not possible now since I have everything inside one data flow block (at least visually).
    I am looking for some technique/way/block to package a set of operations (used in different places) into one block that can be reused without the need for I/O to a file.
    I am looking for some kind of sub-flow element which contains a minimal set of transformations, etc.


    Any ideas?

    e yassine
    Thursday, October 11, 2007 12:42 PM
  • Thanks Eric, jwelch,

    Well, the fun begins right now, since my data flows (pipelines) are getting bigger, and my colleagues and I are starting to face the problem that they are no longer maintainable: there is no way to extend or edit them, since each change requires us to re-correct all the "sub blocks", and that is a real pain.

    1) I'm wondering how you guys manage these conflicts when working as a team on the same solution, and how you manage fast-growing data flow pipelines?

    2) Eric, thanks for the URL to
    vsteamsystemcentral.com. I don't know if I got it right, but here is my understanding of that article, so you can correct me if I got things wrong:
    Step 1, parent package:
    * create a parent package
    * create variables in the parent package
    * drop an "execute package" block into the _parent_ package that executes the child package

    Step 2, child package:
    * create a local variable in the _child_ package and set its type to Object (variable name: TestedObject)
    * grab a set of data out of the database (in the case of the example: [SELECT TOP 10 name FROM SYS.TABLES])
    * store this result set in the variable TestedObject defined above
    * send the result set to a "script" block
    The target script block has two properties, ReadOnlyVariables and ReadWriteVariables. The local variable is defined as ReadOnly (I don't know why, do you have any idea?) and the variable from the _parent_ package is set to ReadWrite (I assume the result of the script will be stored here, or at least a reference to its location; is this correct?)

    * the script operates on the set of names grabbed by the query
    Step 3, parent package:
    * execute the parent package.

    I'm afraid my question is still unresolved, since I still cannot figure out the following:


    3) Can I read the contents of a parent variable from a block other than the "script" block of the child package? (for example from a "merge join", "conditional split", etc.)

    4) Can I read the contents of a variable in different data flows inside the same package?

    5) Can I store a result set in a variable as an Object and convert it back to its original form in another data flow or package?



    Thank you guys for the valuable information
    cheers
    eyassine
    Saturday, October 13, 2007 10:48 AM
  •  eyassine wrote:
    Thanks Eric, jwelch,

    1) I'm wondering how you guys manage these conflicts when working as a team on the same solution, and how you manage fast-growing data flow pipelines?

     

    Break the packages (and data flows) down into smaller chunks. Use raw files or staging tables to persist data between packages. That way individual team members can check the packages out and work independently.

     

    One way to think of this is to look at each data flow as a method that you call with a set of data. The method outputs another set of data, based on the input. But the inside of the method (or data flow) is a black box: you can't access or manipulate anything inside the data flow except from within the data flow. Anything you want to get in or out has to be part of the data, or passed in like a parameter (through package variables). If you were writing a big process in a traditional programming language, you would probably break the logic up into multiple routines. Look at SSIS processes the same way.
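    The analogy can be sketched in plain Python (the stage names and parameters are made up for illustration):

```python
# Sketch of the analogy above: treat each data flow as a function whose only
# contract is (input rows, parameters) -> output rows. Nothing inside one
# stage is visible to another, so stages can be developed and tested alone.

def clean_stage(rows, parameters):
    # One "data flow": everything in here is a black box to the caller.
    prefix = parameters["prefix"]
    return [prefix + r.strip() for r in rows]

def dedupe_stage(rows, parameters):
    # Another independent stage, chained only through its input rows.
    # (It ignores its parameters, but keeps the uniform contract.)
    return sorted(set(rows))

rows = [" b", "a ", " b"]
rows = clean_stage(rows, {"prefix": "cust_"})
rows = dedupe_stage(rows, {})
print(rows)  # ['cust_a', 'cust_b']
```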

     

    3) You can use a Parent Package Variable configuration to read a variable value from a calling package, without having to use script.

     

    4) Assuming that the variable is scoped correctly (at a high enough level to be seen by the appropriate data flows), then yes, you can access it in multiple flows. Be aware, though, that you may not be able to read an ADODB.Recordset multiple times, as SSIS creates these as forward-only cursors.
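    A Python generator shows the same forward-only behaviour (a conceptual analogue, not SSIS code):

```python
# Sketch of the forward-only behaviour: like an ADODB.Recordset created as a
# forward-only cursor, a Python generator can be walked once; a second pass
# over the same object sees no rows.

def open_recordset():
    return (row for row in [1, 2, 3])  # forward-only cursor analogue

rs = open_recordset()
first_pass = list(rs)   # consumes every row
second_pass = list(rs)  # nothing left to read
print(first_pass, second_pass)  # [1, 2, 3] []
```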

     

    5) Yes, you can.

     

    Saturday, October 13, 2007 3:01 PM
    Moderator
  •  eyassine wrote:


    2) Eric, thanks for the URL to
    vsteamsystemcentral.com. I don't know if I got it right, but here is my understanding of that article, so you can correct me if I got things wrong:
    Step 1, parent package:
    * create a parent package
    * create variables in the parent package
    * drop an "execute package" block into the _parent_ package that executes the child package

    Step 2, child package:
    * create a local variable in the _child_ package and set its type to Object (variable name: TestedObject)
    * grab a set of data out of the database (in the case of the example: [SELECT TOP 10 name FROM SYS.TABLES])
    * store this result set in the variable TestedObject defined above
    * send the result set to a "script" block
    The target script block has two properties, ReadOnlyVariables and ReadWriteVariables. The local variable is defined as ReadOnly (I don't know why, do you have any idea?) and the variable from the _parent_ package is set to ReadWrite (I assume the result of the script will be stored here, or at least a reference to its location; is this correct?)

    * the script operates on the set of names grabbed by the query
    Step 3, parent package:
    * execute the parent package.


    I'm afraid my question is still unresolved, since I still cannot figure out the following:

     

    You have it nearly correct. The important part to understand from the article is that it IS possible to send information from a child package back to the parent package. As the author suggests, the best place to do this will probably be in an OnPostExecute event handler (after all processing by the child is done). Most of the example was set up simply to show that it is possible to pass information back to the parent. Remember to set up the parent package variable (so that you can receive the information from the child package), to use the Execute Package Task in the parent package to call the child package, and to execute a script task that sets the parent package variable to whatever recordset you would like to have passed back to the parent from the child package.

     

    NOTES:

    I believe (although I could be wrong) that the author made the local variable read-only both to show that you aren't changing this variable in the script and possibly to avoid some of the situations relating to variable locking.

     

    The "result" of the script is to copy the contents of _Tested Object (child variable)_ to _Test Object (parent variable)_.

     

    Also, there is no real "Step 3", as control simply returns to the parent package after the child package finishes executing.

    Monday, October 15, 2007 12:50 PM
    Moderator
  • I agree with Eric. The point of passing data sets back from a child to a parent in the link above was mainly to pass small record sets dealing with metadata. That way a generic parent package can handle all of the logging and error handling, allowing you to create many child packages that work on the separate data objects without duplicating the error-handling and logging routines in each child package. (Example: a system in which you have to import customer, item, and vendor information, where you have a number of metadata operations, including counting records and logging package processes, that you handle at the parent level, but which requires the child to pass back a number of row counts for each imported "version" of a customer object.) You could use it to pass back a completed data flow, but I think I would rather accomplish that task by using a raw file and/or a staging table.
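    The division of labour can be sketched in plain Python (conceptual only; the child names and row counts are invented):

```python
# Sketch of the pattern described in this reply (not SSIS code): child
# packages do the imports and pass a few row counts back, while one generic
# parent handles all the logging, so no child duplicates the logging routine.

def import_customers(parent_variables):
    rows_loaded = 3  # stand-in for the real import work
    parent_variables["row_counts"]["customer"] = rows_loaded

def import_items(parent_variables):
    parent_variables["row_counts"]["item"] = 5

def parent_package():
    variables = {"row_counts": {}}
    for child in (import_customers, import_items):  # Execute Package Tasks
        child(variables)
    # Centralised logging: one place sees every child's counts.
    return [f"{name}: {n} rows"
            for name, n in sorted(variables["row_counts"].items())]

print(parent_package())  # ['customer: 3 rows', 'item: 5 rows']
```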

     

    If you have a particular question on how to pass an object variable (or any variable) from a child package back to a parent package, please post it and I'll see what I can do. (Post it either here or on http://vsteamsystemcentral.com/cs21/blogs/steve_fibich/archive/2007/09/29/passing-a-value-back-and-forth-from-a-parent-package-to-a-child-package-in-ssis.aspx.) I hope that helps.

    Monday, October 22, 2007 12:38 PM