none
Order in foreach RRS feed

  • Question

  • I've made some pipelines;

    • Fill staging tables
    • Execute all stored procedures
    • Refresh Power BI datasets (in progress)
    • Master pipeline which first executes the fill staging tables and then the execute all stored procedures (which contain some ETL logic) and after that refreshes the Power BI datasets.

    This all works great. But now I'm trying to influence the order of execution from SP's. I understand that I can split dimensions and facts or maybe go a little bit further and make some groups. But this is not exactly what I'm looking for. I want to create a process that generates a flow for execution with order in it. So when a specific facttable needs only 1 special staging table and 2 filled dimensions it can start right when the 2 dim's and 1 staging table are ready, regardless of that there are maybe some dimensions being filled right now.

    I'm not sure about how to do this. I understand that I can execute a stored procedure which will write a record in a statustable or something. And I've also made a stored procedure which you can give 2 parameters; a stored procedure and his schema and it will output the needed input tables. And based on the name you can determine what it is filling.

    So SP [dim].[sp-product] fills dim.product and when I execute my dependencies stored procedure:

    EXEC	@return_value = [config].[sp-check_dependencies]		@name = N'sp-salesspecial',		@schema = N'fct'

    It outputs me this:

    referenced_schema_name referenced_entity_name
    dim                                         product
    fct                                         sales
    staging                                 table6

    Of course I am able to loop through all my SP's with this SP and write this in a table or whatever... But even when I have a table like this and a status table with succeeded executes I still don't see how to control the dependencies.

    So I would like to have something like this:


    And then let things go parallel but with the correct order. So that fct.salesspecial can be fillled already while dim.department is still being filled because of that this is a really large table (by example).

    So to be short; I want to use ADF as orchestrator for some copying and executing SP's with ETL but with things parallel and respecting the correct order of execution...

    Friday, December 6, 2019 8:49 PM

All replies

  • Hi Mike,

    If you have multiple activities that you want to execute in sequence, you can chain activities so that they execute in sequence.

    If you want to run each iteration of a ForEach loop in sequence or in parallel, you can do so by setting the isSequential attribute.

    Hope this helps.

    Monday, December 9, 2019 7:55 AM
    Owner
  • Hi @ChiragMishra-MSFT,

    First of all; thank you very much for your reply! I'm really stuck on this issue but I want to solve this issue for sure.

    The first URL you provide gives me the ability to generate JSON which fixes the dependencies. This is a possibilty but not what I'm looking for right now (maybe I can use this as a last solution).

    The second URL tells me more about the foreach loop, but I don't think I see anything there that enables me to fix what I want? Right? 

    Tuesday, December 10, 2019 6:24 PM
  • Hi @ChiragMishra-MSFT,

    First of all; thank you very much for your reply! I'm really stuck on this issue but I want to solve this issue for sure.

    The first URL you provide gives me the ability to generate JSON which fixes the dependencies. This is a possibilty but not what I'm looking for right now (maybe I can use this as a last solution).

    The second URL tells me more about the foreach loop, but I don't think I see anything there that enables me to fix what I want? Right? 

    Hi Mike,

    We have a feature in preview that may help you achieve what you are looking for.

    Basically, what it does is that it triggers a pipeline based on a dimension for another pipeline. Imagine it like a TumblingWindowTrigger but instead of using time as dimension, it's any dimension inputted by the user.

    It has a few caveats:

    • The dependencies have to be pipelines (so you won't be able to use only one Staging pipeline passing different parameters, you will need N staging pipelines)
    • If you want to do this recurrently, the dimension needs to be different every time and you would need a way to pass the same dimension for a full dependency flow. For example, if staging A runs with dimensions "01/01/01-Product" and "01/01/01-Sales", then the Product pipeline will need Staging2 to run with Dimension "01/01/01-Product" to be able to run, and ftc.Sales will need Staging3 and dim.Product to run with dimension "01/01/01-Sales" and so on. For the next recurrent run, the dimension should be different for it to run. If you run again Staging1 with "01/01/01-Product", dim.Product won't recognize it has to run again.

    It's still under development, but if you want to give it a try let me know and we can enable it for your subscription.

    Other than using this feature, it would be really hard for ADF to achieve the dependency that you need
    Wednesday, December 11, 2019 12:13 AM
  • Yes, I would LOVE to have this feature. To be honoust I don't get what you say for 100%, but I would love to play with it. I'm not able to send PM to you so how can I get in touch with you to provide my subscription details?

    Wednesday, December 11, 2019 8:24 AM
  • Hi @ChiragMishra-MSFT,

    First of all; thank you very much for your reply! I'm really stuck on this issue but I want to solve this issue for sure.

    The first URL you provide gives me the ability to generate JSON which fixes the dependencies. This is a possibilty but not what I'm looking for right now (maybe I can use this as a last solution).

    The second URL tells me more about the foreach loop, but I don't think I see anything there that enables me to fix what I want? Right? 

    Hi Mike,

    We have a feature in preview that may help you achieve what you are looking for.

    Basically, what it does is that it triggers a pipeline based on a dimension for another pipeline. Imagine it like a TumblingWindowTrigger but instead of using time as dimension, it's any dimension inputted by the user.

    It has a few caveats:

    • The dependencies have to be pipelines (so you won't be able to use only one Staging pipeline passing different parameters, you will need N staging pipelines)
    • If you want to do this recurrently, the dimension needs to be different every time and you would need a way to pass the same dimension for a full dependency flow. For example, if staging A runs with dimensions "01/01/01-Product" and "01/01/01-Sales", then the Product pipeline will need Staging2 to run with Dimension "01/01/01-Product" to be able to run, and ftc.Sales will need Staging3 and dim.Product to run with dimension "01/01/01-Sales" and so on. For the next recurrent run, the dimension should be different for it to run. If you run again Staging1 with "01/01/01-Product", dim.Product won't recognize it has to run again.

    It's still under development, but if you want to give it a try let me know and we can enable it for your subscription.

    Other than using this feature, it would be really hard for ADF to achieve the dependency that you need

    Can you grant me access pleassssse?

    I'm still looking for a solution. So if anyone knows a thing to try for me....

    Saturday, January 4, 2020 7:31 AM