Invoking Stored Procedure for SQL Sink is paged every 10,000 records - replace logic impossible

  • Question

  • I followed the instructions for safely, reliably, and concurrently loading blob data into Azure SQL while also adding additional information (the slice) to the data. Therefore I use the SqlSink with a stored procedure as described at https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-sql-connector/#invoking-stored-procedure-for-sql-sink

    The example procedure in that article actually does a DELETE and then an INSERT. I do exactly the same, but my table type parameter should contain millions of rows and is paged every 10,000 records, so the first 10,000 rows get deleted when the next 10,000 are inserted, and all my data is lost except the last batch.
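
    For reference, the pattern from the article looks roughly like this in the copy activity's sink (the table type, procedure and parameter names here are placeholders, not my actual objects):

        "sink": {
            "type": "SqlSink",
            "sqlWriterTableType": "Stage.ContactType",
            "sqlWriterStoredProcedureName": "Stage.uspLoadContact",
            "storedProcedureParameters": {
                "Slice": {
                    "value": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
                }
            }
        }

    The stored procedure receives the copied rows through a READONLY parameter of that table type and then runs the DELETE followed by an INSERT ... SELECT from the parameter.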

    Is there a way to turn the paging off?

    Thanks for your help

    Hannes

    Just in case it's required, the run ID:

    f5030f4b-832c-4155-9e4c-db6ef1ae1496_635241312000000000_635242176000000000_AzureSQLDatasetStageContact


    http://www.hmayer.net/

    Monday, September 14, 2015 4:51 PM

All replies

  • Hi,

    Thanks for the question. In fact, the sample script in the link is not a proper one. I'm wondering why you need the DELETE and then INSERT pattern in your stored procedure. If you want to keep repeatability during copy, you could take advantage of the two mechanisms described in the link. E.g., by using mechanism 1, you could just set the property sqlWriterCleanupScript to achieve the DELETE purpose, and then SqlSink would do the default INSERT operation.
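
    As a minimal sketch (the table name dbo.Promotion and the slice date column SliceDate below are placeholders for your own schema), such a sink could look like:

        "sink": {
            "type": "SqlSink",
            "sqlWriterCleanupScript": "$$Text.Format('DELETE FROM dbo.Promotion WHERE SliceDate = \\'{0:yyyy-MM-dd}\\'', SliceStart)",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "00:30:00"
        }

    The cleanup script runs once before the copy writes any data for the slice, and SqlSink then performs its default bulk INSERT without any stored procedure.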

    For your information, a common reason to invoke a stored procedure is to avoid duplicate records or to update certain records. In that case, you could write your stored procedure with a MERGE operation.

    Tuesday, September 15, 2015 3:23 AM
  • Because the SqlSink cannot write the slice information to the data. I am loading a CSV file where the filename includes a date (the slice), but this date is not included in the data itself. That is why I cannot write a cleanup script like DELETE WHERE <SliceDateColumn> = Slice: I never get the date into the table in the first place, which is a requirement. I also cannot use the identifier column Data Factory can write for me, because its value (binary) is not the slice date.

    That is why the only option to get the slice into the table is to use a stored procedure with an additional parameter carrying the slice information. The sample suggests this works, because it demonstrates exactly what I need; there is just no hint anywhere about the procedure being called once per page!

    And I cannot use MERGE, because the data does not have a unique key.

    I just now thought about using the cleanup script instead of deleting inside the stored procedure, but for some reason it does not get called. Why?

    The sink portion of my data factory looks like this:

        "sink": {
            "type": "SqlSink",
            "sqlWriterCleanupScript": "$$Text.Format('exec Stage.uspDeletePromotion \\'{0:yyyy-MM-dd}\\'', SliceStart)",
            "SqlWriterTableType": "Stage.PromotionType",
            "SqlWriterStoredProcedureName": "Stage.uspLoadPromotion",
            "storedProcedureParameters": {
                "Slice": {
                    "value": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
                }
            },
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
        }

    Why is the cleanup script not executed? Or am I doing something wrong?

    Other recommendations?

    Thanks

    Hannes


    http://www.hmayer.net/


    • Edited by hmayer1980 Tuesday, September 15, 2015 9:48 AM
    Tuesday, September 15, 2015 9:47 AM
  • Thanks for explaining your scenario. It is a valid case for using "sqlWriterCleanupScript" together with a stored procedure. However, sorry to let you know that there is currently a limitation in ADF Copy: the property "sqlWriterCleanupScript" only works with a SqlSink that uses the default INSERT operation, and does not work together with a stored procedure. We are aware of it now and will fix it soon. We will let you know when the fix is ready in Prod.

    Before the fix is available, a possible workaround is to use a Stored Procedure Activity before the Copy Activity: let the stored procedure activity run your cleanup script, and then let the Copy Activity invoke the stored procedure in the SqlSink.

    For your information, the paging behavior is controlled by the property "writeBatchSize": rows are grouped into batches/pages of the specified size, and each batch is inserted into the target Azure DB separately. The reason is that Azure SQL Database does not support a long-running transaction for large data sizes, so we have to split the data into multiple batches to make it work and for better resilience. Refer to the doc for the property "writeBatchSize"; we will also include it in the new doc link.
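
    For illustration, a sink along these lines (the value is only an example) makes each stored procedure call receive up to 50000 rows in the table type parameter; the per-batch invocation itself cannot be turned off:

        "sink": {
            "type": "SqlSink",
            "sqlWriterTableType": "Stage.PromotionType",
            "sqlWriterStoredProcedureName": "Stage.uspLoadPromotion",
            "writeBatchSize": 50000,
            "writeBatchTimeout": "00:30:00"
        }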

    • Marked as answer by hmayer1980 Wednesday, September 16, 2015 11:42 AM
    • Unmarked as answer by hmayer1980 Sunday, September 20, 2015 3:23 PM
    Wednesday, September 16, 2015 8:54 AM
  • From a database point of view I accept your workaround; however, I have not tried it yet.

    Now I cannot wrap my mind around it... How do I design the datasets for that workaround? What is the input and output of which activity?

    I have a CSV dataset with "external" set to true. I assume this needs to be the starting point and, for example, cannot be the output dataset of my DELETE stored procedure activity? Or can it?

    The Copy Activity can have only one input and one output (https://msdn.microsoft.com/en-us/library/azure/mt185732.aspx). Can I have a second input on the copy activity pointing to the output dataset of my DELETE procedure?

    How do I get the DELETE stored procedure activity to run in the workflow before the INSERT procedure? I would prefer having the table always populated instead of empty (which making the delete procedure the last step would cause), in case I want to rerun an activity.

    Could you explain how to set up the workaround with regard to the datasets and activities?

    Thanks

    Hannes


    http://www.hmayer.net/

    Sunday, September 20, 2015 3:34 PM
  • For the workaround, you could set the output of the stored procedure activity to something like "AzureDB_SP" (an input for the stored procedure activity is not necessary). Then you could set the inputs of the copy activity to "ExternalBlob" and "AzureDB_SP", and its output to "AzureDB" (see the sketch after the list below).

    • AzureDB and AzureDB_SP are two logical datasets, but they target the same physical Azure DB table.
    • The Copy Activity can only copy from one input, but a second or third input can still be used as a dependency.
    • The inconvenient part of the workaround is that when you want to retry, you need to retry the stored procedure activity and the copy activity separately.
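
    A rough sketch of the two activities in the pipeline (dataset and object names follow this thread; treat the exact property layout as illustrative rather than a tested configuration, and it assumes Stage.uspDeletePromotion accepts the slice date as a parameter named Slice):

        "activities": [
            {
                "name": "DeleteSliceRows",
                "type": "SqlServerStoredProcedure",
                "outputs": [ { "name": "AzureDB_SP" } ],
                "typeProperties": {
                    "storedProcedureName": "Stage.uspDeletePromotion",
                    "storedProcedureParameters": {
                        "Slice": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
                    }
                }
            },
            {
                "name": "CopyBlobToAzureSql",
                "type": "Copy",
                "inputs": [ { "name": "ExternalBlob" }, { "name": "AzureDB_SP" } ],
                "outputs": [ { "name": "AzureDB" } ],
                "typeProperties": {
                    "source": { "type": "BlobSource" },
                    "sink": {
                        "type": "SqlSink",
                        "sqlWriterTableType": "Stage.PromotionType",
                        "sqlWriterStoredProcedureName": "Stage.uspLoadPromotion",
                        "storedProcedureParameters": {
                            "Slice": { "value": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)" }
                        }
                    }
                }
            }
        ]

    Because "AzureDB_SP" is listed as an input of the copy activity, the copy for a slice will not start until the stored procedure activity has produced that slice of "AzureDB_SP".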

    BTW, for the above-mentioned limitation of the property "sqlWriterCleanupScript", we have checked in a fix and it is waiting for our next Prod refresh. So you could also choose to wait a few more days and directly leverage the property "sqlWriterCleanupScript".


    Monday, September 21, 2015 7:03 AM
  • If it is only a few more days then I will wait for that.

    Thank you again.

    Hannes


    http://www.hmayer.net/

    Monday, September 21, 2015 2:38 PM
  • Hi,

    Glad to let you know that we just did a Prod refresh, and the fix for the limitation of the property "sqlWriterCleanupScript" is available in Prod. Please go ahead and try this feature, and let us know if you have other issues.

    Thanks,

    Yingqin

    • Marked as answer by hmayer1980 Friday, September 25, 2015 11:49 AM
    Wednesday, September 23, 2015 6:09 AM
  • It is working.

    Thanks


    http://www.hmayer.net/


    • Edited by hmayer1980 Friday, September 25, 2015 11:50 AM
    Friday, September 25, 2015 11:50 AM