Robust pre/post deploy scripts

    Question

  • Hi guys. I am finding the pre and post deploy scripts complex and error prone within the current system, and I'd like to discuss how this might be improved and what enhancements to the application could help. With SSDT taking a lot of the responsibility for ensuring safe deployment, there seem to be weaknesses in the current design for real-world scenarios.

    So, let's say you have a major table change on dbo.TableA that requires manual data movement between dbo.TableA v1.0 and dbo.TableA in v1.1. I have a discussion on how to manage the execution of specific version scripts on a different thread (http://social.msdn.microsoft.com/Forums/en-US/ssdt/thread/14afd65b-699d-410c-ad88-e62b4aea65a2), so let's concentrate on the data movement process.

    However, it is important to note that my mechanism uses an extended property to indicate the version, and as this is updated by the database model itself, the pre-deploy logic only executes if the version is the one being upgraded from (for Pre v1.1 this would be v1.0), and the post-deploy logic only executes if the version is the one being upgraded to (for Post v1.1 this would be v1.1).
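
    To illustrate the sort of guard I mean, the top of my Pre v1.1 script does something along these lines (simplified here - the property name "Version" and the values are just my convention, not anything SSDT provides):

    declare @DbVersion sysname;

    -- read the database-level extended property that carries the version label
    select @DbVersion = convert(sysname, ep.value)
    from   sys.extended_properties as ep
    where  ep.class = 0
    and    ep.name  = 'Version';

    if isnull(@DbVersion, '') = 'v1.0'
    begin
        print 'Pre v1.1: database is at v1.0 - running the version-specific pre-deploy work';
        -- version-specific pre-deploy work goes here
    end
    else
    begin
        print 'Pre v1.1: database is at ' + isnull(@DbVersion, '(no version)') + ' - skipping';
    end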

    So, the only way I have so far been able to move data from old-version to new-version tables is the two-step approach sketched below:

    In "Pre v1.1" script, copy data from dbo.TableA into temp.[TableA-v1.0], then delete rows from dbo.TableA

    In "Post v1.1" script, copy data into dbo.TableA from temp.[TableA-v1.0].

    However this has some issues:

    1. temp.[TableA-v1.0] doesn't exist in the database (and possibly neither does the temp schema), but as it is a persisted real table it should exist in the model. So we add it to the model, but it will not exist in the database at the time that Pre v1.1 runs.

      If you add the CREATE to the Pre v1.1 script, the body of the deploy will fail as it will try to create the table when it already exists.

      My only solution to this is to have a "Before Publish v1.1 Script" which *must* be run manually before the publish starts. This is clearly not ideal and I'm unsure what the better solution is.

      For the moment it is important for any pre-run dependencies to have checks at the top of the Pre vX scripts to ensure that the tables do exist before doing any copying / deleting of data (there is a sketch of such a guard just after this list).
    2. If the body of the deploy fails for any reason, the version may or may not have been incremented (depending on how far through it got and whether the transactional script option has been specified).

      This could mean that the Pre script gets run again, and depending on how the copy-data script is written (if it truncates the target first, for example), this could clear temp.[TableA-v1.0] and replace it with the contents of dbo.[TableA] (which will now be empty).

      My solution to this is to have a reentrant copy data procedure that will not delete data from the target table if there is nothing in the source table.
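
    To be concrete about the checks mentioned under issue 1, the top of each Pre vX script starts with a guard of roughly this shape (the table name is just the example from above):

    -- only touch data if the parking table actually exists
    if object_id('temp.[TableA-v1.0]', 'U') is null
    begin
        print 'Warning: temp.[TableA-v1.0] not found - skipping the v1.0 data copy / delete.';
    end
    else
    begin
        print 'temp.[TableA-v1.0] found - proceeding with data movement.';
        -- copy / delete work goes here
    end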


    Issue 1 actually exists in any case where you want to use an object in the database in a pre-deploy script. This could be a function / SP / table or whatever, and as pre-deploy scripts get more complex in order to ensure their resilience (reusing modular code, as would probably be recommended), I think this is an increasingly important issue.

    For example, my solution to issue 2 is an SP that takes a from-table and a to-table and writes some dynamic SQL to check row counts, return sensible print statements and perform the operation.
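
    A much-simplified sketch of that SP (the schema, name and parameters are mine, and it assumes identical column lists in both tables - the real version builds the column list and does proper quoting / validation of the names):

    create procedure dbo.CopyTableData
        @FromTable nvarchar(260),   -- e.g. 'temp.[TableA-v1.0]'
        @ToTable   nvarchar(260)    -- e.g. 'dbo.TableA'
    as
    begin
        set nocount on;

        declare @sql nvarchar(max), @FromRows bigint, @ToRows bigint;

        -- count rows in source and target so the copy is reentrant
        set @sql = N'select @FromRows = count_big(*) from ' + @FromTable + N'; '
                 + N'select @ToRows = count_big(*) from ' + @ToTable + N';';
        exec sp_executesql @sql,
             N'@FromRows bigint output, @ToRows bigint output',
             @FromRows = @FromRows output, @ToRows = @ToRows output;

        if @FromRows = 0
        begin
            print 'Source ' + @FromTable + ' is empty - leaving ' + @ToTable
                + ' untouched (' + convert(varchar(20), @ToRows) + ' rows).';
            return;
        end

        print 'Copying ' + convert(varchar(20), @FromRows) + ' rows from '
            + @FromTable + ' to ' + @ToTable + '.';

        -- assumes identical column lists in source and target
        set @sql = N'delete from ' + @ToTable + N'; '
                 + N'insert into ' + @ToTable + N' select * from ' + @FromTable + N';';
        exec sp_executesql @sql;
    end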

    Issue 2 highlights a wider issue surrounding the re-entrant requirements of the pre and post deployment scripts. Transactions could be an answer depending on how the version script management is working, but I believe that as currently implemented the pre / body / post scripts have to be independently transactionally controlled. I am not sure whether the publish script will support putting transaction start / commit in the pre/post deploy scripts and, if so, what the recommended way of doing this would be ... e.g. I guess you couldn't use try/catch as there would be multiple batches as part of the deploy.
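
    For what it is worth, the closest I have got is making each individual step atomic within its own batch - it does not solve the cross-batch problem, but it does stop a half-finished copy (a sketch of the pattern, reusing the example tables from above):

    -- make this one step atomic; with xact_abort on, a run-time error
    -- rolls the transaction back and aborts the rest of the batch
    set xact_abort on;

    begin transaction;

        insert into temp.[TableA-v1.0] (Id, SomeColumn)
        select Id, SomeColumn
        from   dbo.TableA;

        delete from dbo.TableA;

    commit transaction;
    go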

    So even if there is a data movement mechanism introduced into the system (I see some tantalising build types in the CTPs), these pre/post deployment issues still exist and I haven't found any clear guidance on this as yet.

    Deployment is clearly a key facet of this technology and I think some clear guidance would be useful, from MS if possible - especially as this technology is about to be unleashed onto a much wider audience as it will be packaged with the server itself.

    Thanks for reading!


    Monday, January 30, 2012 12:54 PM

Answers

  • In an attempt to answer my own question (in the ocean of indifference about this :) ) I am considering changing my current method to a snapshot based system.

    - A single set of scripts that deal with static data, configuration and validation tests that are always deployed regardless of version. (probably GlobalPreDeploy-PreVersion, GlobalPreDeploy-PostVersion, GlobalPostDeploy-PreVersion, GlobalPostDeploy-PostVersion)

    - Pre/Post deploy scripts get changed on each version to new "vX.XX PreDeploy" / "vX.XX PostDeploy" scripts that all follow the same template and call the respective GlobalXXXDeploy script at the top and the bottom of the version-specific scripts.

    - Once a version is decided on and has been tested as working, produce a snapshot and version it accordingly.

    - This has the advantage over my previous method of only including the code for the snapshot's version rather than all the historic versions, which muddied the waters somewhat and produced syntax errors later on when objects changed name and required further wrapping. Another advantage is that the pre and post deploy scripts always run regardless of the version indicator in the database itself (extended property or a persisted version table) - these are still useful, but are really labels rather than forming part of the deploy logic.

    - The disadvantage is that if you are taking a database through multiple versions you need to deploy each one in order - a minor issue, solved by an extension to my existing PowerShell deploy tool.

    But overall I think this is a better method and probably what the new snapshot functionality was designed for (although I was hoping someone from MS would already have suggested this if that were the case...).

    Issues still to solve:

    a. Pre-Publish script automation - for where objects are required in the target database by the pre-deploy script. How can I package up this script in the dacpac it relates to, then unpack and execute it against the target database before starting the publish operation?

    b. Noisy static data management (lots of code for the data). I really want a grid-based way of maintaining static data, but with overridable values for different configurations (dev/UAT/live etc.). SQL 2012 SSIS may help with this, but I am unsure whether the development process can be distilled down to simply providing the correct fields, plus a few more for which configurations to include, and telling it the table to merge into.
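
    To show what I mean by noisy, today every static data table ends up with a hand-written block like this in the deploy scripts (dbo.OrderStatus and its values are purely an example):

    -- typical hand-written static data block
    merge dbo.OrderStatus as target
    using (values
        (1, N'Open'),
        (2, N'Shipped'),
        (3, N'Cancelled')
    ) as source (OrderStatusId, StatusName)
        on target.OrderStatusId = source.OrderStatusId
    when matched and target.StatusName <> source.StatusName then
        update set StatusName = source.StatusName
    when not matched by target then
        insert (OrderStatusId, StatusName) values (source.OrderStatusId, source.StatusName)
    when not matched by source then
        delete;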

    Any other issues on my proposed methods in real world scenarios or comments on how to improve this would be welcome.

    Tuesday, May 08, 2012 1:41 PM

All replies

  • It would be great to get some real-world guidance on how this should work from MS, or at least a comment to say that there will be an article that covers it sometime soon.

    I appreciate the MS guys are busy working on getting the software ready for release, but I think this is a really important part of real-life database development using this tool, and it is still an issue for robust database development even with the new tools.

    Wednesday, February 29, 2012 12:02 PM
  • I agree that how to handle 'Data Motion' (that's the terminology they seem to be coining in Entity Framework Migrations documentation), is a feature that seems to be seriously lacking in the SSDT tooling.

    I've spent 18 months working on a rails project with Active Record migrations where you handle schema and data migrations consistently and have been looking for something similar for a new .NET MVC project.

    I've looked at EF Migrations.  However, we may not use EF as a data access technology.  I also think that SSDT would be a better fit for the current culture/skills of the project.  But if we have to write our own infrastructure to handle data migrations, I'm not sure what SSDT really gives us.

    Sunday, March 25, 2012 8:41 PM
  • I agree that how to handle 'Data Motion' (that's the terminology they seem to be coining in Entity Framework Migrations documentation), is a feature that seems to be seriously lacking in the SSDT tooling. ...  But if we have to write our own infrastructure to handle data migrations, I'm not sure what SSDT really gives us.


    To the point. Without data compare and script generation tools to help with "Data Motion", managing only the schema is almost useless.
    Monday, March 26, 2012 8:19 PM
  • Interesting comments guys. I definitely have high regard for the rich set of features in SSDT and I don't agree that data motion is a make-or-break feature. There are tools within SQL Server to allow a highly flexible kind of data motion, but you don't have any tools built into the system to manage schema and programmatic changes across all SQL database objects, integrate with source control systems, refactor code, trace dependencies, etc.

    I think the existing ORM tools (AFAIK) lack support for the full complement of SQL object types, which SSDT supports well, so I am not sure that they are the answer except for systems that are not taking advantage of all the features. Service Broker, T-SQL and CLR triggers, and extended properties are things that can complement enterprise-type databases, not to mention synonyms / views / UDFs etc. to manage change and optimise aspects of the system - let alone permissions / role definitions and making those a first-class part of the database design.

    ORM tools are fine for narrow databases that service a single application, but where systems start to behave more like an enterprise database I don't believe they are applicable or comparable.

    I do however think that the pre / post deploy aspect is the next thing that needs some supporting tools or at least best-practice guidance, and this was really what I was hoping to get from this thread. Perhaps MS haven't given this sufficient thought yet, but hopefully the comments here will help raise its priority, as it is the "next hardest thing" in the real world - now that the first hardest thing has been mostly fixed with SSDT / Database Projects.

    Wednesday, March 28, 2012 4:10 PM
  • Thanks for posting your thoughts on this Brett.  Our project is raising similar questions at the moment - we have a website with quite a complex database that is gradually becoming more complex with every monthly release.

    For a long time we've been handling schema & data changes by carefully authoring scripts that are deployed in a specific sequence every release.  Extremely flexible, but quite labour intensive.

    On a different project, we've just started using SSDT for SQL2012/VS2010 and we LOVE the way it integrates with source control and allows developers to easily keep their local database environment in sync with source control.

    We are still scratching our heads as to whether we can leave behind our traditional approach of carefully crafting incremental update scripts to advance the schema and massage the data at the same time.  For simple additive processes (e.g. add a new column, add a new table) I'm quite happy that SSDT will do this out of the box.

    For more complex changes (e.g. take two related tables, split them apart, introduce a new table between them, do the necessary inserts and re-map all the keys) I'm still looking for a method to handle this reliably and in as automated a fashion as possible.

    Our current scripting process allows us to model the lifespan of a database as a series of self-contained instructions (scripts) that consist of both data and schema changes.  They are executed in a complete sequence.  With this approach, whenever a script is executed, the database is in a state that is known at the time the script is designed.

    As far as I can tell, with SSDT, we get one shot of custom logic before schema changes (pre-deploy script) and one shot of custom logic after schema changes (post-deploy script), and the database could be in any state when these are executed.  What I'm struggling with is: how can we string together multiple self-contained data-and/or-schema changes whereby only the necessary changes are applied, based on the version of the database being deployed to?

    Note we are still on SQL2008 in production and not heading to Azure any time soon so I'm not sure the DACPAC stuff is relevant for us right now. The snapshot mechanism that Brett refers to looks interesting and may well be part of our solution.

    I would be very interested in some samples, info on new features or a blog post if the Data Tools team has some ideas on how this could be managed with the current or future versions.

    SSDT is great from what I've seen so far - but it seems to me that if we need to add our own metadata to manage the version number of our database, and hand-code schema/data update scripts, then we get less of an advantage from all the great schema compare and "point the tool at a database and upgrade the schema to match source control" stuff.  If we have to change the way we think about database changes (e.g. never make destructive changes, only ever add new objects and live with the probable redundancy) then I'm open to this if we can be satisfied that other teams have managed this OK.

    Cheers,

    Michael

    Monday, November 12, 2012 12:31 AM
  • I've been trying to figure out how to solve this as well.

    My current brainstorming is to snapshot each 'release' and then after doing any development generate a script by using SQL compare from the current version of the project against the snapshot.

    Use that script and then insert any data movement scripts that I need in line and save it to an Upgrades\001_Upgrade.sql 

    Deployment then is a matter of executing any upgrade scripts that are newer than the current database version.

    I am literally forming this idea as I type, so maybe I'm missing something huge. There are definitely more manual steps than I would like, but I figure if we can come up with a good repeatable manual process we should be able to automate that (or have good guidelines for what we want Microsoft to do).

    Wednesday, November 14, 2012 3:12 PM
  • My process on this has progressed somewhat since my last post and there is too much to cover here in any detail (I will add it to my new blog when I get some time). I am now also considering source code branching requirements and significant reuse wherever possible.

    Essentially I have "Developments" and "Versions".

    A Development is a requirement or set of requirements that can be released in isolation. It can be (and often is) branched off in source control.

    A Version is a release package that will contain one or more Developments.

    Each Development has its own set of Pre and Post deploy scripts.

    The Version then reuses the Pre and Post deploy scripts of each of the Developments that have been tested and signed off - the changes are merged back into the trunk (or wherever).

    I have a standard pattern of Pre and Post scripts comprising an Initialise, Core and Finalise (so 6 scripts). These are Shared scripts (S).

    I have one set of 6 for Global activities that are always called by every version. These do all the static data script work, locking the database before and releasing it after, post-deploy validation checks, etc.

    I then have up to 6 for each individual Development. These do anything that is required to make that specific release work.

    For the Development and Version releases you then have Master (M) Pre and Post scripts (2 for each). These are arranged to call the Initialise, Core and Finalise scripts for both the Global set and the Development(s), and look something like this:

    go -- Isolate this script in a batch so can be sure variables and temp tables dont clash
    --===============================================================================================
    raiserror('%s Starting : v1.001 M Pre-Deploy.sql %s', 0, 0, 
    '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>', '>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>') with nowait
    --===============================================================================================
    
    
    --==================== Global Initialise
    
    :r "..\..\..\Deploy - Project\Scripts - Deploy\Orch - Global\Global S Pre-Deploy Initialise.sql"
    
    --==================== Development(s) Initialise
    
    --:r "..\..\..\Dev - Items\RC-0857 - My Feature\Deploy\RC-0857 S Pre-Deploy Initialise.sql"
    
    --==================== Global Core
    
    :r "..\..\..\Deploy - Project\Scripts - Deploy\Orch - Global\Global S Pre-Deploy Core.sql"
    
    --==================== Development(s) Core
    
    --:r "..\..\..\Dev - Items\RC-0857 - My Feature\Deploy\RC-0857 S Pre-Deploy Core.sql"
    
    --==================== Development(s) Finalise
    
    --:r "..\..\..\Dev - Items\RC-0857 - My Feature\Deploy\RC-0857 S Pre-Deploy Finalise.sql"
    
    --==================== Global Finalise
    :r "..\..\..\Deploy - Project\Scripts - Deploy\Orch - Global\Global S Pre-Deploy Finalise.sql"
    
    
    --===============================================================================================
    raiserror('%s Finished : v1.001 M Pre-Deploy.sql %s', 0, 0, 
    '<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<', '<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<') with nowait
    --===============================================================================================

    Development M scripts obviously only call their own S scripts; Version M scripts use all the S scripts from each of the Developments that are being included as part of that Version.

    If you are working on a Development you set that Development's M scripts to be the pre/post for the project. If working on a Version release you set the Version's M scripts to be the pre/post for the project. You then snapshot after doing this, which locks the pre/post scripts in.

    I then release using a PowerShell helper that runs SqlPackage.exe and takes the snapshot (or the default build for the project).

    You need to know the version you are releasing and the version of the current database, and ensure that they are compatible. Currently this is a manual process but not too difficult. I simply have an extended property Version as a label to help me know which database is which.
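
    The label itself is nothing clever - whether it sits in the model or gets set from the post-deploy, it boils down to something like this (the value is obviously per release):

    -- create or update the database-level Version label
    if exists (select 1 from sys.extended_properties where class = 0 and name = N'Version')
        exec sys.sp_updateextendedproperty @name = N'Version', @value = N'v1.001';
    else
        exec sys.sp_addextendedproperty @name = N'Version', @value = N'v1.001';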

    This way you have a good foundation within which to build any pre/post scripts that are ultimately needed to work around some of the limitations of SSDT (of which there will always be some):

    1. Have the ability for multiple members to work at once and branch as needed and merge back safely

    2. Have a well-defined way to sequence different parts of the pre and post scripts as necessary, whatever your requirements.

    For example, if you always want to set RESTRICTED_USER when releasing a project, you can add it to the shared Initialise script and revert it in the shared Finalise; if only specific developments require it, you can do it in just those developments' Initialise and Finalise scripts. (See my warning on using single-user mode here: http://social.msdn.microsoft.com/Forums/en-US/ssdt/thread/3ec684b8-4e3a-49e5-ae28-4c7655febfd7)
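
    i.e. the shared Initialise / Finalise pair contains something along these lines (using the standard $(DatabaseName) SQLCMD variable that the publish provides):

    -- Global S Pre-Deploy Initialise: keep other users out for the duration of the release
    alter database [$(DatabaseName)] set restricted_user with rollback immediate;
    go

    -- Global S Post-Deploy Finalise: open the database up again
    alter database [$(DatabaseName)] set multi_user;
    go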

    Interested to know if this helps anyone.


    Tuesday, November 20, 2012 3:23 PM
  • Brett - this looks like a comprehensive approach, albeit probably too sophisticated for our project.  We are a 2-3 person team doing all work in trunk and thankfully managing to avoid merging branches.

    Just another note on this issue, I have stumbled across a couple of blog entries by Barclay Hill on a recommended way to deal with Data Motion when using SSDT (or VSTS: DB or Juneau or whatever it was called in 2009):

    http://blogs.msdn.com/b/bahill/archive/2009/03/30/managing-data-motion-during-your-deployments-part-1.aspx

    Looks pretty good.

    Wednesday, December 12, 2012 10:33 PM