none
file system vs catalog RRS feed

  • Question

  • Hi we run 2017 standard and are at a crossroads where an older etl product will be replaced by ssis and we want to set strategic direction.

    what are we giving up if we opt to use the file system rather than the catalog as our package repository?



    • Edited by db042190 Thursday, August 15, 2019 2:32 PM new title
    Wednesday, August 14, 2019 7:19 PM

Answers

  • Hi db042190,

    The following links might be helpful.

    Hope it helps.

    Regards,

    Zoe


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    • Marked as answer by db042190 Thursday, August 15, 2019 5:21 PM
    Thursday, August 15, 2019 8:49 AM
  • 1) we believe there is a rule based on cores etc in a single package related to how many things can run in parallel.  we believe that applies to # subpackages called from a master as well.  we have 26 DF's (perhaps they would be subpackages) that need to run and we would like not to depend on a single package parallel limit for throughput but rather the limit of the ssis server.  We don't contend against any other department's jobs.  Is it better to have 26 subpackages or 26 packages to get as much throughput as possible? 

    2) what are we giving up if we opt to use the file system rather than the catalog as our package repository?

    3) If and when we combine DF's in the same package, its basically because we have 26 plants and are attempting to record from 26 sources their data for a single consolidated target table.  so could running things in parallel be causing as many problems as advantages?  And can we structure whatever we come up with not to bring down one plant's data flow just because another failed?

    WRT #1: Rather depends on whether the receiving system can sustain the parallelism. Then if the machine and database(s). Need to test very well

    #2: Not a good idea. I do not see how one could arrive to doing this.

    #3: You risk failing other plants if it brakes on any given one. I see it similar to #1.


    Arthur

    MyBlog


    Twitter

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 1:47 PM
    Moderator
  • Hi db042190,

    Just ignore the Zoe's link.

    Starting with SSIS 2012 onwards it is a real server-side product which requires SQL Server DB instance installation. It has its SSISDB Catalog, SSISDB database, stored procedures, views, logging, reporting, management jobs, connections management, projects versioning, rollbacks, parameters, environments, etc.

    It is insane to suggest SSIS on the file system today.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 2:51 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 2:49 PM
  • thx Arthur. I'm going to focus on #2 here.

    Zoe provided a link to a comparison of file system vs catalog where file system beat the catalog in some areas.  But the link itself seems somewhat contradictory.

    Can you summarize why you wouldn't consider the file system?  All I really remember about the catalog is that it allows something (that you cant do in the file system) having to do with parameters, it recognizes projects if you want it to, it perhaps started moving in a direction that allowed folders, I think it started moving toward versioning and it had what seemed to me a weird intermediate file like medium/object that was used in promotion to the catalog. 

    it does not make any difference from what medium a package is loaded for execution.

    Starting SSIS 2012 SSISDB = Catalog is the repository for packages.


    Arthur

    MyBlog


    Twitter

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:16 PM
    Moderator
  • Hi db042190,

    Microsoft keeps SSIS on the file system just for backward compatibility with SSIS 2005, 2008, and 2008R2.
    So the companies who used SSIS 2008R2 and earlier will have some transition period while migrating to the SSISDB Catalog. Unfortunately, this situation creates a confusion for the new installation like yours.

    You can find me on LinkedIn and Skype.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 3:23 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:22 PM
  • Hi db042190,

    I already provided a long list of benefits of the SSISDB Catalog and SSIS Project Deployment mode.

    "...problems..."  - none.

    Just recalled one issue with the SSISDB Catalog. Extensive logging can inflate its size. So there is an out-of-the-box scheduled job to clean up that logging. You might need to increase frequency of executions for that job.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 4:20 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:48 PM
  • Hi db042190,

    Again, you don't need to use two methods in parallel. Old way is kept just for backward compatibility for some transition period while migrating to the SSISDB Catalog.

    By the way, I added one area to be aware of in my previous reply.

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:19 PM
  • Hi db042190,

    • *.ispac is just a zip compressed file. Rename any *.ispac file to *.zip, and you can access everything what is inside of it.
    • .Net CLR check box needs to be enabled during SSIS Catalog creation.
    • SSIS environments is a way to configure sets of parameters. It allows even to simulate dev,, test, and production environments on a single server.

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:32 PM
  • Hi db042190,

    SSIS execution logging is going to the SSISDB database.

    There are multiple configurable levels of logging, starting with none up to verbose. There is also a custom logging. It allows to pick and choose events for logging.

    Because logs are in the database, it is possible to use SQL to query and analyze SSIS packages historic executions. Same with reporting on top of it.

    So logging is one of the SSISDB Catalog benefits. It is a server side configuration, separate from the development activities like it was earlier with the SSIS packages on the file system.


    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 4:55 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:52 PM

All replies

  • Hi db042190,

    The following links might be helpful.

    Hope it helps.

    Regards,

    Zoe


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    • Marked as answer by db042190 Thursday, August 15, 2019 5:21 PM
    Thursday, August 15, 2019 8:49 AM
  • thx , is the 1st link on the older side? Pls notice its matrix contradicts some of its prose like the statement "Backup and recovery is simpler with storing your packages in the msdb database".

    also , isn't there a whole thing on passing params and recognizing projects on the catalog side now?  And I thought I even heard once that the catalog side now recognizes versions of the same pkg.  I'm just trying to think high level right now.

    Thursday, August 15, 2019 11:51 AM
  • 1) we believe there is a rule based on cores etc in a single package related to how many things can run in parallel.  we believe that applies to # subpackages called from a master as well.  we have 26 DF's (perhaps they would be subpackages) that need to run and we would like not to depend on a single package parallel limit for throughput but rather the limit of the ssis server.  We don't contend against any other department's jobs.  Is it better to have 26 subpackages or 26 packages to get as much throughput as possible? 

    2) what are we giving up if we opt to use the file system rather than the catalog as our package repository?

    3) If and when we combine DF's in the same package, its basically because we have 26 plants and are attempting to record from 26 sources their data for a single consolidated target table.  so could running things in parallel be causing as many problems as advantages?  And can we structure whatever we come up with not to bring down one plant's data flow just because another failed?

    WRT #1: Rather depends on whether the receiving system can sustain the parallelism. Then if the machine and database(s). Need to test very well

    #2: Not a good idea. I do not see how one could arrive to doing this.

    #3: You risk failing other plants if it brakes on any given one. I see it similar to #1.


    Arthur

    MyBlog


    Twitter

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 1:47 PM
    Moderator
  • thx Arthur. I'm going to focus on #2 here.

    Zoe provided a link to a comparison of file system vs catalog where file system beat the catalog in some areas.  But the link itself seems somewhat contradictory.

    Can you summarize why you wouldn't consider the file system?  All I really remember about the catalog is that it allows something (that you cant do in the file system) having to do with parameters, it recognizes projects if you want it to, it perhaps started moving in a direction that allowed folders, I think it started moving toward versioning and it had what seemed to me a weird intermediate file like medium/object that was used in promotion to the catalog. 

    Thursday, August 15, 2019 2:31 PM
  • Hi db042190,

    Just ignore the Zoe's link.

    Starting with SSIS 2012 onwards it is a real server-side product which requires SQL Server DB instance installation. It has its SSISDB Catalog, SSISDB database, stored procedures, views, logging, reporting, management jobs, connections management, projects versioning, rollbacks, parameters, environments, etc.

    It is insane to suggest SSIS on the file system today.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 2:51 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 2:49 PM
  • thx Yitzhak, just to be clear I mean for storing and executing packages so i'm not sure why procs come into play.  Apparently I've fallen behind on this subject.

    What are the top 3-5 make or break reasons that it would be insane? I know of more than one installation where they are beyond 2012 and still using the file system to store and execute pkgs.  My recollection is that using the catalog introduces complications to the use of ssis.

    Thursday, August 15, 2019 3:08 PM
  • thx Arthur. I'm going to focus on #2 here.

    Zoe provided a link to a comparison of file system vs catalog where file system beat the catalog in some areas.  But the link itself seems somewhat contradictory.

    Can you summarize why you wouldn't consider the file system?  All I really remember about the catalog is that it allows something (that you cant do in the file system) having to do with parameters, it recognizes projects if you want it to, it perhaps started moving in a direction that allowed folders, I think it started moving toward versioning and it had what seemed to me a weird intermediate file like medium/object that was used in promotion to the catalog. 

    it does not make any difference from what medium a package is loaded for execution.

    Starting SSIS 2012 SSISDB = Catalog is the repository for packages.


    Arthur

    MyBlog


    Twitter

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:16 PM
    Moderator
  • Hi db042190,

    Microsoft keeps SSIS on the file system just for backward compatibility with SSIS 2005, 2008, and 2008R2.
    So the companies who used SSIS 2008R2 and earlier will have some transition period while migrating to the SSISDB Catalog. Unfortunately, this situation creates a confusion for the new installation like yours.

    You can find me on LinkedIn and Skype.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 3:23 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:22 PM
  • thx Arthur and Yitzhak but it seems we still have a choice.

    Can one of you give me some compelling reasons for going with the catalog?  And 1 or 2 problems we might encounter (that we wouldn't otherwise) if we go with the catalog?

    Thursday, August 15, 2019 3:37 PM
  • Hi db042190,

    I already provided a long list of benefits of the SSISDB Catalog and SSIS Project Deployment mode.

    "...problems..."  - none.

    Just recalled one issue with the SSISDB Catalog. Extensive logging can inflate its size. So there is an out-of-the-box scheduled job to clean up that logging. You might need to increase frequency of executions for that job.
    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 4:20 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 3:48 PM
  • got it! So is there no such thing as the various config choices we used to have (eg sql table based, xml, environ vars etc) when using the project deployment model?  Can I (or would I ever) switch to the package deployment model and still use the catalog?   would I ever have a good reason for using the catalog and also the old config features.  
    Thursday, August 15, 2019 4:06 PM
  • Hi db042190,

    Again, you don't need to use two methods in parallel. Old way is kept just for backward compatibility for some transition period while migrating to the SSISDB Catalog.

    By the way, I added one area to be aware of in my previous reply.

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:19 PM
  • ispac is the weird thing I was referring to.  looks like maybe its metadata about the individual pkgs in the project.

    according to this (I think) https://mindmajix.com/ssis/deployment-models, clr has to be enabled.   and i'm not sure yet what the author means by   1) new environments (what are new environments?) in the ssis catalog can be used with parameters and 2) robust logging (during execution? during promotion?) is built in.... 

    Thursday, August 15, 2019 4:24 PM
  • thx Yitzhak, regarding the logging, is that logging that occurs during execution of pkgs?

    we used to log to the file system with more verbose stuff than what you could log to a database table.

    could this catalog logging be directed there (file system) instead of the catalog?

    Thursday, August 15, 2019 4:28 PM
  • Hi db042190,

    • *.ispac is just a zip compressed file. Rename any *.ispac file to *.zip, and you can access everything what is inside of it.
    • .Net CLR check box needs to be enabled during SSIS Catalog creation.
    • SSIS environments is a way to configure sets of parameters. It allows even to simulate dev,, test, and production environments on a single server.

    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:32 PM
  • thx is the logging execution logging? does it need to go to the catalog?
    Thursday, August 15, 2019 4:36 PM
  • Hi db042190,

    SSIS execution logging is going to the SSISDB database.

    There are multiple configurable levels of logging, starting with none up to verbose. There is also a custom logging. It allows to pick and choose events for logging.

    Because logs are in the database, it is possible to use SQL to query and analyze SSIS packages historic executions. Same with reporting on top of it.

    So logging is one of the SSISDB Catalog benefits. It is a server side configuration, separate from the development activities like it was earlier with the SSIS packages on the file system.


    • Edited by Yitzhak Khabinsky Thursday, August 15, 2019 4:55 PM
    • Marked as answer by db042190 Thursday, August 15, 2019 5:20 PM
    Thursday, August 15, 2019 4:52 PM
  • hopefully people wont confuse  "versioning" with source control. we just learned its not a substitute for source control.  would have been nice if it was as two different "data of records"  doesn't feel right to me. 
    Monday, September 9, 2019 12:00 PM
  • Hi db042190,

    Back to the same conceptual separation of duties:

    • Development environment on a dev. machine is always working with a source control.
    • Run-time environment on a server allows versioning (snapshots), i.e. a quick rollback/switch to the previously deployed version(s) to protect against the buggy release/deployment.
    Monday, September 9, 2019 1:24 PM