Custom Data Lifecycle Solution

    Question

  • Dear Colleagues,

    I thought I might share my plans with some SharePoint gurus here, as two heads are always better than one... :)

    I am looking at what SharePoint 2013 can offer as a solution for automated archiving, retention and disposal of files using multi-tiered storage. I also want to integrate the 2nd tier (visible archives) into SharePoint Search so users can select the vertical, query it and retrieve archived files themselves. Our main limitation is that we own SQL Server 2012 Standard edition, which does not allow us to take advantage of native RBS in a practical way. We have analysed some 3rd party tools from AvePoint, Metalogix, Varonis etc., and some of them even work under SQL Standard, but they cost quite a lot of money, almost the same figure we would pay for upgrading to SQL Enterprise (uuu, that's a lot).

    The whole situation made me look at the problem from a different angle. I know there may be convincing voices as to why we should upgrade to SQL Enterprise, but that is not really what I am after in this post, please.

    Out of curiosity, I looked into Business Connectivity Services, External Content Types, PowerShell and SharePoint workflows as alternative routes to achieve my goals.

    So far, I have worked out the following and tested just the first phase of the solution:

    1. We can scan each list/library/site/site collection for items matching our archival rules and move the filtered content out of SQL into the 2nd tier (file share directories, one each for site collection, site, list and content type) on a conditional basis, e.g. execute the script from a workflow using "any item older than 2 years?" as the trigger (a rough PowerShell sketch of this step follows the list).

    2. We can crawl those file share structures on the 2nd tier and create appropriate result sources, an 'Archives' vertical, and maybe even refiners.

    3. Each directory on Tier 2 would have another Windows PowerShell script assigned to move files out of Tier 2 down to the lower level, based on how long we want files to be searchable and retrievable from within SharePoint (sketched together with the report logic below).

    4. Similarly, on Tier 3 there will be another set of scripts that fire after the retention period expires and destroy the files.
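
    To make step 1 more concrete, here is roughly the shape of the script I have in mind (the site URL, library name, share path and the two-year cut-off below are placeholders, not our real values):

      # Step 1 sketch: move items older than the cut-off from a library to Tier 2.
      # Run on a SharePoint server under an account with rights to the site and the share.
      Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

      $web       = Get-SPWeb "http://intranet/sites/finance"      # placeholder URL
      $list      = $web.Lists["Invoices"]                          # placeholder library
      $tier2Root = "\\fileserver\Tier2\Finance\Invoices"           # placeholder share path
      $cutoff    = (Get-Date).AddYears(-2)

      if (-not (Test-Path $tier2Root)) { New-Item -ItemType Directory -Path $tier2Root | Out-Null }

      foreach ($item in @($list.Items)) {                          # snapshot, as we delete while looping
          if ($item.File -ne $null -and $item["Modified"] -lt $cutoff) {
              $target = Join-Path $tier2Root $item.File.Name
              [System.IO.File]::WriteAllBytes($target, $item.File.OpenBinary())
              $item.File.Delete()                                   # or Recycle() while testing
          }
      }
      $web.Dispose()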

    Every step would write to a report file that indexes every file that has been moved from any location. Ideally, I just want to update that same report file at each level by appending the names of files that have moved downwards, so there is always an up-to-date copy of the report (as in the sketch below).
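
    Steps 3 and 4, plus the running report, could then be plain scheduled PowerShell sweeps along these lines (the paths and the one-year/five-year windows are placeholders for whatever our retention tables actually say):

      # Tier sweep sketch. Note: LastWriteTime on Tier 2 is effectively the date the
      # file landed there, because the step 1 script writes a fresh copy of the bytes.
      $tier2  = "\\fileserver\Tier2"                                # placeholder paths
      $tier3  = "\\fileserver\Tier3"
      $report = "\\fileserver\Reports\lifecycle-report.csv"

      # Step 3: demote files that have outlived the searchable window on Tier 2.
      Get-ChildItem -Path $tier2 -Recurse -File |
          Where-Object { $_.LastWriteTime -lt (Get-Date).AddYears(-1) } |
          ForEach-Object {
              $relative = $_.FullName.Substring($tier2.Length).TrimStart('\')   # keep the folder structure
              $dest     = Join-Path $tier3 $relative
              New-Item -ItemType Directory -Path (Split-Path $dest) -Force | Out-Null
              Move-Item -Path $_.FullName -Destination $dest
              Add-Content -Path $report -Value "$(Get-Date -Format s),Tier2->Tier3,$($_.FullName)"
          }

      # Step 4: dispose of files whose retention period on Tier 3 has expired.
      Get-ChildItem -Path $tier3 -Recurse -File |
          Where-Object { $_.LastWriteTime -lt (Get-Date).AddYears(-5) } |
          ForEach-Object {
              Remove-Item -Path $_.FullName
              Add-Content -Path $report -Value "$(Get-Date -Format s),Disposed,$($_.FullName)"
          }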

    I hope to take advantage of as much automation as possible, and there is definitely some of it here. Obviously, when users create new SharePoint objects (sites, lists etc.), we would need to apply our archival, visibility and retention rules again. That would be the case anyway, even if we used some fancy £20,000+ tool from a 3rd party; with a 3rd party tool we would just have a nice GUI for it, I guess.

    What do you guys think about such a solution for a real-world company? It saves us a significant amount of money, but is it really something worth pursuing? I am not sure if I can propose such a solution to the board, heh.

    Thanks, and I hope this fetches some interesting thoughts...

    Regards

    Friday, February 7, 2014 3:15 PM

All replies

  • My first thought is to make the technology fit the process, not the other way around.  Once your paper process is clear, SharePoint 2013 should be more than capable of producing something for your needs.

    Can you explain what each tier corresponds to? I'm possibly missing the context for each one.


    Steven Andrews
    SharePoint Business Analyst: LiveNation Entertainment
    Blog: baron72.wordpress.com
    Twitter: Follow @backpackerd00d
    My Wiki Articles: CodePlex Corner Series
    Please remember to mark your question as "answered" if this solves (or helps) your problem.

    Friday, February 7, 2014 4:41 PM
    Answerer
  • Hi Steven! Thanks for your reply. Here is the idea behind my tiered storage solution using SharePoint and Windows Server.

    Tier 1 - active, highly co-authored and versioned content in SQL databases; mostly Office documents and PDFs utilising SharePoint 2013 shredded storage.

    Tier 2 - an area of the file server for near-active or inactive content. Tier 2 would be an appropriate folder structure corresponding to SharePoint site collections, sites, libraries/lists and content types. The folders will be the result sources in SharePoint Search under the 'Archives' vertical (a rough sketch follows the tier list), and users will be able to search and retrieve archives themselves for however long we configure such content to be accessible via SharePoint.

    Tier 3 - another area of the file server (possibly an exact replica of Tier 2, or another repository, e.g. cloud) where archives will be retained for however long we need to keep them according to our retention tables. From this level the files will be disposed of.
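
    For the search side of Tier 2, I am thinking of a plain File Share crawl content source plus a result source/vertical scoped to it, roughly like this (the share path and names are placeholders):

      # Add a File Share content source for the Tier 2 archive root and kick off a crawl.
      Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
      $ssa = Get-SPEnterpriseSearchServiceApplication                # assumes a single SSA
      $cs  = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
                 -Type File -Name "Tier 2 Archives" -StartAddresses "\\fileserver\Tier2"
      $cs.StartFullCrawl()
      # The 'Archives' result source/vertical would then be scoped in the Search UI with a
      # query restriction along the lines of  path:"file://fileserver/Tier2"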

    My aim is to produce as much automation as possible. I guess I would have to learn how to trigger PowerShell scripts from a SharePoint 2010 workflow (one option I am considering is sketched below). Also, I need to maintain document metadata and permissions, hence my initial idea of using a folder structure in Tiers 2 and 3. I think that once you move documents out of SharePoint onto a file server, most of the meaningful metadata is gone anyway.
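
    One option I am considering for the trigger, rather than having the workflow call PowerShell directly, is to have the workflow just create an item in a small "Archive Queue" list and let a scheduled PowerShell job poll it and do the file work. Purely illustrative, the URL, list and column names below are made up:

      # Hypothetical polling script: the SharePoint 2010 workflow only logs a request;
      # this scheduled script does the actual move to Tier 2.
      Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

      $web   = Get-SPWeb "http://intranet/sites/finance"             # placeholder URL
      $queue = $web.Lists["Archive Queue"]                           # made-up queue list

      foreach ($request in @($queue.Items)) {
          $sourceUrl = $request["SourceUrl"]                         # column set by the workflow
          # ... move the referenced file to Tier 2, same logic as the step 1 script ...
          $request.Delete()                                          # drop the processed request
      }
      $web.Dispose()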

    • Edited by Lukas_UK Monday, February 10, 2014 1:00 PM
    Monday, February 10, 2014 12:59 PM
  • Hi,

    SharePoint is flexible enough that you can adapt its OTB features for your purposes or, if the limitations of the standard features turn out to be crucial, implement a custom solution. It is always better to start by analysing the OTB features. In this case I would probably start with the following link: Plan document management in SharePoint 2013, and see how it matches your archiving and retention requirements. My first thought is that it should be possible to fit your requirements into SharePoint.


    Blog - http://sadomovalex.blogspot.com
    Dynamic CAML queries via C# - http://camlex.codeplex.com

    Monday, February 10, 2014 2:47 PM
  • Hi sadomovalex! Thanks for your reply...

    The document management processes and procedures are already defined in our company. We use SharePoint with properly structured metadata, versioning and some information management. We are a step ahead of all that.

    Now we move on to externalising content from the SQL databases (cost, performance, manageability) and taking advantage of tiered storage. This thread focuses on just that. In other words: how to use SharePoint's native functionality to achieve proper automation of data lifecycle management in the most cost-efficient way. So the clear distinction here is between what we can achieve using PowerShell, Windows Server scheduling, a Windows file server and SharePoint workflows versus 3rd party data governance solutions (Metalogix, DocAve, Varonis etc.).

    Monday, February 10, 2014 3:08 PM
  • From what you have posted, the option of an SQL Server Enterprise license with RBS configured looks preferable. With PowerShell and custom development it is possible to do almost anything in SharePoint, but I feel that in your case the costs of this approach (development, configuration and maintenance) will be higher than purchasing an Enterprise license.

    Blog - http://sadomovalex.blogspot.com
    Dynamic CAML queries via C# - http://camlex.codeplex.com

    Tuesday, February 11, 2014 10:52 AM
  • Hi sadomovalex! Thanks again...

    It is interesting what you are saying. On the other hand, though, I feel pretty capable of configuring our solution using a combination of PowerShell and SP workflows. I just want to bounce my idea off other SP developers so they can say how they would approach the task, or perhaps suggest the "cleanest" ways of achieving my goals.

    So obviously, I am starting with PowerShell and a standard script to externalise content from the SQL databases onto the file server. The script will be adapted for each archival rule, so different libraries will have different rules (mainly based on time and metadata), as in the sketch below. That bit works, but I have to integrate it with an SP workflow so the scripts run when they need to, as opposed to every day. The scripts will create the folder hierarchy on the file server corresponding to the archival rules.
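
    Rather than keeping a modified copy of the script per library, I am leaning towards one script driven by a small rule table, roughly like this (all the URLs, library names, thresholds and share paths below are placeholders):

      # Rule-driven archiving sketch: one script, many libraries.
      Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
      $rules = @(
          @{ Web = "http://intranet/sites/finance"; Library = "Invoices"; MaxAgeYears = 2; Tier2 = "\\fileserver\Tier2\Finance\Invoices" },
          @{ Web = "http://intranet/sites/hr";      Library = "Policies"; MaxAgeYears = 5; Tier2 = "\\fileserver\Tier2\HR\Policies" }
      )
      foreach ($rule in $rules) {
          $web  = Get-SPWeb $rule.Web
          $list = $web.Lists[$rule.Library]
          New-Item -ItemType Directory -Path $rule.Tier2 -Force | Out-Null
          # ... then the same filter/export/delete logic as in the step 1 sketch,
          #     using (Get-Date).AddYears(-$rule.MaxAgeYears) as the cut-off ...
          $web.Dispose()
      }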

    The SharePoint Search result sources and the vertical should also work with directories on the file server; I do not have a problem with that. The only concern is that I might lose metadata and permissions at this level. So my first question is: how do I maintain metadata and permissions on content externalised from SQL to the file server?

    The second question I am struggling with at the moment is how to perform inventory operations on the file server on a scheduled basis. It seems the only filtering option I can use with Windows PowerShell would be the directories the content was moved down into from the tier above. I do not think SP workflows can touch file server directories, can they?

    Any suggestions are highly appreciated! Thanks

    Tuesday, February 11, 2014 11:15 AM
  • About your first question: when you copy a file from SharePoint, by default metadata is only preserved for MS Office documents, because it is stored not only in the content database but also inside the files themselves (see Remove Sharepoint metadata from MS Office documents). It won't be preserved automatically for PDFs and other file formats. I.e. if you need it, your script should also export all metadata of the file before removing it from the content database (e.g. to an xml or txt file with the same name as the original document: foo.pdf and foo_metadata.xml). But this also means you will lose the ability to search documents in the file system by metadata.
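
    For example, something along these lines could dump the item's fields to a sidecar xml before the document is removed (simplified; $item is the SPListItem being archived and $tier2Root is a placeholder target folder):

      # Simplified example: write the item's visible fields to foo_metadata.xml
      # next to the exported file, before deleting the item from the library.
      $meta = @{}
      foreach ($field in $item.Fields) {
          if (-not $field.Hidden) { $meta[$field.InternalName] = $item[$field.InternalName] }
      }
      $sidecar = Join-Path $tier2Root `
          ([System.IO.Path]::GetFileNameWithoutExtension($item.File.Name) + "_metadata.xml")
      New-Object PSObject -Property $meta | Export-Clixml -Path $sidecar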

    About your second question: as far as I understand, we are talking about an on-premises SharePoint installation (not Office 365). In this case workflows may work with the file system, but keep the following in mind: a workflow may run in w3wp (IIS) or in owstimer (the SharePoint timer service), depending on whether it is synchronous or not. If you have an all-in-one server, you may work with the local file system (you just need to ensure that the app pool account and the account used by owstimer have enough permissions on the appropriate folder), but if you have a farm with several servers, you need to set up a network share and ensure that the app pool and owstimer accounts have access to it.
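
    For example, on the file server something like this gives both accounts modify rights on the archive folder and the share (the account names and paths are just placeholders):

      # NTFS permissions on the archive folder ((OI)(CI) = inherit to files and subfolders).
      icacls "\\fileserver\Tier2" /grant "DOMAIN\svc-SPAppPool:(OI)(CI)M"
      icacls "\\fileserver\Tier2" /grant "DOMAIN\svc-SPTimer:(OI)(CI)M"

      # Share-level permissions (run locally on the file server, Windows Server 2012).
      Grant-SmbShareAccess -Name "Tier2" -AccountName "DOMAIN\svc-SPAppPool" -AccessRight Change -Force
      Grant-SmbShareAccess -Name "Tier2" -AccountName "DOMAIN\svc-SPTimer" -AccessRight Change -Force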


    Blog - http://sadomovalex.blogspot.com
    Dynamic CAML queries via C# - http://camlex.codeplex.com

    Tuesday, February 11, 2014 1:05 PM
  • Very useful comments, thank you sadomovalex!

    I will crack on with my development and see the result. Will post some updates here.

    Thanks again!

    Tuesday, February 11, 2014 1:24 PM