Secure Deletion of Data

  • Question

  • I hope this is the right forum to post this.

    We have been asked whether it is possible to completely delete sensitive data. When data is deleted, it is not actually removed from the database but is merely marked as deleted. It is possible to zero out that data using sp_clean_db_free_space, but this doesn't affect the transaction log.

    We had hoped that CHECKPOINT would clear the data from the log, but we are not completely certain of this.

    So the question becomes: is there a built-in command or function in SQL Server 2008, or subsequent versions, that will completely remove deleted data, both from the transaction log and the database itself? Or is there a 3rd-party tool to accomplish the same thing?

    TIA

    Michael MacGregor


    Michael MacGregor, Senior SQL Server DBA

    • Moved by SathyanarrayananS Tuesday, November 17, 2015 1:59 PM More appropriate forum
    Monday, November 16, 2015 9:15 PM

Answers

  • The Common Criteria flag implements "Residual Information Protection", which refers to "overwriting memory" and is open to interpretation. However, I believe it also implements overwriting the data in the file when rows are deleted, so they are not recoverable.

    Thursday, November 19, 2015 5:16 PM

All replies

  • What is the recovery model for the database? There is a reason why data is logged. A CHECKPOINT flushes the log records sequentially and then writes the dirty pages to the data files. The log remains active back to the previous checkpoint; eventually the virtual log files will be freed for reuse.

    https://technet.microsoft.com/en-us/library/ms189085%28v=sql.105%29.aspx?f=255&MSPPError=-2147217396

    Monday, November 16, 2015 9:26 PM
  • To be honest, it doesn't really matter whether the database is in the simple or full recovery model; the question remains one of knowing for certain that deleted data is removed from the log. The data is highly sensitive, so when it is deleted it must be absolutely guaranteed that no remnant of that data exists, whether in the log or in the data files. CHECKPOINT writes dirty pages in the cache to disk, but does that mean the data is no longer in the log, or simply that it has now been written to disk?


    Michael MacGregor, Senior SQL Server DBA

    Monday, November 16, 2015 9:37 PM
  • Once a checkpoint is issued and all the dirty pages have been flushed to disk, the log can be truncated up to the most recent checkpoint. To make sure the data really is removed from the transaction log, you may need to do a log backup (with the TRUNCATE_ONLY option on older versions) and shrink the log file so that the virtual log files become available for reuse.

    But weigh the performance implications if you want to go this route.
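
    A minimal sketch of that sequence; SensitiveDB and the log file logical name SensitiveDB_log are placeholders, not names from this thread:

    -- Force a checkpoint, back up the log, then shrink it so the VLFs can be reused.
    USE SensitiveDB;
    CHECKPOINT;

    -- In full recovery, a log backup is what allows the log to be truncated.
    BACKUP LOG SensitiveDB TO DISK = N'C:\Backups\SensitiveDB_log.trn';

    -- Shrink the log file so the freed VLFs are released back to the OS.
    DBCC SHRINKFILE (SensitiveDB_log, 1);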

    Monday, November 16, 2015 9:48 PM
  • I think you have to be careful when you say "the dirty pages are flushed to disk", as this is not how it's worded in BOL, which says that a checkpoint will "write these dirty pages to disk". That is very different from being flushed, which would imply that the data concerned has been removed from the log.

    Even doing a truncate does not mean that the data has been removed, only that the inactive portion of the log has been marked for reuse.

    What we are looking for is a guarantee that no vestige of the deleted data exists in the log or in the data files. Using sp_clean_db_free_space would seem to address that requirement, but only for the data files, not for the transaction log. From everything we have been able to determine about CHECKPOINT and log truncation, this does not guarantee that the data is no longer in the log, i.e. that it has been completely removed.
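
    For reference, a minimal example of the call (the database name is a placeholder):

    -- Zeroes residual data left on free space in the data files only;
    -- it does not touch the transaction log.
    EXEC sp_clean_db_free_space @dbname = N'SensitiveDB';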


    Michael MacGregor, Senior SQL Server DBA

    Monday, November 16, 2015 9:54 PM
  • In the full recovery model, once your delete command has succeeded and the transaction log VLFs that were part of the transaction have been reused and overwritten with other transaction information, you can safely consider that nobody can read the sensitive data from the transaction log.

    So if you run the delete operation in full recovery, afterwards perform multiple log backups and run some other transactions just to make sure the VLFs are reused.

    There is also a way to dump the contents of the log to NUL:

    BACKUP LOG db_name TO DISK = 'NUL'
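
    A rough sketch of that idea with placeholder names (LogChurn is just a throwaway table used to generate log records, and the database is assumed to already have a full backup):

    -- Churn the log after the sensitive delete so the old VLFs get overwritten.
    CREATE TABLE dbo.LogChurn (filler char(8000) NOT NULL);

    DECLARE @i int = 0;
    WHILE @i < 10
    BEGIN
        INSERT INTO dbo.LogChurn (filler) VALUES (REPLICATE('x', 8000));
        DELETE FROM dbo.LogChurn;
        BACKUP LOG SensitiveDB TO DISK = N'NUL';
        SET @i += 1;
    END

    DROP TABLE dbo.LogChurn;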


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it

    My Wiki Articles

    MVP

    Tuesday, November 17, 2015 1:39 PM
  • Hi,

    There is something called Ghostcleanuptask

    More on Ghost records


    sathya - www.allaboutmssql.com ** Mark as answered if my post solved your problem and Vote as helpful if my post was useful **.

    Tuesday, November 17, 2015 1:55 PM
  • Michael

    When your delete is committed, the virtual log file that contains the data will be marked for reuse (status 2 will be replaced with status 0). You can issue the command suggested by Shanky, or CHECKPOINT:

    CREATE TABLE t1 (c INT NOT NULL IDENTITY(1,1));

    INSERT INTO t1 DEFAULT VALUES;
    GO 100

    SELECT * FROM t1;

    -- The demo database here is assumed to be named B.
    ALTER DATABASE B SET RECOVERY SIMPLE;
    GO

    BACKUP DATABASE B TO DISK = 'NUL';

    DELETE FROM t1;

    --CHECKPOINT

    -- After the checkpoint, there are no delete records left in the fn_dblog output.

    DECLARE @TableName sysname;
    SET @TableName = 'dbo.t1';

    SELECT
        u.[name] AS UserName,
        l.[Begin Time] AS TransactionStartTime,
        l.[Transaction Name]
    FROM fn_dblog(NULL, NULL) l
    INNER JOIN
        (
        SELECT [Transaction ID]
        FROM fn_dblog(NULL, NULL)
        WHERE AllocUnitName LIKE @TableName + '%'
          AND Operation = 'LOP_DELETE_ROWS'
        ) deletes
        ON deletes.[Transaction ID] = l.[Transaction ID]
    INNER JOIN sysusers u
        ON u.[sid] = l.[Transaction SID];

    -- Also examine the default trace file, which will eventually be overwritten as well.
    DECLARE @path NVARCHAR(260);

    SELECT @path = REVERSE(SUBSTRING(REVERSE([path]),
        CHARINDEX('\', REVERSE([path])), 260)) + N'log.trc'
    FROM sys.traces
    WHERE is_default = 1;

    SELECT *
    FROM sys.fn_trace_gettable(@path, DEFAULT)
    WHERE EventClass = 47
    ORDER BY StartTime DESC;


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, November 17, 2015 3:46 PM
    Answerer
  • I'm pretty certain that you won't be able to achieve what you want, I'm afraid.

    Take the t-log, debated heavily in this thread, for instance. "Truncating the log" will only flip a bit in the beginning of the VLF to mark it as "OK to overwrite". I.e., the data isn't physically overwritten. I believe that there are log readers out there that can retrieve info from the log which "isn't there anymore". How those tools do this is pretty obvious. :-)

    I did some digging into this some years ago and got a negative result from my (short) investigation. Bottom line: too many ifs and buts. In the end, we'd need a setting in SQL Server that would guarantee this. The Common Criteria "button", for instance, has RIP for memory - but that only helps if something were to snoop at allocated memory.


    Tibor Karaszi, SQL Server MVP | web | blog

    Tuesday, November 17, 2015 7:05 PM
  • I cannot confirm it right now, but I believe the new "Common Criteria" option and the previous "C2 audit options" would overwrite the data in the page when the data was deleted for government compliance.

    Please see:

    https://msdn.microsoft.com/en-us/library/bb326650(v=sql.120).aspx

    Tuesday, November 17, 2015 7:14 PM
  • I cannot confirm it right now, but I believe the new "Common Criteria" option and the previous "C2 audit options" would overwrite the data in the page when the data was deleted for government compliance.

    Please see:

    https://msdn.microsoft.com/en-us/library/bb326650(v=sql.120).aspx

    I remember a lot of fuss about this years ago, but I haven't been involved with it for a long time.

    From a quick look at various places, it does NOT look to me like C2 security guarantees to overwrite disk storage after deletion (or movement, etc), and probably not the log - at least not until the log is truncated.

    It looks to me like OP here wants security much tighter than C2, which is mostly about isolation and auditing.

    https://en.wikipedia.org/wiki/Trusted_Computer_System_Evaluation_Criteria

    https://msdn.microsoft.com/en-us/library/windows/desktop/aa376387(v=vs.85).aspx

    I think that modern approaches have switched in recent years to using encryption instead of physical deletion.

    Josh

    Tuesday, November 17, 2015 7:51 PM
  • Personally, I wouldn't trust anything other than an explicit statement in the product. Something like "This button makes sure that no residue persists beyond ... (the next log truncation, for instance) after deletion." Anything less than that and we're chasing shadows (right analogy?). And, since there is no such statement, I agree with Josh that carefully selected and implemented encryption is probably the way to go. SQL Server 2016 will make encryption easier using the "Always Encrypted" functionality.

    Tibor Karaszi, SQL Server MVP | web | blog

    Tuesday, November 17, 2015 8:22 PM
  • That's the conclusion that I came to after doing a lot of research and discussing the issue with Tom Moreau.


    Michael MacGregor, Senior SQL Server DBA

    Tuesday, November 17, 2015 8:30 PM
  • As for encryption, it contains one tricky bit:
    "The pages in an encrypted database are encrypted before they are written to disk and decrypted when read into memory"

    ~ https://msdn.microsoft.com/en-us/library/bb934049.aspx

    sp_clean_db_free_space
    This has an issue with dropped columns in combination with indexes, so tread with care.

    As for 3rd-party tooling:
    I can't recall any, but as it would involve doing unsafe operations in the memory space of SQL Server and in the file structure (which I don't think Microsoft would approve of), I doubt it's possible.

    My Two Pennies:
    As a mitigation option, you can use a nightly job to shrink the LDFs, clean the freed space from the LDFs by processing the underlying disks with SDelete https://technet.microsoft.com/en-us/sysinternals/bb897443.aspx, followed by something heavy such as index maintenance.

    But be aware of ghost records in indexes and in tempdb that might hold parts of the data.
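
    A rough sketch of such a nightly job, with placeholder names; SDelete itself runs at the OS level between the SQL steps:

    USE SensitiveDB;

    -- 1. Shrink the log so unused space is returned to the file system.
    DBCC SHRINKFILE (SensitiveDB_log, 1);

    -- 2. At the OS level: run SDelete -z against the log drive to zero the freed space.

    -- 3. Something heavy afterwards, e.g. index maintenance, to churn pages.
    ALTER INDEX ALL ON dbo.SensitiveTable REBUILD;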

    Tuesday, November 17, 2015 9:01 PM
  • I wasn't referring to TDE when I mentioned encryption, since TDE will definitely not cut it. Either do the encryption in the client app and keep the keys etc. in the client, i.e., SQL Server will never see the unencrypted data, or use Always Encrypted in 2016, which basically helps you implement this method in a bit less DIY way.
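
    For illustration, a minimal sketch of what an Always Encrypted column definition looks like in SQL Server 2016; the table is hypothetical, and the column encryption key CEK1 is assumed to have been created already (CREATE COLUMN MASTER KEY / CREATE COLUMN ENCRYPTION KEY):

    CREATE TABLE dbo.Patients
    (
        PatientID int IDENTITY(1,1) PRIMARY KEY,
        -- Encryption/decryption happens in the client driver, so SQL Server
        -- only ever stores and logs ciphertext for this column.
        SSN char(11) COLLATE Latin1_General_BIN2
            ENCRYPTED WITH (
                COLUMN_ENCRYPTION_KEY = CEK1,
                ENCRYPTION_TYPE = DETERMINISTIC,
                ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
            ) NOT NULL
    );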

    Tibor Karaszi, SQL Server MVP | web | blog

    Tuesday, November 17, 2015 9:14 PM
  • I'm pretty certain that you won't be able to achieve what you want, I'm afraid.

    Take the t-log, debated heavily in this thread, for instance. "Truncating the log" will only flip a bit in the beginning of the VLF to mark it as "OK to overwrite". I.e., the data isn't physically overwritten.

    Tibor,

    Out of curiosity, if the data is not overwritten, what actually happens, and how is the VLF reused? I was of the opinion that when a VLF is reused and written with new information, the old data is gone.


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it

    My Wiki Articles

    MVP

    Wednesday, November 18, 2015 8:32 AM
  • Shanky,

    We need to differentiate between log truncation (log backup, for instance) and re-using those VLFs. Log truncation marks the VLF as "OK to overwrite". It doesn't physically delete the data in the VLF. Like deletion of a file - it is marked as deleted in the file allocation table (or similar) but the data isn't physically overwritten.

    At some later stage, the VLF will be re-used and at that point in time, the data in there will of course be overwritten and no longer re-retrievable.

    I imagine that there is a header at the beginning of each VLF, and one of the bits in there states that the VLF is OK to overwrite. Think of the status column in the output from DBCC LOGINFO. I don't know the internals of the ldf or the VLFs, so I don't have further details.
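
    For anyone who wants to look at this, the status column can be inspected directly (DBCC LOGINFO is undocumented but widely used):

    -- One row per VLF in the current database.
    -- Status 2 = active (cannot be reused yet); status 0 = truncated and available
    -- for reuse. A status-0 VLF is NOT zeroed; only its header is changed.
    DBCC LOGINFO;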


    Tibor Karaszi, SQL Server MVP | web | blog

    Wednesday, November 18, 2015 7:47 PM
  • At some later stage, the VLF will be re-used and at that point in time, the data in there will of course be overwritten and no longer re-retrievable.

    Thanks, Tibor, for your comment. In the above comment you mentioned that after being overwritten the data will no longer be retrievable; that was actually what I was pointing to. Now, if this happens, will any tool still be able to retrieve the previous data? I guess not, but my knowledge is limited in this aspect.

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it

    My Wiki Articles

    MVP

    Thursday, November 19, 2015 7:31 AM
  • I figured that you were indeed referring to the physical overwriting of data; my long elaboration was more in the spirit of benefiting other readers here. :-)

    I doubt that any tool would be able to retrieve data *from the t-log* once it has been overwritten. IMO, management of this would be a nightmare, though. Imagine having to consider all the VLFs, whether each has been overwritten yet, and also how much of the data is still in the log - including aspects such as long-running transactions etc.

    And that is just the t-log. Where else might the data be? Ghost records? Indexes? Deallocated pages not physically overwritten? I don't know. Bottom line, IMO, and as I stated before: unless there is a knob in the product to guarantee this, we shouldn't expect it, since we will never have certainty anyhow. :-)


    Tibor Karaszi, SQL Server MVP | web | blog

    Thursday, November 19, 2015 7:55 AM
  • Hi Tibor

    What if the OP, right after the deletion, stops the SQL Server service, deletes the log file, and attaches the database with the single-file option? What do you think?

    PS. This is generally a very bad thing to suggest, but this is a special case.


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thursday, November 19, 2015 8:00 AM
    Answerer
  • Delete the records, truncate the log, free the system cache, and rebuild the ldf file, and you should be good, I believe.
    Thursday, November 19, 2015 10:46 AM
  • What about all your backups (I presume you have them)? Short of pumping all your data into a new DB, or (possibly) using DBCC SHRINKFILE with EMPTYFILE...

    Thanks, Andrew
    My blog...

    Thursday, November 19, 2015 2:16 PM
  • Theoretically, I believe it is possible to have all traces of the data gone from the mdf and ldf. For that, one has to make sure all data and index pages that have ever been used have been reused/overwritten by new objects. That can be forced by combinations of shrinking and rebuilding objects. The problem is you never know whether you just missed one little page.

    The next thing would be to make sure the system tables also have no references left over on their pages - if object names matter as well.

    The transaction log I am less worried about: a couple of rollovers should do.

    But of course don't forget the backups - including log backups. And how do you erase the space at the Windows level? Several tools exist that promise to securely erase used space.

    If the small chance that some pages will be missed is still too risky, I suggest constructing a new database and getting rid of the old files altogether. And next time, use encryption from the start (TDE can help, as it never leaves unencrypted data on disk).
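
    A minimal sketch of that last suggestion, with placeholder database and table names:

    -- Copy the surviving rows into a fresh database, then drop the old database
    -- and securely wipe its files at the OS level.
    CREATE DATABASE CleanDB;
    GO
    SELECT *
    INTO CleanDB.dbo.SensitiveTable
    FROM SensitiveDB.dbo.SensitiveTable;   -- deleted rows simply are not copied
    GO
    -- After verifying the copy (and re-pointing applications), drop the old database.
    DROP DATABASE SensitiveDB;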


    Andreas Wolter (Blog | Twitter)
    MCSM: Microsoft Certified Solutions Master Data Platform, MCM, MVP
    www.SarpedonQualityLab.com | www.SQL-Server-Master-Class.com

    Thursday, November 19, 2015 3:15 PM
  • The Common Criteria flag implements "Residual Information Protection", which refers to "overwriting memory" and is open to interpretation. However, I believe it also implements overwriting the data in the file when rows are deleted, so they are not recoverable.
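
    For reference, the flag is enabled with sp_configure; whether it actually overwrites deleted rows on disk is, as discussed above, open to interpretation:

    -- Instance-level setting; a restart is required for it to take effect.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'common criteria compliance enabled', 1;
    RECONFIGURE;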

    Thursday, November 19, 2015 5:16 PM
  • Uri,

    I never ever ever recommend deleting a log file. Since you can't guarantee that the database was cleanly shut down, you might end up with an unusable database. Also, since we can't guarantee the rest of the residues, I fail to see the usefulness...

    As I stated before, I do not recommend chasing ways to remove all residues until you get a knob in the product to *guarantee* this. But suggestions keep popping up here, so I believe I'm pretty alone in this viewpoint. At the very least, I think one should communicate to whoever asked for this that "I did my best, but I can't guarantee anything".

    One thing I mentioned earlier was data on old pages. Are we supposed to shrink the database until we *believe* that all pages have been overwritten? How would we even know that this has really happened?


    Tibor Karaszi, SQL Server MVP | web | blog

    Thursday, November 19, 2015 5:26 PM
  • That's the conclusion that I came to after doing a lot of research and discussing the issue with Tom Moreau.

    If you want to get really paranoid, you have to realize that even if you have overwritten the pages on the disk they may *still* be recoverable, if someone steals your physical disk and is sufficiently motivated.

    Josh

    Thursday, November 19, 2015 6:48 PM