locked
mdf and ldf files RRS feed

  • Question

  • I would like to check my understanding after reading about this topic. Please corrrct me on any of the following if I'm wrong or misguided. Am I thinking about these the correct way?

    (1) SQL Server database consists of the data portion (mdf) and the log portion (ldf).

    (2) The data portion is stuff that is committed and written to the disk. The log portion is a "journal" of all the things that have happened to the database since the last time the stuff in the "journal" was written to the disk. (assuming the database is in full recovery mode)

    (3) There are certain commands that will trigger the stuff in the log to get written to the data portion. For example, 'backup log <db> to disk = <backup file>' will flush the content of the ldf to mdf and create a backup file that can be sequentially restored to another db in norecovery mode.

    (4) On point (3), I've also heard the term "truncate". Is this a sql command (because I couldn't find it.) Or is it just a term that people use to mean "dumping", "writing", "flushing", etc, the content of ldf to mdf. And are there other commands beside backup log that will do this?

    I have some more questions or things I want to check, after I've gotten some feedback on the above.

    Friday, May 28, 2010 6:22 PM

All replies

  • Hello.

     

    There are three things in play here.

    1. datafiles, typicaly with .mdf or ndf extention.

    2. logifiles, typicaly with .ldf extention.

    3. database cache.

     

    1. the database and log files resides on disk, which compare to memory is extreemly slow. Because of this every RDBMS system uses a portion of memory usually called the db cache to keep reesent accessed data in memory to speed things up.

    2. So data is read into cache, where it is read, modyfied etc... The problem with this approach is if we suddeny had a power failure since memory compared to disk is not resident all our modifications would be gone!

    3. To handle these sorts of situations the "logfile" was added and its function is to make certan that everytime a modification to data happends, enough information to re-create that data is saved to the logfile in case of a power failure.. (this happend everytime there is a commit which usually is auto in SQL Server.

    4. So the cache is there to speed things up, but as we do modifications the cache will be filled with "dirty" data and records to create the modifications will be written to the logs. But to clear the cache once and a while SQL Server will submit a checkpoint which only purpose is to write modified data to the datafiles and free the cache of dirty data. So if you do a full backup, offline youre database etc the firt thing that would happend would be a checkpoint to make certan youre database are in sync.

     

    So.. data is always written from the cache to the datafiles and never from the logfile.

    If an error would occure data will be read into the cache and log record would be applied to that data to make it consistent to the point of failure.

     

    Truncate is a hole nother thing.. The difference between a delete and a truncate is that a delete saves the records to redo or undo the delete in the logfile, while a truncate dosent.. 

    There is a lot more to say but i think this is a good start.

    - Raoul

    • Proposed as answer by Tom Li - MSFT Monday, May 31, 2010 8:50 AM
    Friday, May 28, 2010 8:00 PM
  • (3) There are certain commands that will trigger the stuff in the log to get written to the data portion. For example, 'backup log <db> to disk = <backup file>' will flush the content of the ldf to mdf and create a backup file that can be sequentially restored to another db in norecovery mode.

    We do not flush anything from the log file into the data file. We follow a Write Ahead Logging protocol. An transaction log entry is made before starting a logged operation. All transaction logging is written in the transaction log file (LDF). The data modifications done on the buffers by the transactions are written to the data file(s) (typically MDF and NDFs). When a backup is created for a database, all the contents of the data and log files are written into the backup file. Geoff has explained the backup operation at a high level here:

    http://social.msdn.microsoft.com/Forums/en-US/sqldisasterrecovery/thread/35c03781-e796-4a30-8f4c-9133518e351f

    HTH


    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://troubleshootingsql.wordpress.com
    Twitter: @banerjeeamit
    SQL Server FAQ Blog on MSDN: http://blogs.msdn.com/sqlserverfaq
    • Proposed as answer by Tom Li - MSFT Monday, May 31, 2010 8:49 AM
    Friday, May 28, 2010 9:33 PM
  • Close, but ...

    SQL Server does not really write from the log to the data files.  All work happens in memory cache. Data pages from the disk file(s) are brought into cache to be read and modified. As modifications occur, log records describing those modifications are written to a log cache. When a transaction is committed, all log information for the transaction is written to the log file, so that once a transaction commits, we have enough information to recover it.

    The main time that changed data in cache (which is called DIRTY data) is written to disk is at CHECKPOINT, which happens by default about every minute.  There are other commands that will write data from cache to disk, but CHECKPOINT is the main one, and it is not something you need to execute manually.

    I think your question about TRUNCATE is referring to truncation of the log. You're right, it is not a separate command. Prior to 2008, we could run the command BACKUP LOG WITH TRUNCATE_ONLY, but not in SQL 2008. Now we switch our database to SIMPLE recovery to have the same effect, which is basically to mark all the inactive space in the log file as reusable, so that new transaction records written to disk can overwrite the inactive portions, and the log will not have to grow indefinitely. Normally, your log is truncated every time you backup the log, which is what naturally keeps it to a reasonable size.

    Obviously, there are lots more details about your database files and logging. Someone could write a whole book about it. :-) But please start by reading everything about log management in the SQL Server documentation.


    HTH, Kalen Delaney www.SQLServerInternals.com
    • Proposed as answer by Tom Li - MSFT Monday, May 31, 2010 8:49 AM
    Sunday, May 30, 2010 10:43 PM
  • My technical understanding of disk/cache is a bit foggy. And after reading your reponses, I'm not sure if I need to understand it in detail to understand mdf and ldf, or if any of this is related.

    So let me forget about disk and cache for now, and ask you if the following statment is correct.

    Let's say I have a database with 1 table. This table has 10 records. Then the mdf file has everything that the computer needs to recreate that database. And the ldf file at this point has nothing in it except for the space that has been allocated to it.

    Now, let's say someone executes a DML query to that table. Then whatever the DML query had, that's what's in the ldf? I don't mean the DML statement itself, but whatever the server needs to create the datbase that includes the changes to the data brought by the DML.

    What is in mdf at this point?

     

    Tuesday, June 1, 2010 5:31 PM
  • Doolin,

    Let me take a crack at it.

    If you have a database with one table that has ten records, then all those are in the MDF.  I don't know what you mean by "computer needs to recreate the database", that *is* the database.

    Now, if your DML is a select statement, that's fine, in general nothing happens to the log (LDF).  If your DML is something like "update mytable set foo = 'baz' where foo = 'bar'", then those changes are first written to the LDF, and then written (differently) to the MDF.  Then that command is complete.  There are just ten rows in the MDF.

    The data stays in the MDF indefinitely (until deleted).  When you run a backup, depending on the options chosen, the backup may contain both the MDF contents and the LDF contents.  The LDF is then truncated.

    In short, the MDF *is* the database.

    The LDF logs all substantial changes, and holds them until the log is dumped via some sort of backup or otherwise forcibly truncated, hopefully for good reason.

    Josh

     

    Tuesday, June 1, 2010 6:10 PM
  • Fact:  You have a database with 1 table containing 10 records.

     

    1. As long as you do no modification (DML or DDL) to this table or its data, you are correct that the log file in principal is not needed.

     

    2. When someone wants to access the 10 records contained in the mdf file a copy of the records are read into the buffer cache, this is in memory from where it will be available to users, apps etc.

     

    3. If someone do a modification to any of the 10 records, this is always done in the buffer cache so now you have 2 versions of the same data. The original version still in the .mdf file and the modified version in the cache.

     

    4. To protect the modification to the data in the buffer cache SQL Server will behind the scene save enough information into the log file to be able to re-create it or undo it if never saved through a commit.

     

    Status:

     

    So far we still have the original data ( 10 rows) in the .mdf file, and changed version in the buffer cache called a "dirty page" and the information necessary to re-create or undo this change in the log file.

     

    5. At regular interval SQL server dose something called a checkpoint which purpose is to sync the data in the .mdf file with what ever modification  to be found in the buffer cache (dirty pages.) Once this is complete the .mdf file will once again be up to date and reflect the changes made in the buffer cache.

     

    6. If the modifications had been committed, then once again in principal the portion of the log file protecting the change is not needed anymore.

     

    - Raoul

    • Proposed as answer by Raoul Illyes Friday, July 16, 2010 5:47 AM
    Friday, July 16, 2010 5:47 AM