locked
Indexing error in a database in Sql Server 2008. How to correct? RRS feed

  • Question

  • We have a maintenace plan running at 11.00 PM which does the following
    1)Takes a full backup of the specified databases.
    2)Rebuilds the indexes on the specified databases.
    3)Does the database intgerity check on the specified databases
    4)Deletes the old bak files older than 28 days old.

    When I viewed the plan history I found that the task 2 had not been completed and the plan got terminated at task 2 
    which is rebuilding the indexes.

    I checked the log which stored the result of the maintenance plan and found the following:
    Rebuild Index Task (SRV-SQL-05)
    Rebuild index on Local server connection
    Object: Tables and views
    Original amount of free space
    Task start: 2013-11-09T23:08:04.
    Task end: 2013-11-09T23:08:54.
    Failed:(-1073548784) Executing the query "ALTER INDEX [PK_DBAudit] ON [dbo].[DBAudit] REBUIL..." failed with the following error: 
    "SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7a7034ba; actual: 0x7a77cb42). 
    It occurred during a read of page (1:132093)
    in database ID 57 at offset 0x000000407fa000 in file 'E:\\MSSQL10_50.MSSQLSERVER\\MSSQL\\DATA\\Followup_2011.mdf'.  
    Additional messages in the SQL Server error log or
    system event log may provide more detail. 
    This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database
    consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
    The statement has been terminated.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, 
    parameters not set correctly, 
    or connection not established correctly.
     
    The said database is in fact is not often used hence I restored the database into the test server from the last backup just before
    reindex had started. I tried to issue the following command to repair the database on the test server with the following script.


    use master
    GO
    ALTER DATABASE Followup_2011
    SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    GO
    DBCC CHECKDB(Followup_2011,repair_rebuild)
    ALTER DATABASE Followup_2011
    SET MULTI_USER 

    When I ran the above script it gave the following errors-:
    Msg 8928, Level 16, State 1, Line 1
    Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data): 
    Page (1:131976) could not be processed.  See other errors for details.
    The repair level on the DBCC statement caused this repair to be bypassed.
    Msg 8939, Level 16, State 98, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data), 
    page (1:131976). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12584969 and -4.
    Repairing this error requires other errors to be corrected first.
    Msg 8976, Level 16, State 1, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data).
    Page (1:131976) was not seen in the scan although its parent (1:10508) and previous (1:131975) refer to it. Check any previous errors.
    Repairing this error requires other errors to be corrected first.
    Msg 8978, Level 16, State 1, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data). 
    Page (1:131977) is missing a reference from previous page (1:131976). Possible chain linkage problem.
    Repairing this error requires other errors to be corrected first.
    Msg 8928, Level 16, State 1, Line 1
    Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data): 
    Page (1:132093) could not be processed.  See other errors for details.
     Repairing this error requires other errors to be corrected first.
    Msg 8939, Level 16, State 98, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data), 
    page (1:132093). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12584969 and -4.
    Repairing this error requires other errors to be corrected first.
    Msg 8976, Level 16, State 1, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data). 
    Page (1:132093) was not seen in the scan although its parent (1:10436) and previous (1:132092) refer to it. Check any previous errors.
    Repairing this error requires other errors to be corrected first.
    Msg 8978, Level 16, State 1, Line 1
    Table error: Object ID 1204199340, index ID 1, partition ID 72057596383002624, alloc unit ID 72057596913582080 (type In-row data). 
    Page (1:132094) is missing a reference from previous page (1:132093). Possible chain linkage problem.


    I tried rebuild the index with the following command line-:
    USE [Followup_2011]
    GO
    ALTER INDEX [PK_DBAudit] ON [dbo].[DBAudit] REBUILD PARTITION = ALL WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, ONLINE = OFF, SORT_IN_TEMPDB = OFF )
    GO

    The result was the follwing-:
    The statement has been terminated.
    Msg 1088, Level 16, State 9, Line 1
    Cannot find the object "dbo.DBAudit" because it does not exist or you do not have permissions.

    What should I do to correct the above error? Should I restore the database from the previous backup
    where all the tasks in the plan worked like reindexing and database intgerity check. Or is any other
    method to correct the page where the error is pointed to? The total fragmentation of the table is 18.06% 
    and the page fullness is 63.47%. I am also appending the errors that I found in the event viewer as follows-:
    At 8.00 Pm before the maintenance plan ran
    The MsDtsServer100 service was unable to log on as SCPD\zakadmin with the currently configured password due to the following error: 
    Logon failure: unknown user name or bad password.

    To ensure that the service is configured properly, use the Services snap-in in Microsoft Management Console (MMC).

    At8.00 pm
    The SQL Server Integration Services 10.0 service failed to start due to the following error: 
    The service did not start due to a logon failure.

    At 11.00 pm
    DCOM got error "1069" attempting to start the service MsDtsServer100 with arguments "" in order to run the server:
    {46063B1E-BE4A-4014-8755-5B377CD462FC}

    At 11.08 pm
    SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7a7034ba; actual: 0x7a77cb42).
    It occurred during a read of page (1:132093) in database ID 57 
    at offset 0x000000407fa000 in file 'E:\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\Followup_2011.mdf'.  
    Additional messages in the SQL Server error log or system event log may provide more detail.
    This is a severe error condition that threatens database integrity and must be corrected immediately. 
    Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; 
    for more information, see SQL Server Books Online.

    at 11.10
    Package "DBBackupPlan1_Full" failed.

    Next day at 11.00 am we found another error message
    The driver detected a controller error on \Device\Harddisk9\DR13.

    Could any body please suggest a solution.We had taken a full backup 1 week before also. Should I restore from
    that backup
    Sunday, November 10, 2013 9:50 AM

Answers

  • First of all, you are in the wrong forum. This forum is for security questions related to SQL Server. I've noticed that you have been asking all sorts of questions here, but it's better to ask in the forum which has most relation to your question. In this case the HA & DR forum,
    http://social.msdn.microsoft.com/Forums/sqlserver/en-US/home?forum=sqldisasterrecovery
    I will try to answer to answer you concerns, but the people in the HA/DRR forum, I have much more experience of these issues, and their answers will be more reliable.

    Next, we are not really in the position to say what you should do, because ultimately this is a business decision. It appears from the messages you have, that you run the risk of data loss one way or another. If you run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS, it will throw away pages that it can't piece together. If you restore a backup which are a couple of days old, you may lose recent updates. Then again, if there are transaction log backups, you may be able to recover these as well. Judging from the name and your comment that the database is not used much, I guess this is an archive database, and you may be lucky that there are very few recent changes or any at all.

    That said, I certainly recommend that you first try to find the most recent backup. All other means of repair more difficult and risky. If there are recent changes, you need to recover, the best may be to try to export as much as possible of the data from the current database and merge it into the backup. This requires knowledge of the application rules for the database obviously.

    Finally, restoring the database and reparing the data is not enough. Databases do not go corrupt out of the blue, but there is always a reason, and most often the reason is hardware issues. Indeed, you had this message in the event log:

       The driver detected a controller error on \Device\Harddisk9\DR13.

    Since you had consistency errors from SQL Server before this point, I would suspect that you have more errors of this kind earlier in the event log.

    You need to understand why these errors occur. Until you know that they can be related to power glitches, you need to consider that disk with suspiciousness. If you have more databases on that drive, consider moving them elsewhere, particularly if the databases are critical to the business. If there are problems with the disk, they will strike again and more databases may become corrupted.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, November 10, 2013 11:31 AM
  • Hello Binny,

    Erland has shown you right direction probably restoring from latest valid backup is best method from recovery.I am always against running repair for database unless you dont have backup .

    Please ask your storage team or SAN vendor to analyze this disk as issue seems in disk .

    HTH


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Sunday, November 10, 2013 11:43 AM

All replies

  • First of all, you are in the wrong forum. This forum is for security questions related to SQL Server. I've noticed that you have been asking all sorts of questions here, but it's better to ask in the forum which has most relation to your question. In this case the HA & DR forum,
    http://social.msdn.microsoft.com/Forums/sqlserver/en-US/home?forum=sqldisasterrecovery
    I will try to answer to answer you concerns, but the people in the HA/DRR forum, I have much more experience of these issues, and their answers will be more reliable.

    Next, we are not really in the position to say what you should do, because ultimately this is a business decision. It appears from the messages you have, that you run the risk of data loss one way or another. If you run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS, it will throw away pages that it can't piece together. If you restore a backup which are a couple of days old, you may lose recent updates. Then again, if there are transaction log backups, you may be able to recover these as well. Judging from the name and your comment that the database is not used much, I guess this is an archive database, and you may be lucky that there are very few recent changes or any at all.

    That said, I certainly recommend that you first try to find the most recent backup. All other means of repair more difficult and risky. If there are recent changes, you need to recover, the best may be to try to export as much as possible of the data from the current database and merge it into the backup. This requires knowledge of the application rules for the database obviously.

    Finally, restoring the database and reparing the data is not enough. Databases do not go corrupt out of the blue, but there is always a reason, and most often the reason is hardware issues. Indeed, you had this message in the event log:

       The driver detected a controller error on \Device\Harddisk9\DR13.

    Since you had consistency errors from SQL Server before this point, I would suspect that you have more errors of this kind earlier in the event log.

    You need to understand why these errors occur. Until you know that they can be related to power glitches, you need to consider that disk with suspiciousness. If you have more databases on that drive, consider moving them elsewhere, particularly if the databases are critical to the business. If there are problems with the disk, they will strike again and more databases may become corrupted.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, November 10, 2013 11:31 AM
  • Hello Binny,

    Erland has shown you right direction probably restoring from latest valid backup is best method from recovery.I am always against running repair for database unless you dont have backup .

    Please ask your storage team or SAN vendor to analyze this disk as issue seems in disk .

    HTH


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Sunday, November 10, 2013 11:43 AM
  • Sunday, November 10, 2013 12:13 PM
  • Thank you for addressing my concerns and also suggesting a more appropriate form

    Binny

    Monday, November 11, 2013 5:30 AM