SQL Server Developer Center > SQL Server Forums > SQL Server Search > Fulltext problem indexing xml data in SQL2008: 0x80004005 errors and cpu usage near 100%
Ask a questionAsk a question
 

QuestionFulltext problem indexing xml data in SQL2008: 0x80004005 errors and cpu usage near 100%

  • Thursday, September 17, 2009 12:28 AMPlanetPaolo Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     

    Hi all,

     

    I have been wrestling this for 2 days now and hope that someone can help.

    FYI - this has been tested on the following environments:

    1.

    Microsoft Windows Server 2003 R2 Standard Edition Service Pack 2

    Microsoft SQL Server 2005 - 9.00.4035.00 (Intel X86)

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (Intel X86)

    2.

    Windows 7 Enterprise

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)

     

    I have a table that stores data in a column of type xml, and a full-text index on it.

    The column contains multilingual data and leverages the xml:lang attribute on a per-record basis to define what word breaker to use.

    Example:

    <Root xml:lang="en-AU">This is my dog.</Root>

    (Note: the actual xml structure is more complex than the above.)

     

    When I went to populate my table with real data, the log filled up with error messages ("Error '0x80004005' occurred during full-text index population for table or indexed view [...]") and the cpu usage skyrocketed to 98%.

    After some digging I was able to identify the culprit - a handful of records having xml:lang="th-TH" (Thai).

    The text within the xml is actually English, but the same records previously existed on a SQL2005 server and were indexed without issues.

    Also, please note that the xml structure is the same across all records, which are in all sorts of languages.

     

    Now, having recently upgraded to SQL2008 I wondered if anything has changed for the Thai language, but found that it's among those languages that have NOT changed (http://msdn.microsoft.com/en-us/library/ms176076.aspx).

    So I played a bit with the xml data to narrow down the scope of the problem and found that 'sometimes' the Thai indexer fails - here is how to reproduce the problem in SQL2008:

     

    1. Create a new table as follows and make [Id] the PK:

    create table [TestTable] ([Id] int not null, [Xml] xml null)

     

    2. Create a full-text index on the [Xml] column (don't specify any language; my sql default is English).

     

    3. Add a new record as follows:

    insert into [TestTable] values (1, '<Root xml:lang="th-TH">blah</Root>')

    -- The above statement does NOT cause any problem: a new row is added, indexed in a split second

    -- and if you check the full-text index you'll see Docs Processed = 1 and Pending Changes = 0

    -- In fact, if you query sys.fulltext_indexes you'll see that the index has a valid crawl_end_date

     

    4. Now add another record as follows:

    insert into [TestTable] values (2, '<Root xml:lang="th-TH"><Title>blah</Title></Root>')

    -- The above statement DOES cause the problem: a new row is added, but the indexing process (fdhost)

    -- will eat up all your cpu and your log (SQLFTxxxxxyyyyy) will fill with 0x80004005 error messages

    -- If you check the full-text index you'll see Fail Count = 1 and Pending Changes = 1

    -- Also, if you query sys.fulltext_indexes you'll see that the crawl_end_date for your index is null

     

    If you try the above in SQL2005 you won't have any problems at all.

    I wonder if anyone has encountered the same problem and/or can shed some light into this issue.

    For now, I'm simply resorting to storing those documents without the xml:lang="th-TH" attribute, but this obviously makes them unsearchable.

     

    Thanks in advance for any help,

    Paolo

All Replies

  • Thursday, September 17, 2009 12:18 PMHilary CotterMVPUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    I verified that I am getting the same error.

    The error logged in my fts log is 2009-09-17 08:14:16.75 spid22s     Error '0x80040e97' occurred during full-text index population for table or indexed view '[test].[dbo].[TestTable]' (table or indexed view ID '2105058535', database ID '7'), full-text key value '4'. Attempt will be made to reindex it.





    looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
  • Thursday, September 17, 2009 11:28 PMPlanetPaolo Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     

    Hi Hilary, thanks for your reply.

    I presume you had the same problem with the cpu usage, too.
    Does this mean that the Thai word breaker (thawbrkr.dll) is misbehaving in a SQL2008 environment, and should I raise this with the MS SQL Search team?

    Thanks again,
    Paolo

  • Friday, September 18, 2009 12:20 AMPlanetPaolo Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Further to my previous post, I checked to see if the dll for SQL2008 is different from the dll for SQL2005.

    They ARE different (at least in my environment):
    - In SQL2005, ThaWBrkr.dll has a modified date of 22/06/2007 and file version 12.0.7822.0
    - In SQL2008, ThaWBrkr.dll has a modified date of 10/07/2008 and file version 12.0.6828.0

    So just to try, I copied the 2005 one into the binn folder of 2008 and repeated my test.
    This has NOT changed the outcome.

    Therefore, I believe it's not to do with the dll itself (since it works fine in SQL2005), but with how it's used in SQL2008.
  • Friday, September 18, 2009 11:25 AMHilary CotterMVPUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Yes, PlantetPaolo, I would open a connect item on this or contact CSS.

    I did get the high cpu spike although it too several minutes for it and I got a different hex value than you. Copying the dll will not work as these are com components and register their classids.
    looking for a book on SQL Server replication? http://www.nwsu.com/0974973602.html looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941
  • Friday, September 25, 2009 4:54 AMPlanetPaolo Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    Sorry about the delay - the following Connect item has been opened:

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=492018