Help needed with slow full text search statement

  • Question

  • Dear colleagues,

    At one of our customer sites, a particular full text search query is very slow. The customer has a SQL Server 2012 machine. The query is a monster to comprehend, since it is generated by Entity Framework. I will give you a very simplified version, just to abstract away from the complexity. The query below takes 33 seconds to run.

    SELECT ... FROM ...
    WHERE CONTAINS([Filter2].[Description], N'some text')
      AND [Project2].SendToFinance = 1
    ORDER BY [Filter2].[Debtor], [Filter2].[FiscalYear], [Filter2].[CompanyCode]
    OFFSET 0 ROWS FETCH NEXT 15 ROWS ONLY;

    All indexes are up to date. When I change the CONTAINS to a LIKE, the query is fast. If I use CONTAINS but do a TOP (15) instead of an OFFSET FETCH, the query is fast. If I use CONTAINS but no restriction on the number of records, the query is fast and returns 2500 rows. If I use the query above but only do an ORDER BY [Filter2].[Debtor] the query is fast.

    The full text index contains the columns Debtor, FiscalYear and CompanyCode. I do not see why ordering on Debtor alone makes the query run in a couple of ms, while adding the other columns makes it take 33 seconds. Everything the query needs is in the index, and the same index is used in both cases. The only difference is that with the query above, the database engine thinks it is a good idea to take the table aliased as [Project2], filter it on SendToFinance = 1 (which gives a set of 100,000 records) and do the smarter filtering later. If I only order on [Debtor], the engine starts by filtering down to a set of 2,500.

    You have to know that three levels deeper in the derived tables (I really have a problem interpreting these EF outputs), there is a WHERE clause that boils down to [Filter2].[CompanyCode] = N'XYZ' AND [Filter2].[FiscalYear] = N'2020' (yes, someone thought it a good idea to implement a year as a string). This means we do not need the ORDER BY [FiscalYear], [CompanyCode], so I have a way to solve the problem at hand; a sketch of that workaround follows below. However, I would like to understand why the query becomes so slow when these columns are added to the ORDER BY. Is it because it is an ORDER BY on the derived table [Filter2]? Is the query so complex that the database engine misses that it could have used the full text index?
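    A minimal sketch of that workaround, using the same placeholders as the simplified query above (the ... stands for the parts of the EF query I left out):

    -- Workaround sketch: CompanyCode and FiscalYear are fixed to a single value
    -- by the inner WHERE clause, so ordering by Debtor alone gives the same order.
    SELECT ... FROM ...
    WHERE CONTAINS([Filter2].[Description], N'some text')
      AND [Project2].SendToFinance = 1
    ORDER BY [Filter2].[Debtor]
    OFFSET 0 ROWS FETCH NEXT 15 ROWS ONLY;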


    Friday, February 14, 2020 11:54 AM

Answers

  • It took me some time, but the problem is solved. Rebuilding the full text catalog did do the trick. What the ultimate problem was, I do not know. Does anyone know why the full text catalog isn't up to date while every full text index seems to be up to date?

    On the other hand, I didn't understand why the query was slow while every column in the ORDER BY was in the full text index. If the full text catalog is corrupt / out of date, things seem to start to make sense again.
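    For completeness, a minimal sketch of the rebuild, assuming a catalog named ProjectCatalog (the real catalog name differs):

    -- Rebuild the full text catalog (hypothetical catalog name).
    ALTER FULLTEXT CATALOG ProjectCatalog REBUILD;

    -- Check the population status afterwards; 0 means idle, i.e. population has finished.
    SELECT FULLTEXTCATALOGPROPERTY('ProjectCatalog', 'PopulateStatus') AS PopulateStatus;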

    • Marked as answer by Chris Sijtsma Wednesday, February 26, 2020 10:05 AM
    Tuesday, February 25, 2020 1:13 PM

All replies

  • As you probably know, there is no way we can answer your questions without seeing the actual query, and the actual query plans. And if the query is as complex as you suggest, the investigation could take quite some time.

    As you may know, the optimizer works with statistics sampled from the data, and from these it estimates which plan is best. On top of that, for a complex query, evaluating all possible plans is not an option.

    Thus, the system is far from perfect. It could be that the optimizer runs out of time before it finds a really good plan. It could be that it thinks it has found a fantastic plan, but it is wrong.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Friday, February 14, 2020 11:15 PM
  • I'm sorry Erland. Tomorrow, I will spend some time to clean up the original query and post it. Also, I will add the query plans. I will try to focus on the difference in the two plans. As you can imagine, the plans are rather big. I just hoped that by abstracting away from all the clutter in the Entity Framework generated query, somebody might have an idea. But I should have known better. I will first try to simplify the EF query while still reproducing the problem on the customer's system.

    By the way, my wife and I will visit Sweden this summer. We want to visit Birka and other Viking places of interest. We're ardent reenactors.

    Sunday, February 16, 2020 9:03 AM
  • Note that typically the plans are not very practical to post here. It is better to upload them somewhere, like Dropbox, OneDrive etc and post the link. You can also use http://www.brentozar.com/pastetheplan.
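    If it helps, one way to capture the actual plan as XML so it can be saved and uploaded (running the query in SSMS with "Include Actual Execution Plan" enabled and saving the .sqlplan file works just as well):

    -- Capture the actual execution plan as XML for the slow statement.
    SET STATISTICS XML ON;
    -- ... run the slow query here ...
    SET STATISTICS XML OFF;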


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Sunday, February 16, 2020 7:18 PM
  • Hi Chris,

    When a specific order is required, SQL Server must look at every row. For example, to place rows in Name order, it must sort all the rows by Name.

    It is useful to return fewer columns (specify the columns you need instead of SELECT * / TOP n) and to filter out as many rows as possible in the WHERE clause.
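    For example, something along these lines (table and column names are only placeholders):

    -- Placeholder example: select only the needed columns and push the filters
    -- into the WHERE clause instead of filtering the rows afterwards.
    SELECT d.Debtor, d.FiscalYear, d.CompanyCode
    FROM dbo.Documents AS d      -- hypothetical table name
    WHERE CONTAINS(d.[Description], N'some text')
      AND d.CompanyCode = N'XYZ'
      AND d.FiscalYear = N'2020'
    ORDER BY d.Debtor;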

    Best Regards,

    Lily


    MSDN Community Support
    Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com


    • Edited by Lily Lii Monday, February 17, 2020 2:34 AM
    Monday, February 17, 2020 2:32 AM
  • Hi Chris,

    It takes a lot of testing and patience to find the cause of a problem.

    If you find a solution, it would be kind of you to come back and share it with us. Doing so will benefit all community members who are having a similar issue. We'll be waiting for your reply. Your contribution is highly appreciated.

    Best Regards,

    Lily


    MSDN Community Support
    Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Tuesday, February 18, 2020 6:28 AM
  • It took me some time, but the problem is solved. Rebuilding the full text catalog did do the trick. What the ultimate problem was, I do not know. Does anyone know why the full text catalog isn't up to date while every full text index seems to be up to date?

    On the other hand, I didn't understand why the query was slow while every column in the ORDER BY was in the full text index. If the full text catalog is corrupt / out of date, things seem to start to make sense again.

    • Marked as answer by Chris Sijtsma Wednesday, February 26, 2020 10:05 AM
    Tuesday, February 25, 2020 1:13 PM
  • Hi,

    Instead of one index over the three columns, try converting it into two indexes:

    one on FiscalYear, Debtor

    and another on FiscalYear, CompanyCode

    Also make sure the query references the columns in the order in which they are defined in the indexes; a rough sketch follows below.
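    Roughly like this (the table name is just a placeholder; adjust to the real schema):

    -- Hypothetical sketch of the two suggested indexes.
    CREATE INDEX IX_Documents_FiscalYear_Debtor
        ON dbo.Documents (FiscalYear, Debtor);

    CREATE INDEX IX_Documents_FiscalYear_CompanyCode
        ON dbo.Documents (FiscalYear, CompanyCode);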

    Tuesday, February 25, 2020 2:34 PM
  • Hi Laxmidhar sahoo,

    Thanks for your effort. I didn't include the whole query and the table definitions, so you couldn't know that what you propose is impossible to do. The columns CompanyCode, FiscalYear and Debtor (in that order) form the PK of the underlying table from which the information is filtered. I should have included the whole query. Just to prevent people from spending time on an incomplete question, I will close this thread.

    Thanks again for trying to help.

    Wednesday, February 26, 2020 10:05 AM
    If you are doing automatic population/crawl, the full-text index should be current and offer the best performance for an unlimited FTI query. Note that if you use the CONTAINSTABLE function you can limit the number of results returned by your FTI query, which will offer the best performance all around.

    There is a chance that if your table is undergoing significant changes you might benefit from a re-organization (optimize catalog). 

    I do not believe that your rebuild was what solved the problem. It could have read every page into cache which was why you had better performance.  Or there could have been some other event on this table/related tables which gave you better performance after the fti was rebuilt. 

    It's too bad you didn't save the output of SET STATISTICS IO ON, so that we could get some hard data and give you a definitive answer.
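    A minimal sketch of these points, with placeholder catalog, table, column and key names:

    -- Limit the number of full text matches up front with CONTAINSTABLE (top_n_by_rank = 15).
    SELECT d.Debtor, d.FiscalYear, d.CompanyCode
    FROM CONTAINSTABLE(dbo.Documents, [Description], N'some text', 15) AS ft
    JOIN dbo.Documents AS d ON d.DocumentId = ft.[KEY]   -- hypothetical single-column key
    ORDER BY ft.[RANK] DESC;

    -- Reorganize (optimize) the catalog instead of a full rebuild.
    ALTER FULLTEXT CATALOG ProjectCatalog REORGANIZE;

    -- Gather hard numbers the next time the query is slow.
    SET STATISTICS IO ON;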

    Wednesday, February 26, 2020 7:45 PM