none
Linked server query very slow

    Question

  • Let me start with this was thoroughly tested and worked for several years and now is not.  The data in the databases change daily, but the servers and config have not changed in years.

    We are running 2 instances on same server, running SQL 2005 x64:

    Microsoft SQL Server 2005 - 9.00.4266.00 (X64)   Oct  7 2009 17:38:17   Copyright (c) 1988-2005 Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)

     

    This is a simplistic explanation of what is happening.  I can give more details if needed.

    Instance1 is has only databases which only contain VIEWS which are SELECT * FROM [Instance2].[db].[schema].[table].  All views use the same Linked Server name [Instance2].  This was done for a reason and cannot be changed right now.  Instance2 contains all the actual data.

    In the past, running a query on Instance1 like:

    SELECT ... FROM tablea INNER JOIN tableb....

    would result in a query plan on Instance1 a single query:

    Remote Query = Instance2: SELECT ... FROM tablea INNER JOIN tableb....

    Basically it would pass the entire query to Instance2 and return the result set.

     

    Now all of a sudden, every query on Instance1 is resulting in a query plan of:

    Remote Query = Instance2: SELECT * FROM tablea

    Remote Query = Instance2: SELECT * FROM tableb

    Join Result1 and Result2

     

    Since many of these tables are huge with billions of records, this is causing every query to run extremely poorly.  I have already cleared the plan cache on both servers.  The stats are up to date, etc.

    Does anyone have any ideas why the Linked server query is being split into individual queries against the same instance?  What is the trigger which causes it to split the query into multiple remote queries? 

     


    Tuesday, October 04, 2011 4:58 PM

Answers

  • That would explain it.  If you're calling a function on a field, indexes will not be used and in your case it's apparently not running remotely.  If you're using SQL Server 2008, LIKE can be used instead of SUBSTRING and any indexes on the field1 field would still be utilized.

    WHERE ... AND field1 LIKE ('01%')

    But if you don't need either the SUBSTRING or the LIKE that's even better yet.  Any string conversions or data conversions that can be avoided will speed up your query processing, in general.

     


    Eric Isaacs
    Wednesday, October 05, 2011 9:31 PM
  • I don't know this very well, but I can certainly imagine that the use of a system function prevents remoting of the query.

    As I understand it the DQ engine is very data-source agnostic. That is, there are no shortcuts like "the remote server is an SQL Server, then I can do this". Instead it queries the OLE DB provider about its capabilities. But it is not likely there is a way to confirm whether the remote data source supports substring, so that filter has to be applied locally. Obviously the rest of the query could be run remotely and this particular filter could be applied locally. But depending on how the query looks like, the optimizer may not find any point in this.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, October 05, 2011 10:01 PM

All replies

  • Have you looked at the query plans to see what the estimates are? That is, are they correct? Or does Instance1 think that each table will bring back a single row?

    Here is one thing to check: when you run a query to a linked server, you need to login to that server. The login mapping can be set up so that you login with your own account, but it can also be set up to map to a different account.

    In any case, the account on the remote server must be member of the db_ddladmin fixed database role (or db_owner). If not, the account does not have permissions to run DBCC SHOW_STATISTICS on the tables. If this fails, Instance1 will not get any statistics back, and think that the tables only have a single row, which is very bad when your tables are as big as yours.

    Note that this issue only applies to remote tables. With local tables the optimizer has its own access to the statistics, and does not have to rely on your permissions.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Tuesday, October 04, 2011 10:02 PM
  • I checked the query plan estimated row count.  The query indicates >1 row, but very small numbers, 30 and 17.  In fact, this query returns about 100,000 rows.

    I set the user to have db_ddladmin on the remote tables, cleared the cache and reran the estimated query plan, but got the same plan and results.

     

    Wednesday, October 05, 2011 1:54 PM
  • You should check the estimated count on the individual tables. It is also good if you can run a query and check the actual plan, and compare the actual row count with the estimated count. This can help you pinpoint if there is a certain point in the plan the estimate goes wrong. This could either be in when reading a single table, or in a join.

    If you can find a place to upload the .sqlplan file, post a link so we can look at it.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, October 05, 2011 2:04 PM
  • Also, be sure that the statistics are up to date.

     


    AMB

    Some guidelines for posting questions...

    Wednesday, October 05, 2011 2:08 PM
  • I believe I found the issue, or an explanation.

    On this particular query, the user recently added a WHERE ... AND SUBSTRING(field1,1,2) = '01' to the query recently.  When the SUBSTRING() is removed (which is unneeded as field1 is 2 chars), the entire query is run as a "remote query".

    I will try to research it further.

    Wednesday, October 05, 2011 8:02 PM
  • That would explain it.  If you're calling a function on a field, indexes will not be used and in your case it's apparently not running remotely.  If you're using SQL Server 2008, LIKE can be used instead of SUBSTRING and any indexes on the field1 field would still be utilized.

    WHERE ... AND field1 LIKE ('01%')

    But if you don't need either the SUBSTRING or the LIKE that's even better yet.  Any string conversions or data conversions that can be avoided will speed up your query processing, in general.

     


    Eric Isaacs
    Wednesday, October 05, 2011 9:31 PM
  • I don't know this very well, but I can certainly imagine that the use of a system function prevents remoting of the query.

    As I understand it the DQ engine is very data-source agnostic. That is, there are no shortcuts like "the remote server is an SQL Server, then I can do this". Instead it queries the OLE DB provider about its capabilities. But it is not likely there is a way to confirm whether the remote data source supports substring, so that filter has to be applied locally. Obviously the rest of the query could be run remotely and this particular filter could be applied locally. But depending on how the query looks like, the optimizer may not find any point in this.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, October 05, 2011 10:01 PM
  • It looks like you are correct.  Something in the query triggers it to pull the remote table and do filtering locally.  This is extremely bad in our situation.  I will have to research this further.

    Thanks

     

    Thursday, October 06, 2011 3:20 PM