none
Questions about index RRS feed

  • Question

  • Hi,

    I have a table that contains more than one Millions rows . I have a query like as bellow : 

    Select a,b,c,d,e,f from myTable where f=1 and d in(1,2,3,..)
    

    Actually i have a clustered index on column a this column is primary key and identity and i have nonclustered index on column d 

    When i execute the query and i see the plan execution the optimizer use the clustered index , i don't know why 

    My questions are : 

    • How sql server read my query ? it  will read the whole table in memory then use the filter and then return the result ?
    • Why the Optimizer use the clustered index even if i don't have a filter on column A
    • If i want to perform my query and i want to create new index : It's better to create index on Column F then a new Index on Column D ? Or it's better to create a composite index on Both Column F and D ? What's the difference between Composite index and Alone index ?
    • If i have a composite index there is any difference with this both index  : CREATE NONCLUSTERED INDEX index_1 ON MyTable(F,D)  and CREATE NONCLUSTERED INDEX index_2 ON MyTable(D,F)
    • My last Question : How can i Know if my index that i create work perfectly? and what's the information that i need to focus on the execution plan

    Thanks 

    Friday, August 9, 2019 9:10 AM

All replies

  • Remember that SQL is a declarative language rather that a procedural one. SQL Server uses a cost-based optimization strategy to generate a reasonably efficient execution plan that returns the results declared by the SELECT statement. This costing is largely done by estimating row counts of various candidate execution plans based on available indexes and statistics. Consequently, with non-trivial queries like the one in your question, different indexes may be used depending on the number of rows and data cardinality. It may be less expensive to scan the table than use a non-clustered index based on the estimated row counts.

    Below is an example with the table in your question that shows this behavior. Be aware that there are often trade-offs involved with indexing strategies. An important choice is the optimal clustered index, which might not necessarily be the primary key index. One should consider the overall workload and query mix in determining the best overall indexing strategy. For example, you may find a clustered index on f and d is ideal for this query but queries with singleton lookups by the primary key will be more expensive due to the addition of a key lookup in the plan.

    CREATE TABLE dbo.myTable(
    	  a int NOT NULL IDENTITY
    		CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED
    	, b int NOT NULL
    	, c int NOT NULL
    	, d int NOT NULL INDEX idx_myTable_d
    	, e int NOT NULL
    	, f int NOT NULL
    	);
    GO
    WITH 
    	 t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    	,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
    INSERT INTO dbo.MyTable(b,c,d,e,f)
    SELECT num,num,num,num,num
    FROM t10k
    WHERE num <= 1000;
    GO
    UPDATE STATISTICS dbo.myTable WITH FULLSCAN;
    GO
    SET STATISTICS IO ON;
    GO
    --scan using PK_myTable clustered index
    SELECT a,b,c,d,e,f FROM myTable WHERE f=1 AND d in(1,2,3);
    GO
    --load 9k more rows and update stats
    WITH 
    	 t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
    	,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
    INSERT INTO dbo.MyTable(b,c,d,e,f)
    SELECT num,num,num,num,num
    FROM t10k
    WHERE num > 1000;
    UPDATE STATISTICS dbo.myTable WITH FULLSCAN;
    GO
    --seek using idx_myTable_d non-clustered index
    SELECT a,b,c,d,e,f FROM myTable WHERE f=1 AND d in(1,2,3);
    GO
    CREATE INDEX idx_myTable_f_d ON dbo.myTable(f,d);
    GO
    --seek using idx_myTable_f_d non-clustered index
    SELECT a,b,c,d,e,f FROM myTable WHERE f=1 AND d in(1,2,3);
    GO


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Friday, August 9, 2019 11:17 AM
  • Hi YassirCool,

     

    Thank you for your issue. I'll answer your questions one by one.

     

    ----How sql server read my query ? it  will read the whole table in memory then use the filter and then return the result ?

     

    Knowing the bits and bytes of an SQL query’s order of operations can be very valuable, as it can ease the process of writing new queries, while also being very beneficial when trying to optimize an SQL query. If you’re looking for the short version, this is the logical order of operations, also known as the order of execution, for an SQL query:

    1.FROM, including JOINs

    2.WHERE

    3.GROUP BY

    4.HAVING

    5.WINDOW functions

    6.SELECT

    7.DISTINCT

    8.UNION

    9.ORDER BY

    10.LIMIT and OFFSET

     

    For more information , please refer to Order Of Execution of the SQL query.

     

    ----Why the Optimizer use the clustered index even if i don't have a filter on column A

     

    Firstly, please understand following information. This is equivalent to a table scan.

     

    I think  in your execution plan, it will use clustered indexes scan.

     

    Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

     

    Just because you create a clustered index and then change the storage structure of a table, table scan becomes clustered index scan.

     

    ----If i want to perform my query and i want to create new index : It's better to create index on Column F then a new Index on Column D ? Or it's better to create a composite index on Both Column F and D ? What's the difference between Composite index and Alone index ?

     

    I think it depends on your actual data. I'm not sure which index is better, but you can try it out with your data.

     

    ----If i have a composite index there is any difference with this both index  : CREATE NONCLUSTERED INDEX index_1 ON MyTable(F,D)  and CREATE NONCLUSTERED INDEX index_2 ON MyTable(D,F)

     

    I also think it depends on your actual data. There may be performance differences between the two indexes. Please refer to  How important is the order of columns in indexes?, it will explain in detail the importance and differences.

     

    ----My last Question : How can i Know if my index that i create work perfectly? and what's the information that i need to focus on the execution plan

     

    As I know, please check your execution plan. Index seek might be efficient, and index scan might not. For more information , please refer to SQL SERVER – Index Seek Vs. Index Scan (Table Scan).

     

    By the way, you can 'SET STATISTICS IO ON; ' to display information regarding the amount of disk activity generated. Please compare the messages and you will see the difference. For more details , you can refer to it: https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017

     

    Hope it will help you.

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, August 12, 2019 8:05 AM
  • Hi YassirCool,,

    I am writing to follow up this thread with you. Have you solved your issue? If you have solved your issue ,in order to close this thread, please kindly mark helpful replies as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, August 14, 2019 9:31 AM