Will there be a benefit of adding extra indexes

  • Question

  • Hi everybody,

    We have a wide Items table with 251 columns, 12 of which are of type text. This table has ~15,000 rows; new items are not added frequently, although updates happen very frequently.

    There are several queries that are executed very often on this table, such as 

    SELECT department, category, item, descrip,
    	SPACE(25) AS m_descrip, 000000 AS invent_id
    FROM items
    WHERE Invent_ID = 196300010001

    The items table does not have a corresponding index, so an index scan is required. The primary key (clustered index) for this table is a compound key based on department, category, item (all CHAR(10) fields).

    --------------

    Do you think I should recommend adding separate indexes to this table, such as an index on the barcode field, an index on the invent_id field, and a few others on columns that are searched very frequently in the application?
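
    For reference, the kind of indexes I have in mind would look something like this (the index names are illustrative, and I am assuming barcode is a column on this table):

    CREATE NONCLUSTERED INDEX IX_Items_Invent_ID ON dbo.items (Invent_ID);
    CREATE NONCLUSTERED INDEX IX_Items_Barcode ON dbo.items (barcode);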

    Thanks in advance.

     


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Monday, September 19, 2011 7:07 PM
    Answerer


All replies

  • I would evaluate the DMVs to find out which column combinations are used often. From the query above, the index should only be put on Invent_ID. 2000 rows is not that much, though if you join the data frequently there could be a performance benefit depending on your workload.
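
    As a sketch of what I mean, current index usage on the table can be checked like this (assuming SQL Server 2005 or later; the table name is taken from the question):

    SELECT i.name AS index_name,
    	s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.dm_db_index_usage_stats s
    	INNER JOIN sys.indexes i ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID()
    	AND s.object_id = OBJECT_ID('dbo.items');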

    -Jens


    Jens K. Suessmeyer http://blogs.msdn.com/Jenss
    Monday, September 19, 2011 7:29 PM
  • I think the data volume here may be too small for indexes to matter - what I mean to say is that searching via an index for the data you queried might cost about the same as a table scan.

    Also, due to the frequent updates your indexes will get fragmented, and the updates themselves will be slower.

    One more factor to consider is how many distinct values the column holds: for 15000 rows, if the column is highly selective (say 70-100% of the values are distinct), an index seek can help a lot, whereas if the data is highly repetitive (only 10-30% distinct values), an index will add little benefit.
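
    A quick way to check that selectivity for a candidate column (Invent_ID is taken from the question; substitute any other candidate):

    SELECT COUNT(*) AS total_rows,
    	COUNT(DISTINCT Invent_ID) AS distinct_values
    FROM dbo.items;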

    Monday, September 19, 2011 7:38 PM
  • This reminded me of a situation we saw a few months back... wanted to share since it was a funny one - although it was on Oracle. One of the groups had a reporting table with more than 40 columns, and over time they added indexes on each column. Eventually the indexes on disk grew to twice the size of the table itself, so scanning the table without indexes was faster than querying it with them.
    Monday, September 19, 2011 7:44 PM
  • Hi Jens,

    Can you elaborate a bit? I looked closer and there are actually ~15K rows in that table. The queries are based on the invent_id, barcode, and UPC fields, from what I see in the trace file. Currently these queries perform an index scan.

    The table (Items) is used in many queries, often as part of JOINs. In the particular routine I am analyzing (which is called very often) it is queried stand-alone by either of those fields (actually, it's done in a loop - it first tries to search by invent_id, then barcode, then UPC, then a few other fields for the value in question).


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Monday, September 19, 2011 8:09 PM
    Answerer
  • That actually may be the case here - as I said, the table is very wide right now and badly normalized. The queries by barcode, invent_id and a few other columns are very frequent, so in my opinion we do need these indexes.
    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Monday, September 19, 2011 8:12 PM
    Answerer
  • Hello Naomi

    I would go for the indexes as well. Given that the table is badly normalized and has a lot of big fields, a few small, well-chosen additional indexes would not change that much on the write side, whereas they could boost the reads, especially if you also use those fields in joins. Perhaps keep a low fill factor if you are concerned about writes. I also would not particularly worry about fragmentation if the indexes mostly serve seeks.
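
    As a sketch of the fill factor idea, assuming one of the candidate columns named earlier in the thread (the index name and fill factor value are illustrative):

    CREATE NONCLUSTERED INDEX IX_Items_Barcode
    ON dbo.items (barcode)
    WITH (FILLFACTOR = 70);

    A lower fill factor leaves free space in the leaf pages, so frequent updates cause fewer page splits, at the cost of somewhat more pages to read.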

    I endorse Harshvai's points as well.

    Regards,

             Marco

     



    • Edited by Marco Carozzi Monday, September 19, 2011 10:57 PM
    • Marked as answer by Naomi NEditor Thursday, September 22, 2011 3:50 AM
    Monday, September 19, 2011 10:55 PM
  • I sent one client a suggestion, along with some code improvements I made in one application, and another suggestion as well. We'll see whether they are able to implement the suggestions and whether they provide a performance improvement or not.
    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Monday, September 19, 2011 10:58 PM
    Answerer
  • I would also look into getting rid of the text columns, which by the way are deprecated; otherwise you could consider placing them on a separate TEXTIMAGE filegroup. It could also do some good to move those columns to a side table, and perhaps implement compression if you can (there are limits on row size, so you must get rid of the text columns for this). If the queries against that table are only occasional, and the environment can stand it, I would evaluate running them in parallel; if you have many cores at your disposal, perhaps in that case a scan could still be better overall.
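
    As an illustration of the compression idea, a minimal sketch, assuming SQL Server 2008 or later (where ROW compression is available) and that the text columns have already been removed:

    ALTER TABLE dbo.items REBUILD WITH (DATA_COMPRESSION = ROW);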

    Regards,

           Marco

     


    Monday, September 19, 2011 11:14 PM
  • Hi Naomi,

    I'd also suggest going with indexes. The missing-index DMVs could be useful (although, of course, they say nothing about index maintenance overhead); on the other hand, I'd probably analyze the most IO-expensive queries based on the query stats DMV and decide on the best index structure myself.
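
    As a sketch, the missing-index DMVs can be queried like this to see what the optimizer has been asking for (weigh the suggestions manually, since the DMVs know nothing about maintenance overhead):

    SELECT d.statement AS table_name,
    	d.equality_columns, d.inequality_columns, d.included_columns,
    	s.user_seeks, s.avg_total_user_cost, s.avg_user_impact
    FROM sys.dm_db_missing_index_details d
    	INNER JOIN sys.dm_db_missing_index_groups g ON g.index_handle = d.index_handle
    	INNER JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks * s.avg_total_user_cost DESC;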

    One other thing - if you don't see any queries that benefit from the current clustered index, I'd also evaluate the option of removing the clustered index and making the table a heap. Even if it's not best practice, it could be beneficial to store a RID rather than the wide clustered index key in the non-clustered index rows. But obviously this is something you need to check/test.
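
    A minimal sketch of that conversion (the constraint name PK_items is hypothetical):

    ALTER TABLE dbo.items DROP CONSTRAINT PK_items;
    ALTER TABLE dbo.items ADD CONSTRAINT PK_items
    	PRIMARY KEY NONCLUSTERED (department, category, item);

    Keep in mind that dropping the clustered index rebuilds every non-clustered index on the table, so this is best done in a maintenance window.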


    Thank you!

    My blog: http://aboutsqlserver.com

    • Marked as answer by Naomi NEditor Thursday, September 22, 2011 3:55 AM
    Tuesday, September 20, 2011 7:58 PM
  • There are many queries involving this table that utilize the clustered index, e.g.

    SELECT t.fields, i.Description
    FROM transact t
    	INNER JOIN items i ON t.department = i.department
    		AND t.category = i.category
    		AND t.item = i.item


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Tuesday, September 20, 2011 8:11 PM
    Answerer
  • Well, then keep it unless you can use a covering NCI. :)
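
    For example, a covering non-clustered index for the Invent_ID lookup from the original post could look like this (the index name is hypothetical; the included columns are taken from that query):

    CREATE NONCLUSTERED INDEX IX_Items_Invent_ID_Covering
    ON dbo.items (Invent_ID)
    INCLUDE (department, category, item, descrip);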

    Thank you!

    My blog: http://aboutsqlserver.com

    Tuesday, September 20, 2011 8:14 PM
  • Hi Naomi

    What exactly do you mean by your last question?

    Do you want to go through the trace the client sent you, taken at peak time, and see what you have? What kind of information is in there? Plans as well?

    Regards,

                Marco

     

     

    Tuesday, September 20, 2011 10:10 PM
  • Hi Naomi,

    You can start with the trace - sort by duration (don't forget about possible blocking that affects duration), or perhaps by reads/writes, and look at the plans. On the other hand, I'd start with something like this:

    SELECT TOP 50 
    	-- reconstruct the individual statement from the batch text
    	SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
    		((
    			CASE qs.statement_end_offset
    				WHEN -1 THEN DATALENGTH(qt.TEXT)
    				ELSE qs.statement_end_offset
    			END - qs.statement_start_offset)/2)+1) AS statement_text,
    	qs.execution_count,
    	-- average logical IO per execution
    	(qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count as [Avg IO],
    	qp.query_plan,
    	qs.total_logical_reads, qs.last_logical_reads,
    	qs.total_logical_writes, qs.last_logical_writes,
    	qs.total_worker_time,
    	qs.last_worker_time,
    	-- elapsed times are stored in microseconds
    	qs.total_elapsed_time/1000 total_elapsed_time_in_ms,
    	qs.last_elapsed_time/1000 last_elapsed_time_in_ms,
    	qs.last_execution_time
    FROM 
    	sys.dm_exec_query_stats qs
    		CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
    		OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    ORDER BY 
    	[Avg IO] DESC
    go
    

    Filter the sql text by the table name, find out which queries are the most expensive, look at the plans, and create indexes accordingly.

    One thing to keep in mind is that this returns data from the procedure cache - so there is a chance that not everything is there.


    Thank you!

    My blog: http://aboutsqlserver.com


    Tuesday, September 20, 2011 11:21 PM
  • Right now I'm working on trying to identify where the slowness comes from. I did add the indexes to the items table and made some other code improvements, but I don't see much of a performance boost when analyzing the log files from our applications.

    This particular application only does scanning (retrieving items) and inserting records into a single table. Of course, there is other work going on on the server at the same time. I haven't yet found where the bottleneck may be.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Wednesday, September 21, 2011 2:39 AM
    Answerer
  • Hi Naomi

    Could you get the worst queries and execute them with statistics io and time on? (And post the results and the queries here.)
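
    That is, something along these lines:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    -- run the query under investigation here
    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;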

    Do you run in parallel or serial? Could you post the query plan? Is the query workload predictable or not?

    Are your data and indexes properly contiguous, or do you suffer from high fragmentation?

    What about statistics - are they properly created (multi-column as well, if needed) and kept updated? Could you trace sort and hash warnings via SQL Profiler to see if you spill to tempdb? Could you check the workload against the DTA to see whether it has any good suggestions on indexing or statistics? (Remember, auto-created statistics are single column only.)
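
    For the multi-column case, a manually created statistic would look something like this (the column pair is hypothetical):

    CREATE STATISTICS st_items_department_category ON dbo.items (department, category);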

    Are your concerns mainly about reading or writing?

    Do you suffer from lock contention? Do you run in explicit or implicit transactions? Which isolation level are you on? Do you suffer from page splits when writing?

    Do you have forced query hints ?

     

    Some general questions... then we can narrow down to a more precise approach.

    Regards,

              Marco

    Wednesday, September 21, 2011 9:26 AM
  • I added the indexes, and in the trace file I can see that the performance of these queries greatly improved.

    However, I believe I may want to re-write this application, and then these indexes won't be that useful and we will probably need to get rid of them.

    Right now every scan does multiple searches by different fields until the item is found, and each search is a round-trip to the server to get the record.

    My idea is to re-write the application to get all items at once (refreshing them at a certain frequency) and perform the searches locally against the local data. I think this should be much faster and would also remove the burden from the server.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog
    Thursday, September 22, 2011 3:54 AM
    Answerer