locked
Create Index on a table with 3 milion of records RRS feed

  • Question

  • Hi

    i am working on a table with 3 milion record and i decide to create index on table and also use table partitioning to partition the table on date(year), i create index and partition in afew way but in all case after i query on table and i compare with old table(without index) it takes longer time to execute and return result.

    i create index in diffrent way :

    1) create 4 file group and partition table to each file group 

    ALTER DATABASE NewSyslog
    ADD FILEGROUP FileGroup1
    
    ALTER DATABASE NewSyslog
    ADD FILEGROUP FileGroup2
    
    ALTER DATABASE NewSyslog
    ADD FILEGROUP FileGroup3
    
    ALTER DATABASE NewSyslog
    ADD FILEGROUP FileGroup4
    
    
    --Create Database File
    ALTER DATABASE NewSyslog
    ADD FILE
    ( NAME = Logs1,
    FILENAME = 'E:\Syslog\logs_filegroup1.ndf',
    SIZE = 1MB
    )
    TO FILEGROUP FileGroup1
    
    GO
    ALTER DATABASE NewSyslog
    ADD FILE
    ( NAME = Logs2,
    FILENAME = 'E:\Syslog\logs_filegroup2.ndf',
    SIZE = 1MB
    )
    TO FILEGROUP FileGroup2
    GO
    ALTER DATABASE NewSyslog
    ADD FILE
    ( NAME = Logs3,
    FILENAME = 'E:\Syslog\logs_filegroup3.ndf',
    SIZE = 1MB
    )
    TO FILEGROUP FileGroup3
    GO
    ALTER DATABASE NewSyslog
    ADD FILE
    ( NAME = Logs4,
    FILENAME = 'E:\Syslog\logs_filegroup4.ndf',
    SIZE = 1MB
    )
    TO FILEGROUP FileGroup4

    then i create partition function and scheme like :

    CREATE PARTITION FUNCTION HitDateRange (datetime)
    AS RANGE LEFT FOR VALUES ('1/1/2013', '1/1/2014', '1/1/2015')
    GO
    
    
    CREATE PARTITION SCHEME HitDateRangeScheme
    AS PARTITION HitDateRange
    TO ( FileGroup1, FileGroup2, FileGroup3, FileGroup4 )

    so i create my table on partiton scheme

    CREATE TABLE [dbo].[Logs](
    	[ID]				[int] IDENTITY(1,1) NOT NULL,
    	[Timestamp]			[datetime] NOT NULL,
    	[SourceIPAddress]	[nvarchar](50) NOT NULL,
    	[FullUrl]			[nvarchar](4000) NOT NULL,
    	[Url]				[nvarchar](512) NOT NULL,
    	[Action]			[nvarchar](10) NOT NULL,
    	[User]				[nvarchar](50) NULL,
    	[TTL]				[int] NULL,
     CONSTRAINT [PK_Logs] 
     PRIMARY KEY CLUSTERED ([ID] ASC,[Timestamp])) 
     ON  [HitDateRangeScheme] ([Timestamp])

    then i insert 3 milion record into table and then test the query performance on this table and the old table, i saw the query run a little faster on old table.

    so i create an index below:

    2)

    Create NonClustered Index NI_User_Timestamp
    On  [dbo].[Logs]([User],[Timestamp])

    the result is slower than old table,

    3) then i create Index Include :

    Create NonClustered Index NI_User_SourceIPAddress_FullUrl_Url_TTL
    On  [dbo].[Logs]([User],[Timestamp])
    Include([SourceIPAddress],[FullUrl],[Url],[TTL])

    then i query table 

    Select	[Timestamp] ,[SourceIPAddress] ,[FullUrl], [Url], [User] ,[TTL]				
    		From	dbo.Logs
    	Where	[Timestamp]	Between	'1/1/2012'	And	'12/29/2013'	
    			And	[User]	=	'bill'

    i test my select query whitout where clause ... but still it is a little slower.

    4)then i recreate the table with Index partitioning like :

    Create NonClustered Index NI_IP_User
    On  [dbo].[Logs]([User])
    On	[HitDateRangeScheme]([Timestamp])

    5) also i partition my table just on primary, means that i did not create file group for each partition,All To ([Primary])

    6) also i did not partition my table, i make clusterd index on ID, and a nonclustered index on [Timestamp],[User]

    Create NonClustered Index NI_Timestamp_User
    On  [dbo].[Logs]([User],[Timestamp])

    7) i create a nonclustered index just on [User] Column

    after all these step still my query on new table is still a little slower.

    so thanks for any help


    Alimardani


    Wednesday, April 3, 2013 6:18 AM

Answers

  • Hi,

    I have had a look at both execution plan and the result is not really a suprise.

    Without index you have a "Clustered Index Scan", with a "Clustered Index Seek"; so the index is used, but not really as intended.

    You wrote, your table contains 3 mio rows; your query Returns 2.1 mio rows, that's nearly the whole table. An index should be "selective" enough to be helpful for query performance; means your filter condition should reduce the result at least to 25% or less then the complete result, otherwise the index isn't helpful and SQL Server decides to run a full table scan.

    And your row size is very high, more then 4 KB; do you need really all columns in your result set? If not, avoid using a SELECT *. Keep in mind, that all data have to be transferred to the Client ( 2 mio rows * 4 KB = 8 GB !!!); the less amount of result, the fast you get the result.


    Olaf Helper

    Blog Xing


    Wednesday, April 3, 2013 4:31 PM

All replies

  • What if you create CLUSTERED INIQUE Index ON 
    PRIMARY KEY CLUSTERED ([Timestamp],[ID]
    )) ON [HitDateRangeScheme] ([Timestamp])

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Wednesday, April 3, 2013 6:36 AM
    Answerer
  • after all these step still my query on new table is still a little slower.

    Hello Alimardani,

    Have you checked the execution plan for your query with/without the index to see, how the query gets processed? For further advice, may could you upload the execution plan as a file to your SkyDrive; then we could have a look at it, too?


    Olaf Helper

    Blog Xing

    Wednesday, April 3, 2013 6:36 AM
  • As Uri suggested, please change your clustered index to Timestamp and then ID. You may even create a non lustered unique index on ID separetely to support those queries using ID as predicate.

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Wednesday, April 3, 2013 6:42 AM
  • Hello

    thanks for your attention,

    i made change and create primary index as Uri Dimant said

    CREATE TABLE [dbo].[Logs](
    	[ID]				[int] IDENTITY(1,1) NOT NULL,
    	[Timestamp]			[datetime] NOT NULL,
    	[SourceIPAddress]	[nvarchar](50) NOT NULL,
    	[FullUrl]			[nvarchar](4000) NOT NULL,
    	[Url]				[nvarchar](512) NOT NULL,
    	[Action]			[nvarchar](10) NOT NULL,
    	[User]				[nvarchar](50) NULL,
    	[TTL]				[int] NULL,
     CONSTRAINT [PK_Logs] 
     PRIMARY KEY CLUSTERED ([Timestamp],[ID])) 
     ON  [HitDateRangeScheme] ([Timestamp])

    and also i create nonclustered index : 

    Create NonClustered Index NI_User
    On  [dbo].[Logs]([User])

    then i run these query

    Select	*
    		From	dbo.Logs
    	Where	[Timestamp]	Between	'1/1/2012'	And	'12/29/2012'	
    			
    Select	*
    		From	dbo.Logs
    	Where	[Timestamp]	Between	'1/1/2012'	And	'12/29/2012'	
    			And	[User]	=	'bill'

    still query execute a little slower on new table(indexed), then i save execution plan for both query on new table(indexed) and old table, i upload to SkyDrive ,

    thanks for any help


    Alimardani

    Wednesday, April 3, 2013 8:47 AM
  • Hello

    i do some change to table and index as Uri Dimant said, i run a query on both new(indexed) and old table and then i upload Execution Plan to SkyDrive

    thanks


    Alimardani


    Wednesday, April 3, 2013 9:40 AM
  • Hi,

    I have had a look at both execution plan and the result is not really a suprise.

    Without index you have a "Clustered Index Scan", with a "Clustered Index Seek"; so the index is used, but not really as intended.

    You wrote, your table contains 3 mio rows; your query Returns 2.1 mio rows, that's nearly the whole table. An index should be "selective" enough to be helpful for query performance; means your filter condition should reduce the result at least to 25% or less then the complete result, otherwise the index isn't helpful and SQL Server decides to run a full table scan.

    And your row size is very high, more then 4 KB; do you need really all columns in your result set? If not, avoid using a SELECT *. Keep in mind, that all data have to be transferred to the Client ( 2 mio rows * 4 KB = 8 GB !!!); the less amount of result, the fast you get the result.


    Olaf Helper

    Blog Xing


    Wednesday, April 3, 2013 4:31 PM
  • Hello

    so thanks, thanks for your helpfull information,  so how about second query that have another condition , in second query it returns just 23465 row but still slower,

    Select	*
    		From	dbo.Logs
    	Where	[Timestamp]	Between	'1/1/2012'	And	'12/29/2012'	
    			And	[User]	=	'bill'

    so thanks.


    Alimardani

    Thursday, April 4, 2013 4:52 AM
  • Ali

    SQL Server might be using CI seek on partition level....


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Blog: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance

    Thursday, April 4, 2013 5:44 AM
    Answerer
  • I noticed that all your filegroups are on the same physical device, this won't improve performance, and might reduce it since the server will try and perform four simultaneous operations to the same physical device just causing resource contention. No actual i/o capacity is created by doing this.

    Also, have the indexes you created actually been populated?

    Thursday, April 4, 2013 5:50 PM