none
Fastest way to SELECT or INSERT a record and returning an ID RRS feed

  • Question

  • I'm trying to figure out what is the fastest way to select or insert a row in a table by returning the corresponding ID in a scenario where multiple threads run on the same table. In other words, each thread should insert the row if it does not exist or select its id if it exists; in any case, the ID have to be returned.

    Currently, I'm working on this table: 

    DROP TABLE IF EXISTS [dbo].[soi_test]
    GO
    
    CREATE TABLE [dbo].[soi_test](
    	[id] [int] IDENTITY(1,1) NOT NULL,
    	[type] [int] NOT NULL,
     CONSTRAINT [PK_soi_test] PRIMARY KEY CLUSTERED 
    (
    	[id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    And I'm using this script to test the performances by using multiple SSMS windows (as suggest by Dan Guzman on another question):

    TRUNCATE TABLE soi_test;
    
    DECLARE @ok INT = 0
    	,@error INT = 0
    	,@select INT = 0
    	,@insert INT = 0
    	,@loop INT = 0;
    
    WAITFOR TIME '14:29:00';
    
    SELECT GETDATE();
    
    WHILE (@loop < 100000)
    BEGIN
    	DECLARE @rnd INT = ABS(CHECKSUM(NEWID()) % 1000);
    	DECLARE @id INT = (
    			SELECT TOP (1) id
    			FROM soi_test
    			WHERE (type = @rnd)
    			);
    
    	IF @id IS NULL
    	BEGIN
    		SET IDENTITY_INSERT soi_test ON;
    
    		BEGIN TRY
    			SET NOCOUNT ON;
    			SET XACT_ABORT ON;
    
    			BEGIN TRANSACTION
    
    			INSERT INTO soi_test (
    					id
    					,type
    					)
    			VALUES (
    				@rnd
    				,@rnd
    				);
    
    			COMMIT TRANSACTION
    
    			SET @insert += 1;
    			
    			SELECT @id = SCOPE_IDENTITY();
    		END TRY
    
    		BEGIN CATCH
    			ROLLBACK TRANSACTION
    
    			SET @select += 1;
    
    			SELECT @id = (
    					SELECT TOP (1) id
    					FROM soi_test
    					WHERE (type = @rnd)
    					);
    		END CATCH
    
    		SET IDENTITY_INSERT soi_test OFF;
    	END;
    
    	IF @id = @rnd
    		SET @ok += 1;
    	ELSE
    		SET @error += 1;
    
    	SET @loop += 1;
    END;
    
    SELECT GETDATE();
    
    SELECT @ok AS 'ok'
    	,@error AS 'error'
    	,@select AS 'select'
    	,@insert AS 'insert';

    To verify the correctness of the entered data I made sure that the id and the type in the table were equal.

    Thanks in advance for any help


    Wednesday, September 18, 2019 12:46 PM

All replies

  • I'm trying to figure out what is the fastest way to select or insert a row in a table by returning the corresponding ID in a scenario where multiple threads run on the same table. In other words, each thread should insert the row if it does not exist or select its id if it exists; in any case, the ID have to be returned.

    Good day,

    I am not sure if I got your requirement fully but the base is to use MERGE query or INSERT FROM SELECT

    DECLARE @Value_2_Insert int = 4435
    -- First we return the row if it is exists
    SELECT id,[type] FROM [soi_test] where id = @Value_2_Insert
    -- second we insert the new row if it was not exists
    INSERT [soi_test] (id,[type])
    	SELECT @Value_2_Insert, @Value_2_Insert
    	WHERE NOT EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert)
    GO

    Or maybe you meant something like:

    DECLARE @Value_2_Insert int = 454
    IF (EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert))
    	SELECT @Value_2_Insert
    ELSE 
    	INSERT [soi_test] (id,[type])
    		SELECT @Value_2_Insert, @Value_2_Insert
    		--WHERE NOT EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert)
    GO

     


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]


    Sunday, September 22, 2019 2:22 AM
    Moderator
  • Read Tony's blog

    https://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/03/855.aspx


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Sunday, September 22, 2019 5:14 AM
    Answerer
  • I'm trying to figure out what is the fastest way to select or insert a row in a table by returning the corresponding ID in a scenario where multiple threads run on the same table. In other words, each thread should insert the row if it does not exist or select its id if it exists; in any case, the ID have to be returned.

    Good day,

    I am not sure if I got your requirement fully but the base is to use MERGE query or INSERT FROM SELECT

    DECLARE @Value_2_Insert int = 4435
    -- First we return the row if it is exists
    SELECT id,[type] FROM [soi_test] where id = @Value_2_Insert
    -- second we insert the new row if it was not exists
    INSERT [soi_test] (id,[type])
    	SELECT @Value_2_Insert, @Value_2_Insert
    	WHERE NOT EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert)
    GO

    Or maybe you meant something like:

    DECLARE @Value_2_Insert int = 454
    IF (EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert))
    	SELECT @Value_2_Insert
    ELSE 
    	INSERT [soi_test] (id,[type])
    		SELECT @Value_2_Insert, @Value_2_Insert
    		--WHERE NOT EXISTS (SELECT id FROM [soi_test] where id = @Value_2_Insert)
    GO

     


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]


    Thanks for the reply.

    Actually, I don't know the ID, I need the ID. In my scenario, more queries are performed on the same table, each query should return an ID, after entering the record or reading the ID of a previously entered record.
    Sunday, September 22, 2019 1:01 PM
  • Read Tony's blog

    https://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/03/855.aspx


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Thanks for the reply.

    It's an interesting approach, I have to prove it, but I suspect there are performance problems.


    Sunday, September 22, 2019 1:02 PM
  • Read Tony's blog

    https://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/03/855.aspx


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    After having tested this approach, I can say that:
    - as proposed, it has some problems because it does not provide a retry policy and mechanisms to check inserts
    - in a modified version with the retry, it has a 10% advantage in speed, but a structural complexity that does not make me lean towards using it


    • Edited by dsmwb Sunday, September 22, 2019 1:56 PM
    Sunday, September 22, 2019 1:56 PM
  • If the data are never deleted, then maybe consider the viability of this approach too:

    • Introduce a constraint or index that excludes duplicates.
    • Each thread will execute an INSERT and will return the ID (e.g. autogenerated SCOPE_IDENTITY()) in case of success.
    • But if the record already exists, the error will be intercepted by a CATCH statement. The ID of existing record can be obtained with a SELECT.
    Sunday, September 22, 2019 5:58 PM
  • If the data are never deleted, then maybe consider the viability of this approach too:

    • Introduce a constraint or index that excludes duplicates.
    • Each thread will execute an INSERT and will return the ID (e.g. autogenerated SCOPE_IDENTITY()) in case of success.
    • But if the record already exists, the error will be intercepted by a CATCH statement. The ID of existing record can be obtained with a SELECT.
    Thanks for the reply, I just tested it. It takes 150% more time than my solution, I think because in the worst case there is an insert and a select.
    Sunday, September 22, 2019 7:37 PM
  • Hi dsmwb,

     

    I found that in your code, you used a WHILE loop, which I think makes you very inefficient.

     

    Also, this code inserts the first data with a random id. The id in this table is the primary key. I suggest you try the delete index while executing.

     

    However, I think it may not be able to change anything. WHILE provides a bad performance

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, September 25, 2019 2:27 AM
  • Maybe I missed it - but I am not seeing code to insert a new row and return the ID or select an ID if it already exists.  What I see is a test script to test inserting a set of random rows into the table.

    If the goal is to either insert a new row - or select the existing ID then you have to provide some other value(s) to determine that the row already exists.  If the row exists based on these values - return the ID.

    I can think of 2 approaches - first is to check for existence, the second is to assume it doesn't exist:

    DECLARE @idValue int = (SELECT id FROM table WHERE {conditions});
    
    IF @idValue IS NULL
    BEGIN
        INSERT INTO ...
        -- Use OUTPUT on the INSERT statement to capture ID generated from the insert statement
    END
    
    
    DECLARE @idValue int;
    
    BEGIN TRY
        INSERT INTO table ...
        OUTPUT ...
    END TRY
    BEGIN CATCH
        --check for insert error indicating values already exist
        IF (error = row exists)
           SET @idValue = (SELECT ID FROM table WHERE {conditions})
    END CATCH

    The first approach attempts to retrieve the ID - if found it is done.  If not found - it drops into an INSERT...

    The second approach attempts to insert a new row - if successful, returns the ID in the OUTPUT clause.  If not successful an error should be generated (PK or UK violation - duplicate row, etc...) - the error moves to the catch block which selects the ID value.

    In both cases - since this is a single transaction each thread will either perform an insert or return the selected row.  The second approach, however - will probably generate an ID value for the failed insert which will then be skipped.


    Jeff Williams

    Wednesday, September 25, 2019 9:05 PM
  • Hi dsmwb,

     

    I found that in your code, you used a WHILE loop, which I think makes you very inefficient.

     

    Also, this code inserts the first data with a random id. The id in this table is the primary key. I suggest you try the delete index while executing.

     

    However, I think it may not be able to change anything. WHILE provides a bad performance

     

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    thanks for the reply, the WHILE loop and the random ID are only used for testing.

    Friday, September 27, 2019 10:37 AM
  • thanks for the reply,

    Maybe I missed it - but I am not seeing code to insert a new row and return the ID or select an ID if it already exists.  What I see is a test script to test inserting a set of random rows into the table.

    The core code that inserts or selects is this (I commented to clarify):

    -- returns the ID if the record exists
    DECLARE @id INT = (
    		SELECT TOP (1) id
    		FROM soi_test
    		WHERE (type = @rnd)
    		);
    
    IF @id IS NULL
    BEGIN
    	BEGIN TRY
    		SET NOCOUNT ON;
    		SET XACT_ABORT ON;
    
    		BEGIN TRANSACTION
    
    		-- tries to insert the ID if the record does NOT exist
    		INSERT INTO soi_test (
    				id
    				,type
    				)
    		VALUES (
    			@rnd
    			,@rnd
    			);
    
    		COMMIT TRANSACTION
    		
    		-- gets the last inserted ID
    		SELECT @id = SCOPE_IDENTITY();
    	END TRY
    
    	BEGIN CATCH
    		ROLLBACK TRANSACTION
    
    		-- when an error occurs it tries to return the ID
    		SELECT @id = (
    			SELECT TOP (1) id
    			FROM soi_test
    			WHERE (type = @rnd)
    			);
    	END CATCH
    END;

    If the goal is to either insert a new row - or select the existing ID then you have to provide some other value(s) to determine that the row already exists.  If the row exists based on these values - return the ID.

    I can think of 2 approaches - first is to check for existence, the second is to assume it doesn't exist:

    DECLARE @idValue int = (SELECT id FROM table WHERE {conditions});
    
    IF @idValue IS NULL
    BEGIN
        INSERT INTO ...
        -- Use OUTPUT on the INSERT statement to capture ID generated from the insert statement
    END
    DECLARE @idValue int;
    
    BEGIN TRY
        INSERT INTO table ...
        OUTPUT ...
    END TRY
    BEGIN CATCH
        --check for insert error indicating values already exist
        IF (error = row exists)
           SET @idValue = (SELECT ID FROM table WHERE {conditions})
    END CATCH

    The first approach attempts to retrieve the ID - if found it is done.  If not found - it drops into an INSERT...

    The second approach attempts to insert a new row - if successful, returns the ID in the OUTPUT clause.  If not successful an error should be generated (PK or UK violation - duplicate row, etc...) - the error moves to the catch block which selects the ID value.

    In both cases - since this is a single transaction each thread will either perform an insert or return the selected row.  The second approach, however - will probably generate an ID value for the failed insert which will then be skipped.


    Jeff Williams

    the first approach you propose is similar to mine, but has a fundamental error, in a multi-thread scenario, you must assume that the INSERT can fail due to the insertion of the same record by another thread.

    The second approach has already been proposed,

    it takes 150% more time than my solution, I think because in the worst case there is an insert and a select.






    • Edited by dsmwb Friday, September 27, 2019 12:04 PM
    Friday, September 27, 2019 10:48 AM
  • Your ID value is an IDENTITY which is thread safe and will not generate the same ID value for separate threads.

    The assumption you have made that the INSERT must fail is incorrect - the INSERT will only fail across threads if *some other constraint* is violated.  Since you have no other constraints on the table - failure during INSERT cannot happen due to a duplicate ID value.

    The INSERT statement will generate a new IDENTITY value every insert attempt.  If the insert fails - that IDENTITY value will be discarded and will not be used by any other thread.

    The problem with your code is that you are not using the IDENTITY property to generate the value.  You are attempting to insert into the table the randomly generated 'type'.

    Remove the SET IDENTITY_INSERT and remove the ID from the insert statement and the IDENTITY value will be generated in a thread-safe manner and no duplicates will occur.

    You can also make the type column a unique key - and then change the lookup to:

    SELECT ID FROM soi_test WHERE type = @rnd;

    If the generation of @rnd produces a duplicate it will select the existing row and since that row cannot have a duplicate value the TOP 1 isn't necessary - and if you attempt to insert with an existing @rnd value it will fail...forcing a lookup.

    With that - then using either EXISTS or TRY/CATCH will work the same.


    Jeff Williams

    Saturday, September 28, 2019 4:21 PM
  • Your ID value is an IDENTITY which is thread safe and will not generate the same ID value for separate threads.

    To avoid this the ID is always random (for the test) and it is equal to type just to check their values.

    The assumption you have made that the INSERT must fail is incorrect - the INSERT will only fail across threads if *some other constraint* is violated.  Since you have no other constraints on the table - failure during INSERT cannot happen due to a duplicate ID value.

    For this reason there is the SET IDENTITY_INSERT. The insert could fail for the value of type, not for the value of ID that is actually auto-incremented (in a real scenario). Please note that in the example the primary key is type and not id. 



    • Edited by dsmwb Sunday, September 29, 2019 8:57 PM
    Sunday, September 29, 2019 8:52 PM
  • Without your actual case, we may not be able to give you effective advice. We've done a series of instructions for your test case, but it doesn't seem to help. I suggest you give us more information.

    Best Regards,

    Rachel 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, September 30, 2019 7:52 AM
  • Your ID value is an IDENTITY which is thread safe and will not generate the same ID value for separate threads.

    To avoid this the ID is always random (for the test) and it is equal to type just to check their values.

    The assumption you have made that the INSERT must fail is incorrect - the INSERT will only fail across threads if *some other constraint* is violated.  Since you have no other constraints on the table - failure during INSERT cannot happen due to a duplicate ID value.

    For this reason there is the SET IDENTITY_INSERT. The insert could fail for the value of type, not for the value of ID that is actually auto-incremented (in a real scenario). Please note that in the example the primary key is type and not id. 



    Using 'SET IDENTITY_INSERT soi_test ON' allows you to insert the 'identity' value.  This is not enabling the IDENTITY...

    What you are doing is generating a value and inserting that value into the table - and that is where your problem lies...

    If you allowed SQL Server to generate the identity this process would be thread-safe and you would not have any issues.

    INSERT INTO soi_test (type) VALUES (@rnd);

    If you use this to insert - SQL Server will generate a value for the ID and will never generate a duplicate value.

    SELECT ID FROM soi_test WHERE type = @rnd;

    This will now select the generated ID - and if not found you can then fall into the insert portion and create a new record.

    This will be thread-safe regardless of how many concurrent sessions call the code.


    Jeff Williams

    Monday, September 30, 2019 5:27 PM
  • Without your actual case, we may not be able to give you effective advice. We've done a series of instructions for your test case, but it doesn't seem to help. I suggest you give us more information.

    Best Regards,

    Rachel 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thanks for the answer, the real case is not important, the point is to find the fastest way to select or insert a row in a table when multiple threads access it.
    Wednesday, October 2, 2019 5:37 PM
  • Using 'SET IDENTITY_INSERT soi_test ON' allows you to insert the 'identity' value.  This is not enabling the IDENTITY...

    What you are doing is generating a value and inserting that value into the table - and that is where your problem lies...

    If you allowed SQL Server to generate the identity this process would be thread-safe and you would not have any issues.

    INSERT INTO soi_test (type) VALUES (@rnd);
    The identity insert is only used to check if the entered value is correct since in this case the ID is always equal to TYPE. We can reason with the ID automatically managed by the SQL server, but in this case you have to think of a unique index on TYPE since ID=TYPE means that TYPE must be unique.
    Wednesday, October 2, 2019 5:49 PM
  • Using 'SET IDENTITY_INSERT soi_test ON' allows you to insert the 'identity' value.  This is not enabling the IDENTITY...

    What you are doing is generating a value and inserting that value into the table - and that is where your problem lies...

    If you allowed SQL Server to generate the identity this process would be thread-safe and you would not have any issues.

    INSERT INTO soi_test (type) VALUES (@rnd);

    The identity insert is only used to check if the entered value is correct since in this case the ID is always equal to TYPE. We can reason with the ID automatically managed by the SQL server, but in this case you have to think of a unique index on TYPE since ID=TYPE means that TYPE must be unique.

    By manually determining the ID and basing the ID on the TYPE column - you are creating the problem you seem to be trying to resolve.

    If you allow SQL Server to generate the ID then the process becomes thread-safe and you avoid any potential issues with another thread generating the same ID value.

    Therefore - the quickest way to generate a unique ID or select the existing ID is to NOT generate that value manually and to set TYPE as a unique constraint.

    If you change your code as I outlined previously then either approach will work correctly and perform efficiently.

        Use tempdb;
         Go
    
         If object_id('tempdb..#soi_test', 'U') Is Not Null
            Drop Table #soi_test;
         Go
    
     Create Table #soi_test (
            id int Not Null Identity(1, 1)
          , Type int Null
            ) On [PRIMARY]
         Go
    
      Alter Table #soi_test Add Constraint IX_soi_test_type Unique Nonclustered(Type) 
       With (statistics_norecompute = Off, ignore_dup_key = Off, allow_row_locks = On, allow_page_locks = On) On [PRIMARY]
         Go
    
    Declare @resultTable Table (id int);
    
     --==== seed the table to force duplicate check
    Declare @rnd int = abs(checksum(newid()) % 1000);
     Insert Into #soi_test Values (@rnd);
    
     --==== Attempt to insert
      Begin Try 
    
     Insert Into #soi_test
     Output inserted.ID Into @resultTable
     Values (@rnd);
     
     Select ID, 'Inserted' From @resultTable ot;
       
        End Try
      Begin Catch
       
     Select ID, 'Selected'
       From #soi_test st
      Where st.[Type] = @rnd;
      
      End Catch 
    

    Using this code - we can see that the ID is selected as expected.  If you comment out the insert to seed #soi_test - the row will be inserted.

    Put this in a loop and execute from multiple sessions...it will not attempt to insert a duplicate row.  If the value @rnd is generated from another window - the value will be selected because the insert will fail due to the TYPE columns unique constraint.

    Because you are generating the ID value and using that as TYPE - and the fact that your calculation of @rnd is NOT guaranteed to be random - your code will generate the same @rnd value from multiple sessions at the same time - and if that value has not been previously generated and inserted - will cause one of the inserts to fail.

    The other approach is this:

        Use tempdb;
         Go
    
         If object_id('tempdb..#soi_test', 'U') Is Not Null
            Drop Table #soi_test;
         Go
    
     Create Table #soi_test (
            id int Not Null Identity(1, 1)
          , Type int Null
            ) On [PRIMARY]
         Go
    
      Alter Table #soi_test Add Constraint IX_soi_test_type_2 Unique Nonclustered(Type) 
       With (statistics_norecompute = Off, ignore_dup_key = Off, allow_row_locks = On, allow_page_locks = On) On [PRIMARY]
         Go
    
    Declare @result int;
    
     --==== seed the table to force duplicate check
    Declare @rnd int = abs(checksum(newid()) % 1000);
     Insert Into #soi_test Values (@rnd);
    
     Select @result = st.id
       From #soi_test st
      Where st.[Type] = @rnd;
    
         If @result Is Null
      Begin 
    
     Insert Into #soi_test
     Values (@rnd);
     
     Select *
          , 'Inserted'
       From #soi_test st
      Where Type = @rnd;
    
        End
       Else  
      Begin 
    
     Select *
          , 'Selected'
       From #soi_test st
      Where st.[Type] = @rnd;
    
        End
    
    

    Either approach works and will work from multiple sessions without any issues.

    Now - as to which one is faster...probably the INSERT ... OUTPUT will be faster as it doesn't require an extra select statement to get the result but that can be tested to see how much it adds to the performance.


    Jeff Williams

    Wednesday, October 2, 2019 7:43 PM
  • Using 'SET IDENTITY_INSERT soi_test ON' allows you to insert the 'identity' value.  This is not enabling the IDENTITY...

    What you are doing is generating a value and inserting that value into the table - and that is where your problem lies...

    If you allowed SQL Server to generate the identity this process would be thread-safe and you would not have any issues.

    INSERT INTO soi_test (type) VALUES (@rnd);

    The identity insert is only used to check if the entered value is correct since in this case the ID is always equal to TYPE. We can reason with the ID automatically managed by the SQL server, but in this case you have to think of a unique index on TYPE since ID=TYPE means that TYPE must be unique.

    By manually determining the ID and basing the ID on the TYPE column - you are creating the problem you seem to be trying to resolve.

    If you allow SQL Server to generate the ID then the process becomes thread-safe and you avoid any potential issues with another thread generating the same ID value.

    Therefore - the quickest way to generate a unique ID or select the existing ID is to NOT generate that value manually and to set TYPE as a unique constraint.

    If you change your code as I outlined previously then either approach will work correctly and perform efficiently.

        Use tempdb;
         Go
    
         If object_id('tempdb..#soi_test', 'U') Is Not Null
            Drop Table #soi_test;
         Go
    
     Create Table #soi_test (
            id int Not Null Identity(1, 1)
          , Type int Null
            ) On [PRIMARY]
         Go
    
      Alter Table #soi_test Add Constraint IX_soi_test_type Unique Nonclustered(Type) 
       With (statistics_norecompute = Off, ignore_dup_key = Off, allow_row_locks = On, allow_page_locks = On) On [PRIMARY]
         Go
    
    Declare @resultTable Table (id int);
    
     --==== seed the table to force duplicate check
    Declare @rnd int = abs(checksum(newid()) % 1000);
     Insert Into #soi_test Values (@rnd);
    
     --==== Attempt to insert
      Begin Try 
    
     Insert Into #soi_test
     Output inserted.ID Into @resultTable
     Values (@rnd);
     
     Select ID, 'Inserted' From @resultTable ot;
       
        End Try
      Begin Catch
       
     Select ID, 'Selected'
       From #soi_test st
      Where st.[Type] = @rnd;
      
      End Catch 

    Using this code - we can see that the ID is selected as expected.  If you comment out the insert to seed #soi_test - the row will be inserted.

    Put this in a loop and execute from multiple sessions...it will not attempt to insert a duplicate row.  If the value @rnd is generated from another window - the value will be selected because the insert will fail due to the TYPE columns unique constraint.

    Because you are generating the ID value and using that as TYPE - and the fact that your calculation of @rnd is NOT guaranteed to be random - your code will generate the same @rnd value from multiple sessions at the same time - and if that value has not been previously generated and inserted - will cause one of the inserts to fail.

    The other approach is this:

        Use tempdb;
         Go
    
         If object_id('tempdb..#soi_test', 'U') Is Not Null
            Drop Table #soi_test;
         Go
    
     Create Table #soi_test (
            id int Not Null Identity(1, 1)
          , Type int Null
            ) On [PRIMARY]
         Go
    
      Alter Table #soi_test Add Constraint IX_soi_test_type_2 Unique Nonclustered(Type) 
       With (statistics_norecompute = Off, ignore_dup_key = Off, allow_row_locks = On, allow_page_locks = On) On [PRIMARY]
         Go
    
    Declare @result int;
    
     --==== seed the table to force duplicate check
    Declare @rnd int = abs(checksum(newid()) % 1000);
     Insert Into #soi_test Values (@rnd);
    
     Select @result = st.id
       From #soi_test st
      Where st.[Type] = @rnd;
    
         If @result Is Null
      Begin 
    
     Insert Into #soi_test
     Values (@rnd);
     
     Select *
          , 'Inserted'
       From #soi_test st
      Where Type = @rnd;
    
        End
       Else  
      Begin 
    
     Select *
          , 'Selected'
       From #soi_test st
      Where st.[Type] = @rnd;
    
        End

    Either approach works and will work from multiple sessions without any issues.

    Now - as to which one is faster...probably the INSERT ... OUTPUT will be faster as it doesn't require an extra select statement to get the result but that can be tested to see how much it adds to the performance.


    Jeff Williams

    You are focusing on my "test" code not on the request, actually I'm looking for the fastest way to do that operation. I have changed the soi_test table as you suggested (ID managed by SQL server + UNIQUE index on type) and repeated also my tests:

    1. the TRY ... CATCH approach you are proposing is on average 50 times slower than mine 

    2. the alternative approach without the "else" statement (that is only used to emit "selected") is on average 30 times slower than mine



    • Edited by dsmwb Wednesday, October 2, 2019 9:19 PM
    Wednesday, October 2, 2019 9:17 PM
  • What code are you running that you say is 50 times faster?  I am focusing on your test code because that is the only code you have provided.

    Your test code is flawed - you setup a table with an IDENTITY and then never allow the identity to be generated by SQL Server.

    In my examples I am including a select statement just so we can see which branch has been taken.  In the second example, you can modify the insert statement to use OUTPUT to retrieve the inserted ID value - the same as in the TRY/CATCH...

    The general approach is:

    1) Attempt to INSERT - if fails then SELECT value from table in catch block

    2) SELECT value from table - if not found, insert

    The first approach will process faster when the type value doesn't exist - the second approach will be fastest if the value already exists - since it only has to perform the select.


    Jeff Williams

    Wednesday, October 2, 2019 9:43 PM
  • What code are you running that you say is 50 times faster?

    The code is similar to that of the original post, adapted with the changes you proposed. Of course the speed depends on my device, but both tests (my code and your proposal) were performed under the same conditions.

    I am focusing on your test code because that is the only code you have provided.

    Actually, there is a specific request about the speed. You can propose a totally new code. I will test it.

    Your test code is flawed - you setup a table with an IDENTITY and then never allow the identity to be generated by SQL Server.

    The reasons for this choice have already been explained.

    In my examples I am including a select statement just so we can see which branch has been taken.  In the second example, you can modify the insert statement to use OUTPUT to retrieve the inserted ID value - the same as in the TRY/CATCH...

    The general approach is:

    1) Attempt to INSERT - if fails then SELECT value from table in catch block

    2) SELECT value from table - if not found, insert

    The first approach will process faster when the type value doesn't exist - the second approach will be fastest if the value already exists - since it only has to perform the select.


    Jeff Williams

    Thank you for your proposal, your approach is clear to me.

    Thursday, October 3, 2019 10:30 PM
  • Hi,

    Sry that I left the conversation after my first response.

    Off-topic: I noticed dsmwb that you responded to all responses which you got and thanks the people that tried to help you. Well done! This is pretty rare in the forum (unfortunately) 👍😃

    Note: will be nice to have your name and not to use "dsmwb" which seems randomly

    back to the topic...

    1. You cannot discuss performance if you do not provide Execution Plan (EP). The EP is the first step in order to confirm that all participate in the discussion actually speak about the same, since the same query might be executed totally differently in different servers, especially in this thread when we do not have any sample data to discuss.

    2. Please explain what is the status if the issue now?
    I understand that you have a working solution. So why do you need (don't point us to read the entire thread but just summarize please)

    If the issue is performance and you have a working solution then please provide sample data to start with (provide query to insert the data), a sample query which you want to test, and your working solution query (even if it is already somewhere up in the thread).

    Do not provide the test you do but the best query that you have for production solution (the original massage includes script that use loop to execute something multiple time for test. Let's focus on the query itself and let us do the test without any loop. Using loop inside a query for test is not a good idea and it does not present the real case, since all the loop is done under one transaction and not as separate multiple calls to the server. For tests you need to execute each query from external client as separate request to the server.

    There are applications for tests which supports for randomization of input parameters (and you can use random parameter in the query and execute separate queries and not all in a transaction loop), like this old and awesome open source:
    https://github.com/ErikEJ/SqlQueryStress


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]


    Friday, October 4, 2019 4:37 AM
    Moderator
  • Sry that I left the conversation after my first response.

    Every contribution is always welcome.

    Off-topic: I noticed that you responded to all responses which you got and thanks the people that tried to help you. Well done! This is pretty rare in the forum (unfortunately) 👍😃

    Thank you, many of us are here on a voluntary basis, so it seems right to me to thank them for their contribution.

    Note: will be nice to have your name and not to use "dsmwb" which seems randomly

    You're right, but this is a "shared" account among the members of my company, even though I mostly use it.

    back to the topic...

    1. You cannot discuss performance if you do not provide Execution Plan (EP). The EP is the first step in order to confirm that all participate in the discussion actually speak about the same, since the same query might be executed totally differently in different servers, especially in this thread when we do not have any sample data to discuss.

    You're absolutely right, but my purpose was to provide a basis for the discussion and then test all the proposals myself (the aim was mainly to simplify the work of others). I then realized that clearly this could have led to problems and discouraged others from facing the discussion.

    2. Please explain what is the status if the issue now?
    I understand that you have a working solution. So why do you need (don't point us to read the entire thread but just summarize please)

    If the issue is performance and you have a working solution then please provide sample data to start with (provide query to insert the data), a sample query which you want to test, and your working solution query (even if it is already somewhere up in the thread).

    I don't see this as an issue, I have no sample data, I'm just thinking about this thing on an abstract basis. I started thinking about the fastest way to select or insert a record in a table and once I determined which one I thought was the best way to do it, I wanted to compare my idea with others. Clearly the need arose from a real case that suggested part of the constraints, for example, the ratio between the number of repetitions of the loop (100000) and the number of different types (1000) was generated by an analysis of one of my production tables.

    Do not provide the test you do but the best query that you have for production solution (the original massage includes script that use loop to execute something multiple time for test. Let's focus on the query itself and let us do the test without any loop. Using loop inside a query for test is not a good idea and it does not present the real case, since all the loop is done under one transaction and not as separate multiple calls to the server. For tests you need to execute each query from external client as separate request to the server.

    There are applications for tests which supports for randomization of input parameters (and you can use random parameter in the query and execute separate queries and not all in a transaction loop), like this old and awesome open source:
    https://github.com/ErikEJ/SqlQueryStress

    Thank you very much for the info, honestly I was not aware of the fact that the loop was executed in a single transaction, this partially justifies the anomalies that I found in a test done in a real environment, for example I saw almost contemporary inserts, for this reason I started to use the TRY..CATCH. I will try to repeat the tests without using loops.


    • Edited by dsmwb Friday, October 4, 2019 3:34 PM
    Friday, October 4, 2019 2:35 PM
  • Hi dsmwb,

    Thank you very much for your reply.But I would suggest that you give us your actual data and your execution plan.There are many factors that affect your efficiency, and we can only give you some advice as often as possible.Without your actual code and execution plan, it's hard to give you helpful advice. 

    If you could not provide us more information because  of security ,maybe you can ask professional engineer for help, and they will deal with your problem separately and confidentially. https://support.microsoft.com/en-us/assistedsupportproducts

    Best Regards,

    Rachel 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Monday, October 7, 2019 6:25 AM
  • Hi dsmwb,

    Thank you very much for your reply.But I would suggest that you give us your actual data and your execution plan.There are many factors that affect your efficiency, and we can only give you some advice as often as possible.Without your actual code and execution plan, it's hard to give you helpful advice. 

    Thanks for your suggestion, I have no real data, as I wrote before I'm just thinking about this in abstract. Everything starts from the definitions in the OP (I mean the table and the testing code).

    Monday, October 7, 2019 8:06 PM
  • I think all the above suggestions can be tried. I'm sorry that I have no new ideas at present. Maybe the comparison of several ways can help you find a better way. As you said, you want to discuss this test.

     

    We don't have your actual data to test, and we don't know exactly what your requirements are, all we can do is guess what you want and give you some Suggestions.

     

    By the way, if you have get helpful replies or solve your issue ,in order to close this thread, please kindly mark helpful replies or your own reply as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

    Thank you in advance for your understanding.

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, October 8, 2019 7:22 AM
  • Hi,

    By real data we do not mean your production data but some sample data so we can play with and discuss results

    As I mentioned above without Execution Plan which you get when you execute the queries on a sample data, we (at least I) cannot discuss performance since performance is subject to multiple parameters which we cannot cover in the scope of the forum. Therefore the execution plan gives a specific scenario according to specific set of parameters and the way the server "chose" to execute the query.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Tuesday, October 8, 2019 7:24 AM
    Moderator
  • Hi,

    By real data we do not mean your production data but some sample data so we can play with and discuss results

    As I mentioned above without Execution Plan which you get when you execute the queries on a sample data, we (at least I) cannot discuss performance since performance is subject to multiple parameters which we cannot cover in the scope of the forum. Therefore the execution plan gives a specific scenario according to specific set of parameters and the way the server "chose" to execute the query.


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Thanks for the detail of the request, I have no example data. The data is created randomly during the test from this line of code:

    DECLARE @rnd INT = ABS(CHECKSUM(NEWID()) % 1000);

    Tuesday, October 8, 2019 10:21 AM
  • I think all the above suggestions can be tried. I'm sorry that I have no new ideas at present. Maybe the comparison of several ways can help you find a better way. As you said, you want to discuss this test.

     

    We don't have your actual data to test, and we don't know exactly what your requirements are, all we can do is guess what you want and give you some Suggestions.

     

    By the way, if you have get helpful replies or solve your issue ,in order to close this thread, please kindly mark helpful replies or your own reply as answers. By doing so, it will benefit all community members who are having this similar issue.  Your contribution is highly appreciated.

    Thank you in advance for your understanding.

    Best Regards,

    Rachel


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    thanks, I've already tried all the suggested methods. The methods suggested so far are:

    1. mine, scheme: select + TRY insert CATCH + select -> reference

    2. modified version from Tony's blog https://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/03/855.aspx -> 10% faster than mine, but the modified version introduces excessive complexity

    3. scheme: insert + TRY select catch -> 150% slower than mine

    4. scheme: TRY insert + select CATCH + select, on a modified version of the table ID managed by SQL server + unique index on type -> much slower than the other proposals

    5. scheme: insert + select + (insert + select) , on a modified version of the table ID managed by SQL server + unique index on type -> much slower than the other proposals


    • Edited by dsmwb Tuesday, October 8, 2019 10:35 AM
    Tuesday, October 8, 2019 10:34 AM
  • Thank you for your positive reply.You can compare the following methods and find one of them as a better solution.

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, October 9, 2019 6:35 AM
  • Hi,

    please forget about your test for a minute and think about production only

    try to emulate the real scenario.

    in your real production you have some data and you want to do insert. right?
    So... provide sample data which emulate the data in the production at specific point in time. Provide the action which you want to do next (insert specific row) and explain what you expect to get

    Give several sample to cover all scenario and we will try to find the best way to do the task.

    If you want to speak about performance then we need the exact DDL and the execution plan which you get when you execute each solution.

    * but it seems like you already got multiple solutions, and not clear to me what you need next and why this discussion is not closed :-)


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Wednesday, October 9, 2019 1:03 PM
    Moderator
  • Hi,

    please forget about your test for a minute and think about production only

    try to emulate the real scenario.

    in your real production you have some data and you want to do insert. right?
    So... provide sample data which emulate the data in the production at specific point in time. Provide the action which you want to do next (insert specific row) and explain what you expect to get

    Give several sample to cover all scenario and we will try to find the best way to do the task.

    If you want to speak about performance then we need the exact DDL and the execution plan which you get when you execute each solution.

    I'm sorry, obviously I couldn't explain myself, I'm thinking in a completely abstract way, I don't have any kind of production data to supply. Everything starts from the OP, so the DDL is the one provided in OP.

    * but it seems like you already got multiple solutions, and not clear to me what you need next and why this discussion is not closed :-)


    In fact, but I was hoping to have some other interesting tips.

    Wednesday, October 9, 2019 1:38 PM
  • You can start with what you have, and once you get to the next step, then you will be able to provide more information in the forums and sample working scenario. At that time we will be able to open the discussion again

    I think that you can close this thread at this time


    signature   Ronen Ariely
     [Personal Site]    [Blog]    [Facebook]    [Linkedin]

    Wednesday, October 9, 2019 2:43 PM
    Moderator