none
Violation of UNIQUE KEY constraint

    Question

  • I have an INSERT that moves values from a staging table (i.e. a temporary table) to the real table. Since the temporary table may have dups, I'm using a left-join (plus a DISTINCT clause) to insure that the query doesn't insert dups into the real table. What's wrong with my query? Why is it still producing dups?

    INSERT INTO dbo.Pages (OcrText, FileId, PageNo)
    Select DISTINCT PS.OcrText, PS.FileID, Ps.PageNo
    FRom dbo.Pages_Staging as PS
    Left Join Pages as P On P.PageNo = PS.PageNo  AND P.FileID = PS.FileID and P.OCRText = PS.OcrText
    WHERE P.FileID IS NULL

     Violation of UNIQUE KEY constraint 'UQ_FILEID_PAGENO'. Cannot insert duplicate key in object 'dbo.PAGES'. The duplicate key value is (1037, 113).
     
    The unique key is the two integer columns FileID-PageNO. I don't see how my query violates it.
    Friday, August 01, 2014 8:10 PM

Answers

  • Copy each value and paste in excel. Compare it in over there, you may spot the difference there.

    If my previous query did not work then try this:

    INSERT INTO dbo.Pages (OcrText, FileId, PageNo)
     Select MIN(PS.OcrText) OcrText, PS.FileID, Ps.PageNo
     FRom dbo.Pages_Staging as PS
     Left Join Pages as P On P.PageNo = PS.PageNo  AND P.FileID = PS.FileID and P.OCRText = PS.OcrText
     WHERE P.FileID IS NULL
    GROUP BY PS.FileID, Ps.PageNo


    If this post answers your query, please click "Mark As Answer" or "Vote as Helpful".


    • Edited by Itz Shailesh Friday, August 01, 2014 8:31 PM
    • Marked as answer by jal2 Friday, August 01, 2014 9:03 PM
    Friday, August 01, 2014 8:30 PM
  • Copy each value and paste in excel. Compare it in over there, you may spot the difference there.


    Thanks for the help.  I mouse-copied the two values into SSMS and then  wrote a script to compare them.

    Declare @v1 varchar(max) = '111111110.111.11 11111 111.111 111111111 Ilills .11 111 coorg oficsrsnprs PA 13 39 unoA LOZ 033013 snivis WIVIO 1.1411003 ti IN LINcl NO CHECK ENCLOSED 496 1.1 402 wLZ IMOTI MI .1 turfr sa 41 11 IV AC OciSS 2000 dIZ 021 031 IVVJ OZ Nnr ssLt szfr 000 100 131121103 33111 11113 00 sn 31 VAILid KW VN ssamsna 131.1 91.9 90 La INN sanouanonelv BAILICI 01 103 NVS 1.091. SAS eIVOH VA 001 crAN IVddV SNVE 1313 AO OA 3131 EVNS VddVI NArc 100 AV HOVIe SAS .190.1 SVN 301 10 ICILIAB vlenonauonas NNI aL 09 9.19 1.131 ansmass NV WK diLIAV 13 ns 00 31111 11133 301121131 001 000 rfzs tLss rnN ZO JVVI 130 120 ZId 0002 SSicO CA VI 11 14 as rfrut 1. IM ITOMI ZLw 204 1.1 694 DESOLCNE KCEHC ON lcNIL NI it 3001141.1 OIVIW sivins 310330 ZOL Aonu 93 31 AP srpnsrscifo grooc 111 11. sllilI 111111111 111.111 11111 11.111.011111111'
    Declare @v2 varchar(max) = '111111110.111.11 11111 111.111 111111111 Ilills .11 111 coorg oficsrsnprs PA 13 39 unoA LOZ 033013 snivis WIVIO 1.1411003 ti IN LINcl NO CHECK ENCLOSED 496 1.1 402 wLZ IMOTI MI .1 turfr sa 41 11 IV AC OciSS 2000 dIZ 021 031 IVVJ OZ Nnr ssLt szfr 000 100 131121103 33111 11113 00 sn 31 VAILid KW VN ssamsna 131.1 91.9 90 La INN sanouanonelv BAILICI 01 103 NYS 1.091. SAS eIVOH VA 001 crAN IVddV SNVE 1313 AO OA 3131 EVNS VddVI NArc 100 AV HOVIe SAS .190.1 SYN 301 10 ICILIAB vlenonauonas NNI aL 09 9.19 1.131 ansmass NV WK diLIAV 13 ns 00 31111 11133 301121131 001 000 rfzs tLss rnN ZO JVVI 130 120 ZId 0002 SSicO CA VI 11 14 as rfrut 1. IM ITOMI ZLw 204 1.1 694 DESOLCNE KCEHC ON lcNIL NI it 3001141.1 OIVIW sivins 310330 ZOL Aonu 93 31 AP srpnsrscifo grooc 111 11. sllilI 111111111 111.111 11111 11.111.011111111'
    Declare @pos int = 1
    While 1 = 1
    Begin
    	Declare @char1 char = substring(@v1,@pos, @pos+1);
    	Declare @char2 char =  substring (@v2, @pos, @pos+1) 
    	if @char1 <> @Char2
    	Begin 
    		print cast(@pos as varchar(5)) + '   ''' + @char1 + '''   ' + '   ''' + @char2 + '''   ' 
    	End
    	set @pos = @pos +1
    	if @pos > len(@v1) break;
    End
    The output is showing me two chars different but it's hard to make much sense of this.

    357   'V'      'Y'   
    457   'V'      'Y'   
    I can only surmise that the OCR engine processed the same page twice and the 2nd time around it twice confused the letter 'V' with 'Y' - but what's confusing is that I have logic to prevent the OCR engine from processing the same page twice.

    But at least I've narrowed down the problem. Thanks!


    • Edited by jal2 Friday, August 01, 2014 8:38 PM
    • Marked as answer by jal2 Friday, August 01, 2014 8:42 PM
    Friday, August 01, 2014 8:37 PM

All replies

  • Must be that the OcrText value isn't distinct. If I remove that field from the join it works.
    Friday, August 01, 2014 8:21 PM
  • Try This:

    INSERT INTO dbo.Pages (OcrText, FileId, PageNo)
    Select PS.OcrText, PS.FileID, Ps.PageNo
    FRom dbo.Pages_Staging as PS
    Left Join dbo.Pages as P On P.PageNo = PS.PageNo  AND 
    								P.FileID = PS.FileID and 
    									P.OCRText = PS.OcrText
    WHERE P.FileID IS NULL and 
    			P.PageNo IS NULL


    If this post answers your query, please click "Mark As Answer" or "Vote as Helpful".

    Friday, August 01, 2014 8:22 PM
  • This query

    Select Distinct ocrText from pages_staging where fileid =1037 and pageNo = 113

    returns two values proving I have two different values for ocrText  with the combo 1037-113. That seems impossible. Even looking at the values onscreen they look identical. For one thing the length is the same:

    Select Distinct Len(ocrText) from pages_staging where fileid =1037 and pageNo = 113

    Returns only one value.

    Friday, August 01, 2014 8:25 PM
  • Copy each value and paste in excel. Compare it in over there, you may spot the difference there.

    If my previous query did not work then try this:

    INSERT INTO dbo.Pages (OcrText, FileId, PageNo)
     Select MIN(PS.OcrText) OcrText, PS.FileID, Ps.PageNo
     FRom dbo.Pages_Staging as PS
     Left Join Pages as P On P.PageNo = PS.PageNo  AND P.FileID = PS.FileID and P.OCRText = PS.OcrText
     WHERE P.FileID IS NULL
    GROUP BY PS.FileID, Ps.PageNo


    If this post answers your query, please click "Mark As Answer" or "Vote as Helpful".


    • Edited by Itz Shailesh Friday, August 01, 2014 8:31 PM
    • Marked as answer by jal2 Friday, August 01, 2014 9:03 PM
    Friday, August 01, 2014 8:30 PM
  • Copy each value and paste in excel. Compare it in over there, you may spot the difference there.


    Thanks for the help.  I mouse-copied the two values into SSMS and then  wrote a script to compare them.

    Declare @v1 varchar(max) = '111111110.111.11 11111 111.111 111111111 Ilills .11 111 coorg oficsrsnprs PA 13 39 unoA LOZ 033013 snivis WIVIO 1.1411003 ti IN LINcl NO CHECK ENCLOSED 496 1.1 402 wLZ IMOTI MI .1 turfr sa 41 11 IV AC OciSS 2000 dIZ 021 031 IVVJ OZ Nnr ssLt szfr 000 100 131121103 33111 11113 00 sn 31 VAILid KW VN ssamsna 131.1 91.9 90 La INN sanouanonelv BAILICI 01 103 NVS 1.091. SAS eIVOH VA 001 crAN IVddV SNVE 1313 AO OA 3131 EVNS VddVI NArc 100 AV HOVIe SAS .190.1 SVN 301 10 ICILIAB vlenonauonas NNI aL 09 9.19 1.131 ansmass NV WK diLIAV 13 ns 00 31111 11133 301121131 001 000 rfzs tLss rnN ZO JVVI 130 120 ZId 0002 SSicO CA VI 11 14 as rfrut 1. IM ITOMI ZLw 204 1.1 694 DESOLCNE KCEHC ON lcNIL NI it 3001141.1 OIVIW sivins 310330 ZOL Aonu 93 31 AP srpnsrscifo grooc 111 11. sllilI 111111111 111.111 11111 11.111.011111111'
    Declare @v2 varchar(max) = '111111110.111.11 11111 111.111 111111111 Ilills .11 111 coorg oficsrsnprs PA 13 39 unoA LOZ 033013 snivis WIVIO 1.1411003 ti IN LINcl NO CHECK ENCLOSED 496 1.1 402 wLZ IMOTI MI .1 turfr sa 41 11 IV AC OciSS 2000 dIZ 021 031 IVVJ OZ Nnr ssLt szfr 000 100 131121103 33111 11113 00 sn 31 VAILid KW VN ssamsna 131.1 91.9 90 La INN sanouanonelv BAILICI 01 103 NYS 1.091. SAS eIVOH VA 001 crAN IVddV SNVE 1313 AO OA 3131 EVNS VddVI NArc 100 AV HOVIe SAS .190.1 SYN 301 10 ICILIAB vlenonauonas NNI aL 09 9.19 1.131 ansmass NV WK diLIAV 13 ns 00 31111 11133 301121131 001 000 rfzs tLss rnN ZO JVVI 130 120 ZId 0002 SSicO CA VI 11 14 as rfrut 1. IM ITOMI ZLw 204 1.1 694 DESOLCNE KCEHC ON lcNIL NI it 3001141.1 OIVIW sivins 310330 ZOL Aonu 93 31 AP srpnsrscifo grooc 111 11. sllilI 111111111 111.111 11111 11.111.011111111'
    Declare @pos int = 1
    While 1 = 1
    Begin
    	Declare @char1 char = substring(@v1,@pos, @pos+1);
    	Declare @char2 char =  substring (@v2, @pos, @pos+1) 
    	if @char1 <> @Char2
    	Begin 
    		print cast(@pos as varchar(5)) + '   ''' + @char1 + '''   ' + '   ''' + @char2 + '''   ' 
    	End
    	set @pos = @pos +1
    	if @pos > len(@v1) break;
    End
    The output is showing me two chars different but it's hard to make much sense of this.

    357   'V'      'Y'   
    457   'V'      'Y'   
    I can only surmise that the OCR engine processed the same page twice and the 2nd time around it twice confused the letter 'V' with 'Y' - but what's confusing is that I have logic to prevent the OCR engine from processing the same page twice.

    But at least I've narrowed down the problem. Thanks!


    • Edited by jal2 Friday, August 01, 2014 8:38 PM
    • Marked as answer by jal2 Friday, August 01, 2014 8:42 PM
    Friday, August 01, 2014 8:37 PM
  • Now we have to guess at the DDL!  We have no idea what the Pages staging table has for a key or a UNIQUE constraint. Please read the Netiquette at the start of this forum. 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Friday, August 01, 2014 8:41 PM
  • Copy each value and paste in excel. Compare it in over there, you may spot the difference there.

    If my previous query did not work then try this:

    INSERT INTO dbo.Pages (OcrText, FileId, PageNo)
     Select MIN(PS.OcrText) OcrText, PS.FileID, Ps.PageNo
     FRom dbo.Pages_Staging as PS
     Left Join Pages as P On P.PageNo = PS.PageNo  AND P.FileID = PS.FileID and P.OCRText = PS.OcrText
     WHERE P.FileID IS NULL
    GROUP BY PS.FileID, Ps.PageNo


    If this post answers your query, please click "Mark As Answer" or "Vote as Helpful".


    Your code was more helpful than I realized - I guess I can't add OcrText to the unique constraint in virtue of being nvarchar(max).
    Friday, August 01, 2014 9:04 PM