locked
Dynamic query that will compare 2 similar tables and output a report of the differences. RRS feed

  • Question

  • Looking for a generic query/sproc that I can give 2 tables with the same columns and it will compare the two of them and then produce a "report" type query that outputs a list of the Pky's and status of something like "Exists  in Table 1 only", "Exists  in Table 2 nly" or "Exists in both" (or does not even output those).  Then would also like a listing of columns with data values that are different if possible.  It seems like this should be possible  and someone has probably came up with this before but I could not find it if so.

    fyi- Using SQL 2012-2014 Std edition in the various environments where I want to use this.

    Monday, July 11, 2016 12:33 PM

Answers

All replies

  • First of all let me say that what are you looking for already exists in a professional tool and it's the redgate sql compare.

    If you like to do handmade stuffs, you can loop in the INFORMATION_SCHEMA.COLUMNS to list columns and compare the 2 tables, and so on...

    • Proposed as answer by Naomi N Monday, July 11, 2016 5:05 PM
    Monday, July 11, 2016 12:41 PM
  •  SELECT t1.*, t2.*
       FROM t1
            FULL OUTER JOIN
            t2
            ON t1.c1 = t2.c1
               AND t1.c2 = t2.c2
                ...
               AND t1.cn = t2.cn
     WHERE t1.key IS NULL 
        OR t2.key IS NULL; 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Monday, July 11, 2016 1:08 PM
    Answerer
  • Then would also like a listing of columns with data values that are different if possible. 

    For Exists/Not Exists/Exists in both, you will be able to identify using the query template provided by Uri Dimant.

    For the other requirement, we can use Binary Checksum to compare each rows in both the table. There are other ways using concate/stuff/hashbyte etc. Checksum or Binary Checksum will be the simplest. Each column will not be identified uniquely but each row will have different values and can compare both the table rows using join. Whenever there is checksum difference it means that there is a difference in one or more columns. It is case sensitive too.

    select *,BINARY_CHECKSUM(*) ChkSum_Source from #temp1
    select *,BINARY_CHECKSUM(*) ChkSum_Destination from #temp2


    Regards, RSingh


    Monday, July 11, 2016 1:53 PM
  • Thanks this is part of what I am looking for.
    Monday, July 11, 2016 4:48 PM
  • Thanks.  I have had some mixed results try to use checksum in the past.  
    Monday, July 11, 2016 4:49 PM
  • with diff1 as (

    select * from table1

    except

    select * from table2

    )

    ,

    diff2 as (

    select * from table2

    except

    select * from table1

    )

    select * from diff1

    union

    select * from diff2

    • Proposed as answer by Johnny Grun Wednesday, July 13, 2016 5:04 AM
    Monday, July 11, 2016 6:48 PM
  • For the other requirement, we can use Binary Checksum to compare each rows in both the table. There are other ways using concate/stuff/hashbyte etc. Checksum or Binary Checksum will be the simplest.

    No, this is not very reliable. Checksum and binary_checksum use a fairly simplistic algorithm and they are 32-bit. Thus, they can easily miss a difference.
    Also, it seems that this is not what Jason was looking for - it seems that he is only interested in whether the key values are present in both tables, not whether data is different if both keys are present.

    Monday, July 11, 2016 9:38 PM
  • I worked up this.  It is like Johnny Grun's solution except that it also flags which data values are different.

    The eRC and eCR CTEs match what is in Johnny Grun's solution.

    I add the eFlag CTE that does an inner join between the first two CTEs and compares each non key column (RecordId is the primary key).  The ORDER BY puts the rows for a particular primary key value together.  For each column, if the values do not match, the "=" row has NULL, if they match, it is the existing value.  This way you can readily scan for the data that does not match. You would need to use system tables (like the INFORMATION_SCHEMA.COLUMNS that Andrea Caldarone proposes) to dynamically build up the column lists for eFlag.


    with [eRave] as ( select * from tempdb.dbo.[8A3C3181-1B02-423D-A587-9B6AA0EF3AD8] ) , [eCOG] as ( select * from tempdb.dbo.[D1E2A879-77DF-4F53-99FB-3C87BA850117] ) , [eRC] as (   select  RecordId, userid, project, siteid, Site, SaveTS, LAB_SPEC_COL_ST_IND_CA, LAB_SPEC_COL_ST_IND_PHO, TST_LOW_REF_RNG_NM_MG_RAW
    from eRave except select RecordId, userid, project, siteid, Site, SaveTS, LAB_SPEC_COL_ST_IND_CA, LAB_SPEC_COL_ST_IND_PHO, TST_LOW_REF_RNG_NM_MG_RAW from eCOG ), [eCR] as ( select RecordId, userid, project, siteid, Site, SaveTS, LAB_SPEC_COL_ST_IND_CA, LAB_SPEC_COL_ST_IND_PHO, TST_LOW_REF_RNG_NM_MG_RAW from eCOG except select RecordId, userid, project, siteid, Site, SaveTS, LAB_SPEC_COL_ST_IND_CA, LAB_SPEC_COL_ST_IND_PHO, TST_LOW_REF_RNG_NM_MG_RAW from eRave ) , [eFlag] as ( select [RecordId] = ISNULL(r.RecordId, c.RecordId) , [userid] = case r.userid when c.userid then r.userid else NULL end , [project] = case r.project when c.project then r.project else NULL end , [siteid] = case r.siteid when c.siteid then r.siteid else NULL end , [Site] = case r.Site when c.Site then r.Site else NULL end , [SaveTS] = case r.SaveTS when c.SaveTS then r.SaveTS else NULL end , [LAB_SPEC_COL_ST_IND_CA] = case r.LAB_SPEC_COL_ST_IND_CA when c.LAB_SPEC_COL_ST_IND_CA then r.LAB_SPEC_COL_ST_IND_CA else NULL end , [LAB_SPEC_COL_ST_IND_PHO] = case r.LAB_SPEC_COL_ST_IND_PHO when c.LAB_SPEC_COL_ST_IND_PHO then r.LAB_SPEC_COL_ST_IND_PHO else NULL end , [TST_LOW_REF_RNG_NM_MG_RAW] = case r.TST_LOW_REF_RNG_NM_MG_RAW when c.TST_LOW_REF_RNG_NM_MG_RAW then r.TST_LOW_REF_RNG_NM_MG_RAW else NULL end from eRC r full outer join eCR c on r.RecordID = c.RecordID ) select 'R' XX, * from eRC union select 'C' XX, * from eCR union select '=' XX, * from eFlag order by RecordId, XX desc ;



    Dan Jameson
    Associate Director of IT/DBA
    Children's Oncology Group
    www.ChildrensOncologyGroup.org



    • Edited by JediSQL Monday, July 11, 2016 11:18 PM typo
    Monday, July 11, 2016 11:15 PM
  • Thanks Dan this is getting closer to what I am after!  Would love to build all the queries dynamically from Info schema where all I have to do is pass in tbl names.  Pass one would compare the Pky fields for the Exists/Not Exists, then perform something similar to your eFlag CTE routine to identify data diffs across ALL columns as you are doing.
    Tuesday, July 12, 2016 5:31 PM
  • Thx for the reply Erland.  In full agreement on checksums.  As for the data compare, my thoughts were 1st compare Pky fields for EXISTS/NOT EXISTS determination, then compare all field to identify data diffs within the rows themselves.
    Tuesday, July 12, 2016 5:34 PM