locked
Comparing Data Differences Between Two Identical Access Tables and only display the differences RRS feed

  • Question

  • I have two identical Access tables, for example testDataset and devDataset. I have around a million rows and around 20 columns to review in both the tables. I also have a column for key that will be same in both the tables which I will be using for comparison. The column headers in both the tables are the same.

    I want to compare and display only the differences between testDataset and devDataset in a new resultTable which will also have the same column headers as testDataset or devDataset.

    If the data is the same for any column, I want the resultTable to show a null value. And if the data is different for any column, I want the value from testDataset to be present in the resultTable. I have tried a lot of queries but was unsuccessful.

    Is this possible?  Any help is appreciated.  Thank you in advance!


    Monday, March 9, 2020 3:51 AM

All replies

  • ciao Rahul,

    you can achieve your task applying wizard query which usually the result is realized by sql predicate in left join.

    Another solution is based on exists/not exists predicate applying subQuery.

    Anyway my concerns is about the number or rows....it seems your tables is made by million of rows.

    Ace and jet Access's database are not so suit to manage such kind of range of rows in my opinion, so you should think to move your database on a more robust database.

    Queries on million of row in Access takes time to be run.

    To keep on Ms' family, MS-SQL server should be a proper solution.

    HTH.

    Ciao, Sandro.

    Monday, March 9, 2020 9:59 AM
  • Rahul

    My first question is why do you have two almost identical tables? Data should be stored once only.

    As Sandro stated you can use the unmatched query wizard to at least partly analyse differences though you will then need to further modify the query in design view. You may find it useful to read the detailed article on my website which covers the various different ways of achieving the task of Synchronising Data. That may also help you achieve your specific requirements for display purposes.

    However, whilst I agree that SQL Server is more robust than Access, I disagree with Sandro's comments regarding Access being able to handle tables with around a million rows. I have several Access apps including tables of 1-3 million records and they work fine.  Also, providing fields commonly used in searches or filters are indexed, queries based on those tables will be fast. Without indexing, query performance will indeed be slow... BUT that would also be true for SQL Server unless steps are taken to fully optimise queries (views) at the server level.

    For more info on making Access queries run as fast as possible, see my article on Optimising Queries


    • Edited by isladogs52 Monday, March 9, 2020 1:03 PM
    Monday, March 9, 2020 1:01 PM
  • Hi Sandro,

    Thank you so much for replying.

    I have dealt with tables with more than a million rows before in access. I tried using unmatched query wizard but the output table is completely empty. I tried writing my own queries using SQL view in access. I just couldn't get the result I wanted. My Output table should contain only the data that is different in both the tables as I mentioned before.


    Thursday, March 12, 2020 11:34 PM
  • Hi Mendip Data Systems,

    Thank you for replying.

    I have two similar tables but some data is not the same in these tables. I want to show this unmatched data in the output table. 

    I tried using unmatched query wizard but the output table is completely empty. I tried writing my own queries using SQL view in access. I just couldn't get the result I wanted. My Output table should contain only the data that is different in both the tables as I mentioned before.


    Thursday, March 12, 2020 11:37 PM
  • ciao Rahul,

    taking for instance those following tables to be compared :

    I got this:

    applying the following sql predicate :

    SELECT T_person.id_person, T_person.person, T_person2.id_person
    FROM T_person LEFT JOIN T_person2 ON T_person.[id_person] = T_person2.[id_person]
    WHERE (((T_person2.id_person) Is Null));

    and this :

    SELECT T.id_person, T.person
    FROM T_person AS T
    WHERE not exists ( select id_person from T_person2 as T2 where t2.id_person=T.id_person);

    and this one :

    SELECT T.id_person, T.person, t2.id_person AS showNull
    FROM T_person AS T LEFT JOIN (select * from T_person2 as T2)  AS T3 ON t3.id_person=t.id_person
    WHERE t2.id_person is null;

    try modifying those above predicate into your scenario, you should easily achieve your task.

    ciao, Sandro.

    Friday, March 13, 2020 7:57 AM
  • @isladogs52
    > My first question is why do you have two almost identical tables? Data should be stored once only.

    Because Rahul said the tables were from two different databases: Dev and Test.

    Friday, March 27, 2020 8:41 AM
  • You are going to have a very difficult time doing this in Access. I'm fairly certain, without actually taking the time to test, that Sandro's method will only show differences when the matching keys are different.

    @Sandro: In your example, try changing the record in T_person2 from "mario" to "Rahul", but leave mario as entry for id_person = 2 in the first table, T_person. You should expect to see this difference, but I'm fairly certain you will not, because only the primary key is matched.

    @Rahul: You have a couple of alternatives:

    1) Purchase a copy of Total Access Detective from FMS, Inc.
    http://fmsinc.com/MicrosoftAccess/DatabaseCompare.html

    Okay, it's not cheap at $ 299 USD for a single license, but if you do a lot of this type of work, it is well-worth the investment. This product finds differences in schema and/or data. It does require a matching primary key or unique index in each table to find differences in data, but you have that.

    2a) Download and install (if you haven't already done so) a free copy of SQL Server Developer edition. Get your data into two tables in a SQL Server database--there are a variety of methods, including using the free SSMA (SQL Server Migration Assistant) for Access, available for download from Microsoft. Then run a query with the except operator. I don't remember at the moment, but I think you have to use all field names in the SELECT statement, instead of just SELECT *.

    2b) Ditto choice 2a, but use the free version of Oracle instead, and the MINUS keyword.

    One of my favorite "go-to" sites is TechOnTheNet.com for clearly written answers to SQL questions.

    SQL: MINUS Operator
    https://www.techonthenet.com/sql/minus.php

    Friday, March 27, 2020 9:02 AM
  • True but he also made it clear he wants the Result table to always use the vales from the testDataset. Therefore he may as well just use that in both versions of the database, either by linking that table in the development database or importing it … whichever is most appropriate.

    Friday, March 27, 2020 9:07 AM
  • I think Sandro is on the right track.  If you think there are some rows in test but not in dev, or in dev but not in test you'll need to check for that first using left/right joins, e.g.

    select * from test 
    left join dev 
    on test.pk = dev.pk
    where dev.pk is null

    select * from test 
    right join dev 
    on test.pk = dev.pk
    where test.pk is null

    (where pk = primary key)

    The hard part is comparing values from each column where the rows match on primary key and substituting value from test if there is a difference. I think you could do this in SQL with an inner join as follows:

    select iif(nz(test.col1)<>nz(dev.col1),test.col1,Null) as col1,
    iif(nz(test.col2)<>nz(dev.col2),test.col2,Null) as col2,
    iif(nz(test.col3)<>nz(dev.col3),test.col3,Null) as col3,
    ...
    iif(nz(test.col20)<>nz(dev.col20),test.col20,Null) as col20
    from test
    inner join dev
    on test.pk = dev.pk

    (where col1..col20 are the columns in your source tables)

    BUT this would result in rows with Nulls all the way across where there are no differences between corresponding rows in the two tables.  Maybe this is ok. But you could also limit this by adding a where clause:

    where nz(test.col1)<>nz(dev.col1) or nz(test.col2)<>nz(dev.col2) or ...nz(test.col20)<>nz(dev.col20)

    I suspect this is going to be very slow but I believe it is theoretically doable. If this is something you need to do on a regular basis I would suggest using VBA to loop through your rows and processing that way.

    -Bruce

    Monday, April 6, 2020 8:34 PM
  • so I've done this a few times - - but 20 fields is potentially a challenge to do in one pass and you may need to split up....also this won't work with memo/long text field type...

    create a calculated field that is a string of the existing fields:

    CalcField: F1 & F2 & F3 &.... 

    you can't exceed 255 so you'll need to do this multiple times in parallel it all depends on the data - but a lot of date, number, text fields often only have 7 or 8 characters - - so it will depend on knowing your data

    then when you have these long string fields in both tables - do your testing....

    you can do testing via joins or you can try testing the string length counts ...the latter is flawed in that the data can differ but the string count can be the same - - so you have to know your data and do that for the memo/long text fields only...

    but some combination of this should be able to be a table:table checker for 20 fields....it's kind of a kludge but doesn't take long to set up ….and if in the end you run 2 to 4 queries to test the tables - it's repeatable easily enough...


    Wednesday, April 8, 2020 5:24 PM
  • In a similar case i have used a rather quick and dirty solution

    I have made a simple function that concatenated the field values on each row and compared the resulting field

    In order to make it faster and far more useful i created an extra field where i did the above but stored the hash of the result.

    So instead of comparing big strings i am comparing numbers that resulted in dramatic performance increase...

    Monday, April 27, 2020 8:30 AM