none
Comparing excel worksheets for unique entries RRS feed

  • Question

  • I have a task to compare a number of worksheets in a 'source' directory (identical column headers) with number of worksheets in a 'compare' directory (identical column headers to the source data). Any unique entries in the 'source' directory need to be collated into a new worksheet (same format as the others). 

    eg.  the source files V1.0 have had rows stripped out and a second file produced ie V2.0.  I need to extract the stripped out rows and put them into a new worksheet.  This needs to be done 62 times and then the 62 resulting new worksheets merged into one (the merging bit I can already do with vba existing script).  Can anyone help, i guess vba script is the best way forward here?

    Wednesday, September 27, 2017 3:28 PM

All replies

  • Hi skiguy777,

    I think we have to iterate through rows to compare if it has been changed.

    However, I'm wondering which row should be compared. Should we compare Row1 in source sheet with Row1 in target sheet,Row2 in source sheet with Row2 in target sheet..?

    If so, which row in target row should we compare with if previous row in target sheet has been deleted?

    For example.

    Source Sheet:

    Name   Num

    N1          1

    N2          2

    N3          3

    Targe Sheet:

    Name   Num

    N2          2

    N3          3

    As I think, the only change in source sheet should be Row2(N1 1) which has been deleted. However, if we compare row to row, we could know Row2 to Row4 are all changed.

    So could you tell us how would compare the rows?

    Best Regards,

    Terry


    MSDN Community Support Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, September 29, 2017 10:02 AM
  • Hi Terry,  thanks for your reply.  comparing row to row would, as you stated highlight Row2 to Row4 also which is no good to me.  I need to compare row1 with each row (ie row1 to rowx) and then if no match collate into a new spreadsheet.  Same with row2....row3....row4.....rowx

    the new spreadsheet would then show any rows that have been deleted from version 1.0 to version 2.0.  I hope this makes sense?

    Wednesday, October 4, 2017 10:56 AM
  • Hi skiguy777

    Comparing row by row (even if you were to check every row) it will be very resource consuming and slow.

    What I mean is take the first row value and compare to all other rows in multiple worksheets and if a match is found then you know a this is a duplicate record... so don't copy to the new worksheet.  if no match found, then copy to the new worksheet.  However, this is the right way to do as it will exponentially increase the time to compare if the number of records increase.  This is what you described to Terry in the post below... but this will be not efficient.

    Better way is to consolidate the multiple source worksheets into a single worksheet (even with duplicates) first.  Then get the unique records into a ADO record set using SELECT DISTINCT sql.  This will only put the unique records into the record set.  Then copy the record set to a new workbook.   For this you need to know how to consolidate first (you seem to have the script for this). Then you need to know how to get the excel rows into a ADO record set.    

    You can use the below function to get Excel data into a record set.  I did not write this.. just got this from google search.  You have to change few things in this code.

    1) Change the ThisWorkbook.FullName to your source workbook's full path.

    2) Change the 'SELECT * FROM' to  'SELECT DISTINCT filedname1, filedname2 etc. based on your source heading.

    Public Function RecordSetFromSheet(sheetName As String)
    
    Dim rst As New ADODB.Recordset
    Dim cnx As New ADODB.Connection
    Dim cmd As New ADODB.Command
    
        'setup the connection
        '[HDR=Yes] means the Field names are in the first row
        With cnx
            .Provider = "Microsoft.Jet.OLEDB.4.0"
            .ConnectionString = "Data Source='" & ThisWorkbook.FullName & "'; " & "Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'"
            .Open
        End With
    
        'setup the command
        Set cmd.ActiveConnection = cnx
        cmd.CommandType = adCmdText
        cmd.CommandText = "SELECT * FROM [" & sheetName & "$]"
        rst.CursorLocation = adUseClient
        rst.CursorType = adOpenDynamic
        rst.LockType = adLockOptimistic
    
        'open the connection
        rst.Open cmd
    
        'disconnect the recordset
        Set rst.ActiveConnection = Nothing
    
        'cleanup
        If CBool(cmd.State And adStateOpen) = True Then
            Set cmd = Nothing
        End If
    
        If CBool(cnx.State And adStateOpen) = True Then cnx.Close
        Set cnx = Nothing
    
        '"return" the recordset object
        Set RecordSetFromSheet = rst
    
    End Function
    
    

    Then use this sub to copy the data to a new worksheet 

    Public Sub Test()
    
    Dim rstData As ADODB.Recordset
    Set rstData = RecordSetFromSheet("Sheet1")
    
    Sheets("Sheet2").Range("A1").CopyFromRecordset rstData
    
    End Sub

    the above code will copy to 'Sheet2'.  you will have to change this according to your target workbook/worksheet.


    • Edited by Mohan_M Wednesday, October 4, 2017 12:38 PM
    Wednesday, October 4, 2017 12:26 PM
  • skiguy,
    re:  comparing worksheets

    Things seem to be stalled?  So I am proposing you try my commercial Excel addin "XL Professional".
    (3 week free trial and no registration)
    It works on xl97 thru xl2010, but not the later versions (blame MS).

    It will compare any two worksheets from any open workbooks.
    It loads the rows from each worksheet into separate Dictionary objects.
    Each Dictionary is compared to the other and so delivers a comparison of each row to all other rows.

    The product is a separate sheet with a listing of rows for each worksheet that are not on the other sheet.
    The execution is very fast as long as you stay below 10,000 rows or so. 

    A simple example is shown below...

    If this is of interest and you have access to an xl2010 version or earlier, I will load it at Dropbox for download.
    The offerings there change frequently.

    '---
    Jim Cone
    Portland, Oregon USA
    https://goo.gl/IUQUN2 (Dropbox)
    (free & commercial excel add-ins & workbooks)


    Tuesday, October 10, 2017 2:26 AM