Compare contents of files in two folders

    Question

  • Hi,

     I need to compare the contents of files in two directories. The files are auto-generated, so the same content would appear under two different file names. I saw tons of utilities on the web for interactive comparison, but both folders contain thousands of files, so it has to be done programmatically.

    If a file with no matching content is found, its name needs to be logged somehow (I'm flexible on the means).

    I can write a .NET program using the IO classes, but I hate to reinvent the wheel. Hopefully, someone knows a utility for my task.

     

    Thank you,

     Yakov

    Wednesday, August 29, 2012 4:38 PM

Answers

  • This doesn't solve my problem because the file names are different and, in the second case, I'd have to either compare file names/checksums in the two folders manually or write a program for it.

    But thank you for helping me, Dan! And, possibly, those would be helpful for someone with a similar problem.

    Since no new answers came in, I've resorted to writing my own routine. It requires VS and the paths are hardcoded, though. (Uncomment the commented lines to watch the scan progress.)

            using System;
            using System.IO;

            class Program
            {
                static void Main(string[] args)
                {
                    // Hardcoded paths; adjust before running.
                    string sSourceFolder = @"C:\Temp\Text1";
                    string sFolderToCheck = @"C:\Temp\Text2";
                    string sLogFile = @"C:\Temp\Text2\Log\Log.txt";
                    string[] files1 = Directory.GetFiles(sSourceFolder);
                    string[] files2 = Directory.GetFiles(sFolderToCheck);
                    // Make sure the log folder exists; "using" closes the log when done.
                    Directory.CreateDirectory(Path.GetDirectoryName(sLogFile));
                    using (StreamWriter LogFile = new StreamWriter(sLogFile))
                    {
                        //int i = 0;
                        foreach (string file in files1)
                        {
                            //i++;
                            //Console.WriteLine(i+": "+Path.GetFileName(file));
                            string sFiletext1 = File.ReadAllText(file);
                            bool bFound = false;
                            // Look for any file in the second folder with identical text.
                            foreach (string file2 in files2)
                            {
                                if (sFiletext1 == File.ReadAllText(file2))
                                { bFound = true; break; }
                            }
                            // Log source files that have no match.
                            if (!bFound)
                            {
                                LogFile.WriteLine(file);
                            }
                        }
                    }
                }
            }
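
    The routine above re-reads every file in the second folder for each file in the first. With thousands of files, one possible variant is to hash each file once and compare the hashes. This is only a sketch under assumptions: the names FolderContentComparer, FindUnmatched and HashFile are made up for illustration, SHA-256 is just one reasonable choice of hash, and the comparison is on raw bytes rather than decoded text, so files that differ only in encoding would count as different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the class and method names are illustrative only.
static class FolderContentComparer
{
    // Returns the SHA-256 digest of a file's bytes as a hex string.
    static string HashFile(SHA256 sha, string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Returns the files in sourceFolder whose content matches no file in folderToCheck.
    public static List<string> FindUnmatched(string sourceFolder, string folderToCheck)
    {
        var unmatched = new List<string>();
        using (SHA256 sha = SHA256.Create())
        {
            // Hash every file in the second folder exactly once.
            var knownHashes = new HashSet<string>();
            foreach (string file in Directory.GetFiles(folderToCheck))
            {
                knownHashes.Add(HashFile(sha, file));
            }

            // A source file is unmatched if its hash was never seen.
            foreach (string file in Directory.GetFiles(sourceFolder))
            {
                if (!knownHashes.Contains(HashFile(sha, file)))
                {
                    unmatched.Add(file);
                }
            }
        }
        return unmatched;
    }
}
```

    The result could then be logged with something like File.WriteAllLines(sLogFile, FolderContentComparer.FindUnmatched(sSourceFolder, sFolderToCheck)). Each file is read once, so the work grows with the total number of files rather than with the product of the two folder sizes.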

    • Marked as answer by Yakov72 Wednesday, September 05, 2012 10:35 PM
    Wednesday, September 05, 2012 10:34 PM

All replies

  • First of all, you have to run the comparison code on a new thread; otherwise (especially if you have thousands of files) your UI will freeze.

    Now, how do you want to compare? The first file from the first folder against all the files in the second folder, then the second file from the first folder against all the files in the second folder, and so on?

    Only this way will you check all the possibilities.


    Mitja

    Wednesday, August 29, 2012 4:49 PM
  • A good utility for this is rsync. Use the -n -c options to just list the files with different checksums.

    Here is more info on rsync. http://rsync.samba.org/documentation.html


    Dan Randolph - My Code Samples List

    Wednesday, August 29, 2012 6:28 PM
  • Checksums might work, but I looked it up and couldn't find how to use it. Can you give me a sample, or a link to help like you get for DOS commands?
    Wednesday, August 29, 2012 7:21 PM
  • First of all, you have to run the comparison code on a new thread; otherwise (especially if you have thousands of files) your UI will freeze.

    I actually don't want a UI.
    Wednesday, August 29, 2012 7:24 PM
  • I could not find a way to actually list the checksum values using rsync. But you can run this to see which files with the same name but different checksums would be copied:

    C:\Program Files\cwRsync\bin>rsync -ncv /cygdrive/c/test/rsync/* /cygdrive/c/test/rsync2

    Here is a link with more instructions:
    http://www.rsync.net/resources/howto/windows_rsync.html

    You can also use the command-line version of FastSum to generate, store, and list checksums. To just list them:

    C:\Program Files\FastSum>fsum \test\rsync\*

    This lists all files in the folder "\test\rsync" on the current drive. The MD5 checksum is the last column.
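
    For anyone who would rather produce the same kind of listing from .NET than install a tool, here is a minimal sketch. The names ChecksumLister and Md5ByFile are hypothetical, and the output is not meant to match fsum's format; it simply maps each file in a folder to the MD5 digest of its bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the class and method names are illustrative only.
static class ChecksumLister
{
    // Maps each file path in the folder to the lowercase hex MD5 digest of its bytes.
    public static Dictionary<string, string> Md5ByFile(string folder)
    {
        var result = new Dictionary<string, string>();
        using (MD5 md5 = MD5.Create())
        {
            foreach (string file in Directory.GetFiles(folder))
            {
                using (FileStream stream = File.OpenRead(file))
                {
                    byte[] digest = md5.ComputeHash(stream);
                    // BitConverter yields "AB-CD-..."; strip the dashes and lowercase it.
                    result[file] = BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
                }
            }
        }
        return result;
    }
}
```

    Looping over the returned dictionary and writing each key and value with Console.WriteLine gives a plain listing that can be redirected to a file.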

    Here is the download link for fsum: fastsum.com download

    There is also a way to do this using PowerShell, but it requires some coding:

    Comparing files via checksum with Powershell



    Dan Randolph - My Code Samples List



    Saturday, September 01, 2012 3:41 PM
  • There are probably better places to post this question than the "Windows Forms General" forum. Windows Forms is for Windows UI development.


    Dan Randolph - My Code Samples List

    Sunday, September 02, 2012 8:32 PM