File comparison and sending report issue

  • Question

  • I have a requirement to compare PDF, CSV, and XML files and send a mismatch report by email.

    Can anyone recommend a plug-in tool that I can execute from my C# code? It should compare the files loaded in the PDF, CSV, and XML folders and generate a mismatch report that I can send by email through C# code or an API.

    I can request license approval if the tool serves this purpose.

    Please suggest.

    Thursday, June 11, 2020 8:18 PM

Answers

  • Here is one that is a paid library. I don't know of any free libraries other than ones that only handle text.

    https://products.groupdocs.com/comparison/family


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    • Marked as answer by Sudehely Saturday, June 13, 2020 9:47 PM
    Friday, June 12, 2020 12:24 AM

All replies

  • Hello,

    Something like this, where ChunkFolderName is the name of the folder to work against. I took it out of one of my projects, where ChunkFolderName is defined as a known folder in a static string property in another class. Consider it a starting point and not a total solution for comparing.

    // Requires: using System; using System.IO; using System.Linq;
    //           using System.Security.Cryptography;

    public static void CompareChunkFiles()
    {
    	// ChunkFolderName is the static string property defined in another class.
    	var directory = new DirectoryInfo(ChunkFolderName);
    	var files = directory.GetFiles("*.*", SearchOption.AllDirectories);
    
    	foreach (var fileInfoOuter in files)
    	{
    		// FullName keeps the path correct for files found in subfolders;
    		// combining ChunkFolderName with Name would drop the subfolder.
    		var currentFile = fileInfoOuter.FullName;
    		Console.WriteLine(currentFile);
    
    		foreach (var fileInfoInner in files)
    		{
    			// Skip comparing a file with itself.
    			if (currentFile != fileInfoInner.FullName)
    			{
    				Console.WriteLine($"    {fileInfoInner.FullName} : Are same {CompareFileHashes(currentFile, fileInfoInner.FullName)}");
    			}
    		}
    	}
    }
    
    private static bool CompareFileSizes(string fileName1, string fileName2)
    {
    	// Files of different sizes cannot be identical.
    	return new FileInfo(fileName1).Length == new FileInfo(fileName2).Length;
    }
    
    public static bool CompareFileHashes(string fileName1, string fileName2)
    {
    	// Compare file sizes before continuing; only hash when the sizes match.
    	if (!CompareFileSizes(fileName1, fileName2))
    	{
    		return false;
    	}
    
    	// SHA256.Create() is used instead of the parameterless HashAlgorithm.Create(),
    	// which is obsolete in current .NET versions.
    	using (var hash = SHA256.Create())
    	using (var fileStream1 = new FileStream(fileName1, FileMode.Open, FileAccess.Read))
    	using (var fileStream2 = new FileStream(fileName2, FileMode.Open, FileAccess.Read))
    	{
    		// Compute file hashes
    		byte[] fileHash1 = hash.ComputeHash(fileStream1);
    		byte[] fileHash2 = hash.ComputeHash(fileStream2);
    
    		// Compare the hash bytes directly rather than via string conversion.
    		return fileHash1.SequenceEqual(fileHash2);
    	}
    }


    Karen Payne

    Thursday, June 11, 2020 9:04 PM
  • This will tell me that the files are different, but not the actual differences in the form of a report, and I am not sure it will work for PDF.
    I am looking for a plug-in tool that I can execute from C# that compares the files and lists the differences in a report, or at least returns the differences so that I can prepare the report myself, especially for PDF and XML.
    Thursday, June 11, 2020 10:28 PM
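
    For the CSV part of the request, a report of actual differences can be approximated with a positional line-by-line comparison. The following is only a minimal sketch under stated assumptions: the class name CsvMismatchSketch, both method names, and the example.com addresses are hypothetical, rows are assumed to appear in the same order in both files, and it does not cover PDF at all.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Mail;

// Hypothetical helper, not part of any library mentioned in this thread.
public static class CsvMismatchSketch
{
    // Compares two text files line by line (by position) and returns one
    // report entry per mismatching line. Line numbers are 1-based.
    public static List<string> CompareLines(string expectedPath, string actualPath)
    {
        string[] expected = File.ReadAllLines(expectedPath);
        string[] actual = File.ReadAllLines(actualPath);
        var mismatches = new List<string>();
        int max = Math.Max(expected.Length, actual.Length);

        for (int i = 0; i < max; i++)
        {
            // A file that ends early is reported with a "<missing>" marker.
            string left = i < expected.Length ? expected[i] : "<missing>";
            string right = i < actual.Length ? actual[i] : "<missing>";

            if (!string.Equals(left, right, StringComparison.Ordinal))
            {
                mismatches.Add($"Line {i + 1}: expected '{left}' but found '{right}'");
            }
        }

        return mismatches;
    }

    // Wraps the mismatches in a MailMessage. The addresses are placeholders
    // (assumptions); the message is built here but not sent.
    public static MailMessage BuildReportMail(string fileName, List<string> mismatches)
    {
        var message = new MailMessage("reports@example.com", "team@example.com");
        message.Subject = $"Mismatch report for {fileName}: {mismatches.Count} difference(s)";
        message.Body = mismatches.Count == 0
            ? "No differences found."
            : string.Join(Environment.NewLine, mismatches);
        return message;
    }
}
```

    The resulting message could then be handed to System.Net.Mail.SmtpClient (or any other mail API) configured for your own SMTP server; the server settings are outside this sketch.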
  • Hi Sudehely,

    Thank you for posting here.

    I have a question. What rules do you want to use for the comparison?

    For the three folders PDF, CSV, XML, what kind of files do they contain, and what do you want the result to look like?

    Do you want to find files in the PDF folder that do not have the .pdf extension?

    Looking forward to your reply.

    Best Regards,

    Timon


    Friday, June 12, 2020 1:37 AM
  • The only way to know if 2 files are different is to do a binary comparison. But that can lead to false positives in the case of text files. In the case of binary files it might as well. Or at least differences that don't matter. For example if I build my project twice in a row it will produce the same output but different files. As far as binary comparison is concerned they are different. The fact that the code itself is identical is irrelevant.
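
    As an illustration of such a false positive, two XML files that differ only in indentation are different byte for byte, yet hold the same content. For XML specifically, one possible way around this is to parse both documents and compare the node trees with XNode.DeepEquals from System.Xml.Linq. This is a sketch only; the class name XmlCompareSketch and the method name AreXmlEquivalent are hypothetical, and it reports whether the documents match, not where they differ.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class XmlCompareSketch
{
    // Returns true when both XML strings parse to the same node tree.
    // XDocument.Parse drops insignificant whitespace by default, so
    // indentation-only differences do not count as mismatches.
    public static bool AreXmlEquivalent(string xml1, string xml2)
    {
        XDocument doc1 = XDocument.Parse(xml1);
        XDocument doc2 = XDocument.Parse(xml2);
        return XNode.DeepEquals(doc1, doc2);
    }
}
```

    For example, AreXmlEquivalent("<a><b>1</b></a>", "<a>\n  <b>1</b>\n</a>") treats the two strings as the same document even though a binary or checksum comparison would flag them as different.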

    We have no way of knowing what you are looking for as far as "differences" go but you won't be able to diff PDFs and put in a report what is different. There could be a slight header change and a differencing engine wouldn't be able to tell that. It would just see 2 binary files as different. For text files it might list the wrong differences. A classic example of this problem is differencing tools you might use in development like when you ask git for changes that are in a file or the differences between commits. It won't necessarily detect the actual differences because it is just doing textual comparison. So if you're expecting a report that says pdf B added a new line of text then that is simply not going to happen.

    Personally I would just do a binary difference and display that the 2 files are different, perhaps with the byte offset/line #, file lengths and date stamps. To speed this up I'd start with the file lengths. If the files are the same length then do a binary comparison. For the comparison I would probably just do a checksum. If the checksums are different then the files are different. This is pretty standard practice for detecting changed files. If you absolutely need the differences (even if they are in invisible file headers) then you'll have to stream the files across and compare each byte.
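
    The byte-by-byte streaming step could look roughly like the sketch below, which returns the offset of the first differing byte so it can be put in a report. The class name BinaryCompareSketch and the method name FindFirstDifference are hypothetical, and the convention of returning -1 for identical files is an assumption of this example.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration.
public static class BinaryCompareSketch
{
    // Returns -1 when the files are byte-for-byte identical, otherwise the
    // zero-based offset of the first differing byte. If one file is a
    // prefix of the other, the length of the shorter file is returned.
    public static long FindFirstDifference(string path1, string path2)
    {
        using (var stream1 = new FileStream(path1, FileMode.Open, FileAccess.Read))
        using (var stream2 = new FileStream(path2, FileMode.Open, FileAccess.Read))
        {
            long offset = 0;
            while (true)
            {
                // ReadByte returns -1 at end of stream.
                int byte1 = stream1.ReadByte();
                int byte2 = stream2.ReadByte();

                if (byte1 != byte2)
                {
                    return offset; // also covers one stream ending early
                }

                if (byte1 == -1)
                {
                    return -1; // both streams ended with no difference
                }

                offset++;
            }
        }
    }
}
```

    Checking the file lengths or a checksum first, as described above, avoids running this loop for files that are obviously different or obviously unchanged.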


    Michael Taylor http://www.michaeltaylorp3.net

    • Proposed as answer by Naomi N Friday, June 12, 2020 8:56 PM
    Friday, June 12, 2020 1:43 PM
  • Thanks for the link.
    Saturday, June 13, 2020 9:48 PM