Compare two PDF files in C# Windows application

  • Question

  • Hi,

    I am looking for a component that can compare two PDF files (containing text and images) and show the differences highlighted in the PDF. Please let me know if there are any third-party tools that can be integrated with my .NET code.

    I have looked at a few links, but each one is a standalone application that cannot be integrated with my application.

    Thanks

    Dev

    Tuesday, December 27, 2016 6:58 AM

Answers

  • For PDF, most people tend to use iTextSharp. But comparing two PDFs for equality is a non-trivial process regardless of the library used. Just because two PDFs look the same doesn't mean they are the same. I can create two PDFs that are visually identical, but under the hood one is pure text while the other is text with form fields that have been merged together. Things get more complicated once images are involved. You will probably end up having to take the PDFs apart and compare them textually (comparing images separately), so this is going to require a reasonable amount of code on your part. I'm not aware of any standardized solution for this, but Google has plenty of hits from people who have tried to do this in the past.
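
    As a rough illustration of the "compare them textually" step, here is a minimal sketch of a line-by-line diff built on a longest-common-subsequence table. It assumes the text of each page has already been extracted into a string (for example with iTextSharp's PdfTextExtractor, which is not shown here). The class name TextDiff and the "- " / "+ " markers are hypothetical choices for this example, not part of any PDF library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of iTextSharp or any PDF library.
public static class TextDiff
{
    // Tags each line as unchanged ("  "), only in left ("- ") or only in right ("+ ").
    public static List<string> Compare(string left, string right)
    {
        string[] a = left.Split('\n');
        string[] b = right.Split('\n');

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        int[,] lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
        {
            for (int j = b.Length - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table from the top-left corner and emit the tagged lines.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < a.Length) { result.Add("- " + a[x]); x++; }
        while (y < b.Length) { result.Add("+ " + b[y]); y++; }
        return result;
    }
}
```

    Calling TextDiff.Compare once per page pair gives a list of removed and added lines. Highlighting those differences inside an output PDF is a separate step that depends on the library you choose.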

    For support with iTextSharp or other third-party products, please post in their forums.

    Michael Taylor
    http://www.michaeltaylorp3.net

    Tuesday, December 27, 2016 3:28 PM

All replies

  • As suggested before, you can use iTextSharp to parse the contents of the PDF files. Another option is to use a professional SDK like LEADTOOLS (disclaimer: I work for the vendor of this product). The PDF library provided by LEADTOOLS can parse PDF files using the ParsePages() method:
    https://www.leadtools.com/help/leadtools/v19/dh/pdf/leadtools.pdf~leadtools.pdf.pdfdocument~parsepages.html

    Based on the options passed to the method, you can get any object from the PDF file (text, images, fonts, annotations, etc.) and then compare the objects to find the similarities and differences between the files. As mentioned in the earlier reply, the comparison is non-trivial, and the process may become complicated depending on the comparison criteria you set.
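
    For the image part of such a comparison, one simple starting point is sketched below. It assumes the images from each file have already been extracted as raw byte arrays (the extraction call is not shown and depends on the library), and it only detects byte-identical images by comparing SHA-256 hashes. The class ImageSetDiff and its method are hypothetical names for this example and do not use the LEADTOOLS API. Two images that look the same but are encoded differently will count as different here.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper for illustration; does not call any PDF SDK.
public static class ImageSetDiff
{
    // Hashes the raw bytes of one extracted image.
    private static string Hash(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(data));
        }
    }

    // Counts the images in 'left' that have no byte-identical partner in 'right'.
    // Each image in 'right' can be matched at most once.
    public static int CountUnmatched(IEnumerable<byte[]> left, IEnumerable<byte[]> right)
    {
        var available = new Dictionary<string, int>();
        foreach (byte[] image in right)
        {
            string key = Hash(image);
            int count;
            available.TryGetValue(key, out count);
            available[key] = count + 1;
        }

        int unmatched = 0;
        foreach (byte[] image in left)
        {
            string key = Hash(image);
            int count;
            if (available.TryGetValue(key, out count) && count > 0)
            {
                available[key] = count - 1;
            }
            else
            {
                unmatched++;
            }
        }
        return unmatched;
    }
}
```

    A result of 0 in both directions (left against right, then right against left) means the two files contain the same set of image byte streams. Anything looser, such as tolerating re-compression, needs a pixel-level comparison instead.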
    Thursday, December 29, 2016 5:35 PM
  • You can try GroupDocs.Comparison for .NET. It is a back-end API that can be integrated into any .NET application (existing or new). The API compares two files and shows the changes (inserted or deleted content, and style changes) in the resulting file. You can also generate a comparison summary using this API: https://products.groupdocs.com/comparison/net

    // Source and target files to be compared
    string source = @"source.pdf";
    string target = @"target.pdf";
    Comparer comparer = new Comparer();
    // Compare the two documents using default settings
    ICompareResult result = comparer.Compare(source, target, new ComparisonSettings());
    // Save the resulting document; replace the placeholders with a real path and file name
    result.SaveDocument("output path", "output name");


    The developer guide is available at https://docs.groupdocs.com/display/comparisonnet/Developer+Guide


    • Edited by atirtahir Thursday, May 9, 2019 11:35 AM
    Thursday, May 9, 2019 11:19 AM