How to read contents of PDF or convert PDF to Text file?

  • Question

  • I'm wondering if there is a way to read the contents of a PDF file into memory, search for a string, and write the results to a cell.  I do NOT have Adobe Acrobat installed; I only have Adobe Reader. 

    I can do this kind of task easily from a Word doc or a Text file.  I just can't figure out how to read the contents of a PDF (for free) or convert a PDF to a Text file.  BTW, I need to loop through lots of PDF files in a single folder.  If someone here even has a VBA script to convert a PDF to a Word doc, please share it.  I can run it, loop through the files in the folder, convert all to Word docs, and then read the contents of each into an array of Excel cells.  I just can't figure out how to work with these PDF files without having Adobe Acrobat installed.

    Thanks everyone.


    Knowledge is the only thing that I can give you, and still retain, and we are both better off for it.

    Friday, May 8, 2015 11:08 PM

Answers

  • Hi,

    This forum is for questions about Microsoft Excel development. I'm afraid there is no Office object or method that converts PDF files to text files. Since you are looking for a VBA solution, I'm moving this thread to the VBA forum, where you will find a more qualified pool of respondents.

    Thanks for your understanding

    Best Regards

    Lan



    • Marked as answer by ryguy72 Monday, May 11, 2015 9:08 PM
    Monday, May 11, 2015 8:08 AM
    Moderator
  • If you have Word 2013 and the PDFs were created from documents and not graphics of documents then you should be able to process the files directly in Word (as Word 2013 will open editable copies of PDF files). Otherwise you will need a programmable PDF editor such as Acrobat.

    Graham Mayor - Word MVP
    www.gmayor.com

    • Marked as answer by ryguy72 Monday, May 11, 2015 9:08 PM
    Monday, May 11, 2015 9:52 AM
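    The Word 2013 route described above can be sketched roughly as follows. This is only a sketch, written in Python rather than VBA, and it assumes Windows, Word 2013 or later, and the third-party pywin32 package. The helper names (list_pdfs, find_lines, pdf_text_via_word, search_folder) are made up for illustration. Word may still show its PDF conversion prompt the first time, which this sketch does not handle, and scanned (image-only) PDFs will yield little or no text.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files in `folder`, sorted by name (suffix match ignores case)."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")


def find_lines(text, needle):
    """Return the lines of `text` that contain `needle`, ignoring case."""
    needle = needle.lower()
    # splitlines() also splits on "\r", which Word uses as its paragraph mark.
    return [line.strip() for line in text.splitlines() if needle in line.lower()]


def pdf_text_via_word(word, pdf_path):
    """Open one PDF in a running Word instance and return its text."""
    doc = word.Documents.Open(str(Path(pdf_path).resolve()),
                              ConfirmConversions=False, ReadOnly=True)
    try:
        return doc.Content.Text
    finally:
        doc.Close(SaveChanges=False)  # never write anything back


def search_folder(folder, needle):
    """Map each PDF file name to the lines that contain `needle`."""
    import win32com.client  # assumption: pywin32 is installed (Windows only)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        return {pdf.name: find_lines(pdf_text_via_word(word, pdf), needle)
                for pdf in list_pdfs(folder)}
    finally:
        word.Quit()
```

    Calling search_folder(r"C:\path\to\pdfs", "invoice") (a hypothetical folder and search string) would return a dictionary of matches per file, which could then be written out to a worksheet or a CSV file.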
  • Getting Word 2013 is probably the easiest option.  I did this once many years ago, and I had to have Acrobat Professional installed ($$$).  There are free PDF parsers that you can call from VBA using Win32 commands, such as:

    https://code.google.com/p/peepdf/

    • Marked as answer by ryguy72 Monday, May 11, 2015 9:08 PM
    Monday, May 11, 2015 1:06 PM
  • The only free ones I know of are peepdf, which is Python-based:

    https://code.google.com/p/peepdf/

    and Toxy, which is written in C#:

    http://toxy.codeplex.com/

    • Marked as answer by ryguy72 Monday, May 11, 2015 9:08 PM
    Monday, May 11, 2015 6:46 PM

All replies

  • I happened to see a related article about converting PDF to Word; you can refer to this link:

    How to convert PDF to Doc in C#/VB.NET.

    It takes just one line of code:

    doc.SaveToFile("PDFtoDoc.doc", FileFormat.DOC);

    With it you can convert PDF to DOC files without having Adobe Acrobat installed. It may not support other Word formats such as DOCX, though.
    Monday, May 11, 2015 6:11 AM
  • @jujubee, I think you need to run that through .NET.  I don't think any of those required libraries are exposed to either Excel or VBA.  Also, it looks like you need to buy some expensive 3rd party software.

    @Graham & mogulman, I do have 2013 on my home PC but my work PC only has 2010.  This is all for a project at work.  I don't think I can copy all the files over to my home PC.

    Are there any other free options, or have we already exhausted all the possibilities?

    Thanks everyone.



    Monday, May 11, 2015 4:21 PM
  • Ok.  Thanks again.


    Monday, May 11, 2015 9:08 PM
  • Ryguy, you can check out this article (it uses iTextSharp):
    http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-NET

    Or you can check this article about reading PDF files and retrieving their text in C# (it uses GemBox.Document).

    I hope this helps.

    Wednesday, December 16, 2015 8:42 AM
  • There is one other technique that I thought about.  There is a book converter called Calibre (free).  It is mostly used to convert ebook formats (mobi, epub...) but it can convert PDF to several formats including .doc.  It doesn't do a great job with formatting but you get the text.  It has a command line capability.  I have used it to convert Word docs to epub so they are easier to read on my Nook.
    Wednesday, December 16, 2015 2:15 PM
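    For the Calibre route, a batch conversion could look something like the sketch below. It assumes Calibre is installed and that its command-line converter, ebook-convert, is on the PATH. It converts to plain text because only the text is needed for searching. The function names convert_command and convert_folder are hypothetical, chosen just for this example.

```python
import subprocess
from pathlib import Path


def convert_command(pdf_path, out_dir):
    """Build the ebook-convert call that turns one PDF into a .txt file in out_dir."""
    pdf_path = Path(pdf_path)
    txt_path = Path(out_dir) / (pdf_path.stem + ".txt")
    # ebook-convert picks the output format from the output file's extension.
    return ["ebook-convert", str(pdf_path), str(txt_path)]


def convert_folder(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir; return the text files that were produced."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = convert_command(pdf, out_dir)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            produced.append(Path(cmd[2]))
        else:
            # Report the failure and keep going with the remaining files.
            print(f"skipped {pdf.name}: ebook-convert exited with {result.returncode}")
    return produced
```

    The resulting .txt files can then be read and searched like any other text file. If ebook-convert is not on the PATH, subprocess.run raises FileNotFoundError, so in practice the full path to the executable may be needed.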