VBA - Search Text in PDF inside a specific Area

  • Question

  • Hey Guys,

    I have a bunch of PDFs (~300-1000 pages each) and need to search them for specific chapters and return the page number of each chapter to Excel.

    I can open the PDF via VBA, but because the chapter is always shown in the upper-right corner, I want to search only within a given rectangle.

    Does anyone know how to do that?

    Tuesday, October 20, 2015 3:27 PM

All replies

  • I did this once many years ago.  I had to have Acrobat Professional installed ($$$).  There are also free PDF parsers that you can call from VBA using Win32 commands, such as:

    https://code.google.com/p/peepdf/

    https://pypi.python.org/pypi/pdfminer/

    Maybe something has changed since I did it.

    There is one other possibility.  There is a free tool called Calibre (an e-book management system).  You can use it to convert the PDF to .docx format and then use Word VBA to extract the info.  Calibre has both a GUI and a command-line interface, which you can call from VBA using Win32 commands.  The conversion to Word is not great in terms of formatting, but maybe it is good enough for your purposes.
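
    The command-line route could be driven from VBA roughly as follows. This is only a sketch: it assumes Calibre is installed and that its ebook-convert tool is on the PATH, and the function names BuildConvertCommand and ConvertPdfToDocx are made up for illustration.

    ```vba
    ' Builds the command line for Calibre's ebook-convert tool.
    ' Both paths are quoted so that folders containing spaces work.
    Function BuildConvertCommand(ByVal pdfPath As String, _
                                 ByVal docxPath As String) As String
        BuildConvertCommand = "ebook-convert """ & pdfPath & """ """ & docxPath & """"
    End Function

    ' Runs the conversion and waits for it to finish.
    ' Returns True when the tool reports exit code 0.
    ' Assumption: ebook-convert is on the PATH; otherwise Run raises an error.
    Function ConvertPdfToDocx(ByVal pdfPath As String, _
                              ByVal docxPath As String) As Boolean
        Dim sh As Object
        Dim exitCode As Long

        Set sh = CreateObject("WScript.Shell")
        ' 0 = hidden window, True = wait until the process has ended
        exitCode = sh.Run(BuildConvertCommand(pdfPath, docxPath), 0, True)
        ConvertPdfToDocx = (exitCode = 0)
    End Function
    ```

    Once ConvertPdfToDocx("d:\b.pdf", "d:\b.docx") returns True, the .docx file can be opened with Word VBA as described above.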

    • Edited by mogulman52 Tuesday, October 20, 2015 5:54 PM
    Tuesday, October 20, 2015 3:59 PM
  • You can do it directly using VBA, only if you have full Acrobat (not Reader):

    Sub test()
    Dim objApp As Object
    Dim objPDDoc As Object
    Dim objjso As Object
    Dim wordsCount As Long
    Dim page As Long
    Dim i As Long
    Dim strData As String
    Dim strFileName As String
    
    strFileName = "d:\b.pdf"
    
    Set objApp = CreateObject("AcroExch.App")
    Set objPDDoc = CreateObject("AcroExch.PDDoc")
    If objPDDoc.Open(strFileName) Then
        'the JavaScript object gives access to the word-level functions
        Set objjso = objPDDoc.GetJSObject
        'pages and words are both numbered from 0
        For page = 0 To objPDDoc.GetNumPages - 1
            wordsCount = objjso.getPageNumWords(page)
            For i = 0 To wordsCount - 1
                strData = strData & " " & objjso.getPageNthWord(page, i)
            Next i
        Next page
        objPDDoc.Close
        MsgBox strData
    Else
        'problem with the path, or the file is damaged...
        MsgBox "Problem opening the file!",,"VBATools.pl"
    End If
    End Sub
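
    The loop above reads every word on every page. To restrict the search to a rectangle, as asked in the question, one option is the Acrobat JavaScript method getPageNthWordQuads, which returns the position of a word as quads (x1,y1 ... x4,y4 in PDF points, origin at the bottom-left of the page). The following is an untested sketch, not a definitive solution: it assumes the JSObject bridge hands the quads back to VBA as a Variant array of arrays, and the names QuadCentreInRect and FindPageInArea as well as the rectangle values are hypothetical.

    ```vba
    ' True when the centre of one word quad (8 numbers: x1,y1 ... x4,y4)
    ' lies inside the rectangle. All values are in PDF points.
    Function QuadCentreInRect(quad As Variant, _
            ByVal rLeft As Double, ByVal rBottom As Double, _
            ByVal rRight As Double, ByVal rTop As Double) As Boolean
        Dim cx As Double, cy As Double
        Dim k As Long, lb As Long

        lb = LBound(quad)
        For k = 0 To 3                      ' average the four corner points
            cx = cx + quad(lb + 2 * k) / 4
            cy = cy + quad(lb + 2 * k + 1) / 4
        Next k
        QuadCentreInRect = (cx >= rLeft And cx <= rRight And _
                            cy >= rBottom And cy <= rTop)
    End Function

    ' Returns the 1-based number of the first page on which searchWord
    ' appears inside the rectangle, or 0 when it is not found.
    Function FindPageInArea(ByVal pdfPath As String, ByVal searchWord As String, _
            ByVal rLeft As Double, ByVal rBottom As Double, _
            ByVal rRight As Double, ByVal rTop As Double) As Long
        Dim pdDoc As Object, jso As Object
        Dim page As Long, i As Long
        Dim quads As Variant

        Set pdDoc = CreateObject("AcroExch.PDDoc")
        If pdDoc.Open(pdfPath) Then
            Set jso = pdDoc.GetJSObject
            For page = 0 To pdDoc.GetNumPages - 1
                For i = 0 To jso.getPageNumWords(page) - 1
                    If StrComp(jso.getPageNthWord(page, i), searchWord, vbTextCompare) = 0 Then
                        ' Assumption: quads arrives as an array of 8-number arrays
                        quads = jso.getPageNthWordQuads(page, i)
                        If QuadCentreInRect(quads(LBound(quads)), _
                                rLeft, rBottom, rRight, rTop) Then
                            FindPageInArea = page + 1
                            pdDoc.Close
                            Exit Function
                        End If
                    End If
                Next i
            Next page
            pdDoc.Close
        End If
    End Function
    ```

    For example, on a US Letter page (612 x 792 points) the upper-right corner could be searched with Range("B2").Value = FindPageInArea("d:\b.pdf", "Chapter", 400, 700, 612, 792). The rectangle has to be adjusted to the actual page size, and like the code above this needs full Acrobat, not Reader.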


    Oskar Shon, Office System MVP - www.VBATools.pl
    if Helpful; Answer when a problem solved

    Saturday, October 24, 2015 1:32 PM