Reading content of secured PDF using Excel macro

  • Question

  • Hi,

    I have a secured PDF file that is protected against copying. I want to automate content validation of this file. Is it possible to read data from a secured PDF file using an Excel macro?

    Thanks,

    Sreenath.

    Monday, February 2, 2015 1:50 PM

Answers

  • Hi,


    To my knowledge, it is not possible to read the content of a secured PDF file because it is encrypted. If you are the owner/creator of the file, I suggest you create unsecured PDFs and run your validations against them. You can use the following function.

    Public Function ReadAcrobatDocument(strFileName As String) As String
        ' Note: a reference to the Adobe Acrobat type library must be set in Tools | References,
        ' and a full Acrobat installation is required (Adobe Reader does not provide this interface).
        Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
        Dim AcroPage As CAcroPDPage, AcroHiliteList As CAcroHiliteList
        Dim AcroTextSelect As CAcroPDTextSelect
        Dim Content As String, i As Long, j As Long

        Set AcroApp = CreateObject("AcroExch.App")
        Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
        ' Open takes the file path and an optional window title.
        If Not AcroAVDoc.Open(strFileName, "") Then
            AcroApp.Exit
            Exit Function
        End If
        Set AcroPDDoc = AcroAVDoc.GetPDDoc

        For i = 0 To AcroPDDoc.GetNumPages - 1
            Set AcroPage = AcroPDDoc.AcquirePage(i)
            ' Select a range starting at offset 0 with length 9000 (large enough for typical pages).
            Set AcroHiliteList = CreateObject("AcroExch.HiliteList")
            If AcroHiliteList.Add(0, 9000) Then
                ' Protected PDFs that can't be read may raise an error here; skip such pages.
                On Error Resume Next
                Set AcroTextSelect = Nothing
                Set AcroTextSelect = AcroPage.CreatePageHilite(AcroHiliteList)
                On Error GoTo 0
                If Not AcroTextSelect Is Nothing Then
                    For j = 0 To AcroTextSelect.GetNumText - 1
                        Content = Content & AcroTextSelect.GetText(j)
                    Next j
                    AcroTextSelect.Destroy
                    Set AcroTextSelect = Nothing
                End If
            End If
            Set AcroPage = Nothing
        Next i

        ReadAcrobatDocument = Content
        AcroAVDoc.Close True
        AcroApp.Exit
        Set AcroPDDoc = Nothing: Set AcroAVDoc = Nothing: Set AcroApp = Nothing
    End Function
    
    Sub Demo()
        Dim strPDF As String
        strPDF = ReadAcrobatDocument("PATH\FILENAME.pdf")
        ' The entire text content of the PDF file is now loaded into strPDF.
        ' You can do all sorts of string parsing/validation on it: RegExp, Split, InStr, etc.
    End Sub
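    As an illustration of the string validation mentioned in Demo (not part of the original answer), the sketch below checks the extracted text with plain VBA. MissingPhrases, FindAll and ValidateDemo are hypothetical helper names, and the sketch assumes the text came from ReadAcrobatDocument. FindAll uses late-bound VBScript.RegExp, so no extra reference needs to be set.

```vba
' Hypothetical helper: returns the expected phrases that do NOT occur in pdfText,
' one per line, or an empty string when every phrase was found.
Public Function MissingPhrases(ByVal pdfText As String, ByVal expected As Variant) As String
    Dim phrase As Variant, missing As String
    For Each phrase In expected
        ' vbTextCompare makes the search case-insensitive
        If InStr(1, pdfText, CStr(phrase), vbTextCompare) = 0 Then
            missing = missing & CStr(phrase) & vbNewLine
        End If
    Next phrase
    MissingPhrases = missing
End Function

' Hypothetical helper: collects every match of a regular expression (e.g. dates,
' amounts) found in pdfText, using late-bound VBScript.RegExp.
Public Function FindAll(ByVal pdfText As String, ByVal pattern As String) As Collection
    Dim re As Object, m As Object, results As New Collection
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    re.Global = True          ' return all matches, not just the first
    For Each m In re.Execute(pdfText)
        results.Add m.Value
    Next m
    Set FindAll = results
End Function

' Hypothetical usage: report which expected phrases are missing from the PDF.
Sub ValidateDemo()
    Dim txt As String, missing As String
    txt = ReadAcrobatDocument("PATH\FILENAME.pdf")
    missing = MissingPhrases(txt, Array("Invoice", "Total"))
    If missing = "" Then
        MsgBox "All expected phrases found."
    Else
        MsgBox "Missing:" & vbNewLine & missing
    End If
End Sub
```

    The phrases "Invoice" and "Total" are placeholders; replace them with whatever your validation needs to check.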

    Monday, February 2, 2015 2:49 PM