pdf to text

Question
-
User2045693258 posted
Can anyone provide a simple solution for reading a PDF in VB.NET?
I've tried iTextSharp, but it's way too complicated for me, and PDFBox is a bit much as well (I get a lot of Java initialization errors). All I need is to get the text out of a PDF, with no regard for formatting. Anything quick and dirty will do, as long as it can get text from PDFs that are stored online. Any help is greatly appreciated.
Friday, February 5, 2010 9:55 AM
Answers
-
User1364706731 posted
This approach requires the full version of Adobe Acrobat installed on your PC so that you can access the Adobe APIs, so it doesn't technically qualify as a free way to do it.
Here is the code I used to read the contents of a PDF. You will have to add a reference to the Adobe APIs in your project:
' doctext and doctype are assumed to be String variables declared elsewhere.
Dim objPDFDoc As New AcroPDDoc
Dim objPDFPage As AcroPDPage
Dim objPDFRectTemp As Object
Dim objPDFRect As New AcroRect
Dim objPDFTextSelection As AcroPDTextSelect
' Integer rather than Long: in VB.NET, Long is 64-bit, while the COM methods use 32-bit values.
Dim lngPageCount As Integer
Dim lngTextRangeCount As Integer
Dim Fora As Integer

objPDFDoc.Open(tbdocdisplaypath.Text)
lngPageCount = objPDFDoc.GetNumPages
For Fora = 0 To lngPageCount - 1
    ' Build a rectangle that covers the whole page
    objPDFPage = objPDFDoc.AcquirePage(Fora)
    objPDFRectTemp = objPDFPage.GetSize
    objPDFRect.Left = 0
    objPDFRect.right = objPDFRectTemp.x
    objPDFRect.Top = objPDFRectTemp.y
    objPDFRect.bottom = 0
    ' Select all the text inside that rectangle on the current page
    objPDFTextSelection = objPDFDoc.CreateTextSelect(Fora, objPDFRect)
    ' Guard against a missing selection (for example, a page with no text)
    If objPDFTextSelection IsNot Nothing Then
        ' Get the text of each range in the selection
        For lngTextRangeCount = 0 To objPDFTextSelection.GetNumText - 1
            doctext = doctext & objPDFTextSelection.GetText(lngTextRangeCount)
        Next
    End If
    doctext = doctext & vbCrLf
Next
doctype = "PDF"
objPDFDoc.Close()
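The question mentions PDFs that are stored online, while the code above opens a path taken from tbdocdisplaypath.Text. If Open needs a local file, one option is to download the PDF to a temporary file first. This is only a minimal sketch, assuming the .NET Framework's System.Net.WebClient is available; DownloadPdfToTempFile is a hypothetical helper name and is not part of the Adobe APIs:

```vbnet
Imports System.IO
Imports System.Net

Module PdfDownload
    ' Hypothetical helper (not part of the Adobe APIs): downloads the file at
    ' the given URL into the temp folder and returns the local path.
    Public Function DownloadPdfToTempFile(ByVal url As String) As String
        ' Random file name with a .pdf extension inside the user's temp folder
        Dim localPath As String = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() & ".pdf")
        Using client As New WebClient()
            client.DownloadFile(url, localPath)
        End Using
        Return localPath
    End Function
End Module
```

The returned path could then be passed to objPDFDoc.Open in place of tbdocdisplaypath.Text, and the temporary file removed with File.Delete once the document has been closed.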
- Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
Monday, February 8, 2010 5:01 AM
All replies
-
User1485622831 posted
I've been looking all over for this sort of code, but I can't find any documentation anywhere. Does anybody know where to get the Adobe documentation?
Thanks
Thursday, February 18, 2010 5:42 AM