How to read a PDF file through C#?

Question
-
Hi,
I have a PDF file and I need to read its text into a variable.
How do I do that in C# (WinForms)?
Thanks in advance.
Monday, May 31, 2010 7:27 AM
Answers
-
-
Thanks for the help,
but with those samples I can only create a new PDF file.
I only need to read the text from a PDF file into a variable in my C# program.
Hi there.
Well, I don't agree with you. They have classes for reading the contents of PDF documents. Please at least download the samples. And here is a sample to do that.
Regards,
Magnus
My blog: InsomniacGeek.com
- Marked as answer by Liliane Teng Friday, June 4, 2010 9:15 AM
Monday, May 31, 2010 10:41 AM -
Hello E_gold,
Thanks for your post.
The following websites could give you an idea of how to achieve this.
http://jadn.co.uk/w/ReadPdfUsingCsharp.htm
(How to read pdf files using C# .NET)
http://social.msdn.microsoft.com/forums/en-US/xmlandnetfx/thread/4a9fb479-b48e-4366-ad39-02b2dac674f5/
(read pdf content into text file using c#.net)
If you have any problems, please feel free to follow up.
Best regards,
Liliane
Please mark the replies as answers if they help and unmark them if they provide no help. Thanks.
- Marked as answer by Liliane Teng Friday, June 4, 2010 9:15 AM
Wednesday, June 2, 2010 7:53 AM
All replies
-
Hi there.
Take a look at these two projects on codeplex.com:
Hope this helps.
Regards,
Magnus
My blog: InsomniacGeek.com
Monday, May 31, 2010 7:39 AM -
Thanks for the help,
but with those samples I can only create a new PDF file.
I only need to read the text from a PDF file into a variable in my C# program.
Monday, May 31, 2010 8:15 AM -
-
This question has been asked many times by users. I suggest you google your question first and then post it here.
Moderators: Make this question available as a FAQ.
Monday, May 31, 2010 8:47 AM -
Hello Agalo,
Yes, this question is very common. Your suggestion is a good one and we will consider it. Thanks.
If you have any problems or suggestions, please feel free to contact me.
Best regards,
Liliane
Please mark the replies as answers if they help and unmark them if they provide no help. Thanks.
Wednesday, June 2, 2010 8:03 AM -
For those with Acrobat installed, there are a couple of C# members that can get all the words of a PDF document,
as posted by gg1.
Of course, this option may not be viable if you plan to distribute your application for profit or if you need formatting.
In summary, the link has the following sample code and some Adobe website references:
// The following extracts the words of a PDF document given its file spec.
// Assumes a project reference to the Acrobat COM type library (the Acrobat namespace)
// and, at the top of the file: using System; using System.Collections.Generic; using Acrobat;
// Opening the PDF document is rather crude and needs to be more robust.
public static string getTextFromPDF(string filespec)
{
    Acrobat.AcroAppClass gAppClass = new Acrobat.AcroAppClass();
    // AcroAVDoc is a visible PDF document with a UI window.
    Acrobat.AcroAVDoc avDoc = (Acrobat.AcroAVDoc)gAppClass.GetInterface("Acrobat.AcroAVDoc");
    avDoc.Open(System.IO.Path.GetFullPath(filespec), System.IO.Path.GetFileName(filespec));
    AcroPDDoc doc = (AcroPDDoc)avDoc.GetPDDoc();
    string txt = PdDocGetText(doc);
    doc.Close();
    avDoc.Close(1); // a positive value closes the document without saving
    gAppClass.Exit();
    return txt;
}

// Slightly modified version of a post in the Adobe forum, originally by Eldrarak82.
private static string PdDocGetText(AcroPDDoc pdDoc)
{
    AcroPDPage page;
    int pages = pdDoc.GetNumPages();
    string pageText = "";
    for (int i = 0; i < pages; i++)
    {
        page = (AcroPDPage)pdDoc.AcquirePage(i);
        object jso, jsNumWords, jsWord;
        List<string> words = new List<string>();
        try
        {
            jso = pdDoc.GetJSObject();
            if (jso != null)
            {
                object[] args = new object[] { i };
                jsNumWords = jso.GetType().InvokeMember("getPageNumWords", System.Reflection.BindingFlags.InvokeMethod, null, jso, args, null);
                int numWords = Int32.Parse(jsNumWords.ToString());
                // Word indices are zero-based, so stop before numWords.
                for (int j = 0; j < numWords; j++)
                {
                    // The third argument (false) keeps punctuation and whitespace on each word.
                    object[] argsj = new object[] { i, j, false };
                    jsWord = jso.GetType().InvokeMember("getPageNthWord", System.Reflection.BindingFlags.InvokeMethod, null, jso, argsj, null);
                    words.Add((string)jsWord);
                }
            }
            foreach (string word in words)
            {
                pageText += word;
            }
        }
        catch
        {
            // Any failure on a page is silently ignored and that page's text is skipped.
        }
    }
    return pageText;
}
The above code sample has yet to be fully tested and may need improvement. Nonetheless, it is a good starting point.
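The comment in the sample says that opening the document is crude and needs to be more robust. One small step in that direction, sketched here under assumptions, is to check the file before handing it to Acrobat. The class name PdfPrecheck and the method LooksLikePdf are hypothetical and not part of the original post or of any Adobe API. The check only confirms that the file exists and starts with the "%PDF-" signature that conforming PDF files begin with; it does not validate the rest of the file, and some viewers also accept files with a few junk bytes before the signature, which this sketch would reject.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original sample or of any Adobe API.
public static class PdfPrecheck
{
    // Returns true when the file exists and its first five bytes
    // are the ASCII signature "%PDF-".
    public static bool LooksLikePdf(string filespec)
    {
        if (string.IsNullOrEmpty(filespec) || !File.Exists(filespec))
            return false;

        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        byte[] header = new byte[signature.Length];

        using (FileStream fs = File.OpenRead(filespec))
        {
            int total = 0;
            while (total < header.Length)
            {
                int read = fs.Read(header, total, header.Length - total);
                if (read == 0)
                    return false; // file is shorter than the signature
                total += read;
            }
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i])
                return false;
        }
        return true;
    }
}
```

A caller could test PdfPrecheck.LooksLikePdf(filespec) first and only call getTextFromPDF when it returns true, so that a missing or non-PDF file is reported before Acrobat is started.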
For those interested in tables, rows and columns, look up the documents by Adobe like
around pages 130 to 136.
The link
may also be helpful for a lot of other tasks.
- Edited by fs - ab Friday, June 1, 2012 4:44 PM
Thursday, May 31, 2012 12:32 AM