How to convert a PDF document into Excel?

    Question

  • Hi guys,

    I'm facing a big issue: I want to read the tables in a PDF document and store their values in a database.

    From what I found on the internet, the possible solutions are:

    1. Read the text from the PDF using PDFBox, which is freely available. But that is not the right solution because I need to read the tables, not just the text.

    2. Convert the PDF document into Excel/Word, so that the tables come through in the target document as they are. Word conversion is possible with EasyPDF Converter, a third-party tool that is much cheaper than the other tools that convert PDF into Excel.

    But I am looking for any other solution or API classes that can convert PDF into Excel.

    If you have any solution, please help me. I need to implement this ASAP.

    Most importantly, this is our customer's requirement (storing PDF data in a database).


    Janardan Baghla
    Friday, December 04, 2009 11:32 AM

All replies

  • Hi,

    There is a library called iText (http://en.wikipedia.org/wiki/Itext) that might be able to convert the PDF to another format that you can then convert to Excel. There is a .NET version, although I only know of it and have never used it.

    Maybe that's a potential way forward.

    Friday, December 04, 2009 1:25 PM
  • Hi Smyth,

    Thanks for the info. From what I found, iTextSharp cannot convert PDF into Excel.
    It converts documents into PDF, HTML into PDF, and so on, so it doesn't suit my needs.

    Anyway, thanks for the help.

    Can anyone help me solve this giant problem? :(
    Janardan Baghla
    Monday, December 07, 2009 5:14 AM
  • Hi,

    As far as I know, there is no tool of this kind.

    I'd suggest you first extract the information you are interested in, then insert it into the database.

    Could you please be more specific about reading the tables? Can't PDFBox do it?
    Have you tried using another PDF reader?

    I found some information about the PDF format:
    The PDF format is just a canvas where text and graphics are placed without any structure information.
    This is from the thread:
    http://social.msdn.microsoft.com/forums/en-US/Vsexpressvcs/thread/84cdf13a-35eb-4a99-8f4c-422b153783ff

    So I think you just need to retrieve the data you want using one of these PDF readers.
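
    Because the page is only a canvas, table structure has to be rebuilt from text positions. Below is a minimal sketch of that idea in Python, assuming your PDF reader can report each text fragment together with its page coordinates (an assumption; check what your library actually exposes). The `fragments` list, the `rebuild_table` helper, and the `row_tolerance` parameter are all hypothetical and made up for illustration.

```python
# Hypothetical input: (x, y, text) fragments, as a PDF text extractor that
# reports positions might return them. The values are invented for this sketch.
fragments = [
    (50, 700, "Item"), (200, 700, "Qty"), (300, 700, "Price"),
    (50, 680, "Widget"), (200, 680, "4"), (300, 680, "9.50"),
    (300, 660, "2.25"), (50, 660, "Gadget"), (200, 660, "10"),
]


def rebuild_table(fragments, row_tolerance=3):
    """Group fragments into rows by y position, then order each row by x."""
    rows = []  # each entry is (y of the row's first fragment, [(x, text), ...])
    # PDF y coordinates grow upward, so sort descending to read top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= row_tolerance:
            rows[-1][1].append((x, text))  # same visual line as the previous one
        else:
            rows.append((y, [(x, text)]))  # start a new row
    # Within a row, left-to-right order gives the column order.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


table = rebuild_table(fragments)
for row in table:
    print(row)
```

    Note that this simple grouping assumes every row has a value in every column; an empty cell would shift the remaining values left, so a real table would also need column boundaries derived from the x positions.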

    Harry
    Thursday, December 10, 2009 3:34 AM
  • Hi Harry,

    PDFBox can read the text from a PDF, but it writes the data out in linear form. It doesn't detect the tables in the PDF document.
    I want to read the tables because I need to read some column values and insert them into the DB, so PDFBox is not useful in this case.
    My thinking was that if there is an SDK/API that can convert the PDF document into Excel with the tables intact, then I can read the Excel sheet and put the data into the DB. The EasyPDF tool/SDK can convert PDF into Word with the tables intact, but it is a licensed third-party tool. I want to know whether there is any freeware SDK available for this.
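
    The second half of that plan (reading the converted sheet and putting column values into the DB) can be sketched as follows. This is only an illustration in Python, assuming the converter can export the table as CSV; the sample data, the `line_items` table, and its column names are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of one PDF table. In practice this would be the file
# written by whatever converter is used; the contents here are invented.
exported = io.StringIO("Item,Qty,Price\nWidget,4,9.50\nGadget,10,2.25\n")

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
conn.execute("CREATE TABLE line_items (item TEXT, qty INTEGER, price REAL)")

# Pick out the needed columns by header name and convert them to proper types.
reader = csv.DictReader(exported)
rows = [(r["Item"], int(r["Qty"]), float(r["Price"])) for r in reader]

# Parameterized inserts avoid building SQL strings from document content.
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
total = conn.execute("SELECT SUM(qty * price) FROM line_items").fetchone()[0]
print(count, total)
```

    Selecting columns by header name rather than by position keeps the load step working if the converter adds or reorders columns.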

    Thanks.
    Janardan Baghla
    Thursday, December 10, 2009 7:12 AM
  • Hi,

    Sorry about that; I thought the PdfReader class might have helped.
    Thursday, December 10, 2009 11:32 AM
  • I think quality and price both need to be considered. I had never found a good PDF to Excel converter on the internet; it was a tough time.

    That was until I came across this PDF to Excel Converter, and I think it will not let you down either. It preserves the spreadsheet layout and data well.

    Saturday, January 22, 2011 9:30 AM
  • Janardan,

    Be aware that PDF is a licensed product from Adobe.

    You know the battles between Microsoft, Adobe, Apple, Samsung, Google and more of those big players around those licenses.

    I assume you are a player like me, not rich enough to participate in those battles.


    Success
    Cor

    Tuesday, February 28, 2012 8:18 AM
  • I know this is a pretty old post, but I'll throw this out there anyway:

    http://www.investintech.com/able2extract.html

    I've used it many times before; it works great!!


    Ryan Shuell


    • Edited by ryguy72 Tuesday, June 05, 2012 5:16 PM
    Tuesday, June 05, 2012 5:16 PM