Parsing PDF as text

    Question

  • I am trying to create a Windows Store app that reads data from PDF files. I am trying to understand the example found here
    http://code.msdn.microsoft.com/windowsapps/PDF-viewer-sample-85a4bb30

    from Microsoft, but it only renders the PDF as an image. How can I extract text from a PDF?

    Also, if possible (not my goal right now, just an issue I will run into later), how can I make sense of a clickable table of contents (or chapter divisions) programmatically?


    mpanania.com


    Monday, April 07, 2014 1:56 PM

Answers

  • There isn't a PDF interpreter in-box, just the viewer that you link to. If you need an interpreter, you will need to either parse the files yourself or find a third-party PDF control that will do so. I'm not familiar enough with the third-party controls to know whether any will do so or whether they are just viewers as well.

    --Rob

    Monday, April 07, 2014 3:08 PM
    Owner
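
To give a rough idea of what "parse the files yourself" in the answer above can involve, here is a minimal sketch in Python (not the language of a Windows Store app, and not part of the linked sample). It assumes the simplest possible case: content streams that are either uncompressed or Flate-compressed, with text shown through the `Tj` and `TJ` operators as literal strings in a single-byte encoding. The names `extract_text` and `sample`, and the hand-made PDF fragment, are hypothetical and exist only for this illustration.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string "(...)" or an array "[...]" followed by Tj or TJ.
SHOW_RE = re.compile(rb"(\((?:\\.|[^\\()])*\)|\[[^\]]*\])\s*(?:Tj|TJ)", re.DOTALL)
# One literal string; nested unescaped parentheses are NOT handled.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def extract_text(pdf_bytes):
    """Return the literal strings shown by Tj/TJ in every stream found."""
    chunks = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        raw = stream.group(1)
        try:
            # Assumption: a compressed stream uses FlateDecode (zlib).
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Otherwise treat the bytes as an uncompressed content stream.
            content = raw
        for show in SHOW_RE.finditer(content):
            for literal in LITERAL_RE.finditer(show.group(1)):
                # Undo the \( \) \\ escapes; other escapes are left as-is.
                text = re.sub(rb"\\([()\\])", rb"\1", literal.group(1))
                # Assumption: single-byte encoding, decoded as Latin-1.
                chunks.append(text.decode("latin-1"))
    return chunks


# Hand-made fragment for demonstration only; it is not a complete PDF file.
content = b"BT /F1 12 Tf (Hello \\(PDF\\)) Tj [(Wor) -20 (ld)] TJ ET"
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n"
)
print(extract_text(sample))
```

Real documents are much harder than this: fonts often use custom or multi-byte encodings, text can be split into fragments positioned individually, and streams may use other filters or be encrypted. That gap is why a third-party library is usually the more practical route.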

All replies

  • Although I haven't called these features out specifically, check out the inventory of PDF libraries on my blog: Http://aka.ms/pdfapi
    Monday, May 05, 2014 12:44 PM