Read PDF with iTextSharp

  • Question

  • User-1231836060 posted

    Is it possible to read the content of a PDF with iTextSharp?


    Wednesday, September 29, 2010 7:48 AM

All replies

  • User1520641890 posted

    Yes, it's possible. For example, if you need to retrieve the text:

    string text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);

    If you have more specific questions, I suggest the mailing list. Depending on what you need, some of the content parser classes are still undergoing serious development.
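    The one-liner above assumes you already have an open PdfReader. A minimal sketch of wrapping it in a reusable method that reads every page and closes the reader afterwards might look like this. It assumes iTextSharp 5.x, where PdfTextExtractor.GetTextFromPage(reader, pageNumber) is available; the names PdfTextDump and ExtractAllText are hypothetical.

    ```csharp
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Hypothetical helper class, not part of iTextSharp itself.
    public static class PdfTextDump
    {
        // Returns the text of every page, with a newline after each page.
        public static string ExtractAllText(string path)
        {
            PdfReader reader = new PdfReader(path);
            try
            {
                StringBuilder text = new StringBuilder();
                // iTextSharp page numbers are 1-based, so loop 1..NumberOfPages.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
                }
                return text.ToString();
            }
            finally
            {
                // Release the file handle even if extraction throws.
                reader.Close();
            }
        }
    }
    ```

    From a console app or a code-behind file you would then call something like string text = PdfTextDump.ExtractAllText(@"C:\path\to\file.pdf"); where the path is only a placeholder.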

    Wednesday, September 29, 2010 8:40 PM
  • User-1231836060 posted

    Hi kuujinbo,

    Can you give me the full source code for reading a PDF file?

    Because PdfTextExtractor is not working.

    Wednesday, September 29, 2010 9:55 PM
  • User1520641890 posted

    ...

    Because PdfTextExtractor is not working.

    Pass a PdfReader and a page number to the method:


    <%@ WebHandler Language="C#" Class="iTextTextExtract" %>
    using System;
    using System.Web;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;
    
    
    public class iTextTextExtract : IHttpHandler {
        
      public void ProcessRequest (HttpContext context) {
        context.Response.ContentType = "text/html";
        PdfReader reader = new PdfReader(context.Server.MapPath("YOUR_PDF_FILE.pdf"));
        try {
          // page numbers in iTextSharp are 1-based
          for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; ++pageNumber) {
            string text = PdfTextExtractor.GetTextFromPage(reader, pageNumber);
            // encode so '<' or '&' in the PDF text can't break the HTML output
            context.Response.Write(string.Format("<p>{0}</p>",
              context.Server.HtmlEncode(text)
            ));
          }
        }
        finally {
          // release the file handle on the PDF
          reader.Close();
        }
      }
    
      public bool IsReusable {
          get { return false; }
      }
    }


    Obviously, replace 'YOUR_PDF_FILE.pdf' above with the file you're trying to parse. Again, if you need more complex output, ask on the mailing list. You'll notice that the text returned by GetTextFromPage() is very compact. I also suggest you use the latest version.
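    If the default output isn't what you need, GetTextFromPage also has an overload that takes an ITextExtractionStrategy. Below is a minimal sketch, assuming iTextSharp 5.x; PdfPageText and GetPageText are hypothetical names, and whether a different strategy gives better output depends on how the particular PDF positions its text.

    ```csharp
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Hypothetical helper class, not part of iTextSharp itself.
    public static class PdfPageText
    {
        // Extracts a single page (1-based) using the caller's strategy.
        public static string GetPageText(string path, int pageNumber,
                                         ITextExtractionStrategy strategy)
        {
            PdfReader reader = new PdfReader(path);
            try
            {
                // The three-argument overload lets the strategy decide how
                // text chunks are ordered and joined into the result string.
                return PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy);
            }
            finally
            {
                reader.Close();
            }
        }
    }
    ```

    For example, compare GetPageText(path, 1, new SimpleTextExtractionStrategy()) with GetPageText(path, 1, new LocationTextExtractionStrategy()). Create a fresh strategy object for each page, because a strategy accumulates the text it has already seen.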

    Thursday, September 30, 2010 12:21 AM
  • User-1231836060 posted

    Hi kuujinbo,

    I am getting an error:

    The directive 'webhandler' is unknown. 

    Thursday, September 30, 2010 3:25 AM
  • User1520641890 posted

    The directive 'webhandler' is unknown. 

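    Seriously.... that code goes in a generic handler file with an .ashx extension (for example iTextTextExtract.ashx). The @WebHandler directive is only recognized there; ASP.NET reports it as unknown if the code is pasted into an .aspx page.

    @WebHandler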

    HTTP Handlers and HTTP Modules Overview

    Thursday, September 30, 2010 3:36 AM