Extract Text from Image file using .net

  • Question

  • Hi all,

    We are working on a BPO project in which we are using the Microsoft OCR components. The main function of the application is to read an image file (a scanned image of a textbook page) that contains text. After reading the image file, the application has to create an Excel sheet showing "Line Number -- Number of Words in Line -- Number of Characters in Line".

    For example:

    LineNumber    Number of Words    Number of Characters
    1             30                 100
    2             40                 200

    The data above should be shown in an Excel file.

    Can you please help us?

    Thanks.

    Wednesday, June 27, 2012 12:52 PM

All replies

  • I would create a CSV file in Visual Studio, which Excel can read.  The application would run much quicker than one that uses the Excel Interop library.  A CSV file is just a text file with the Excel column data separated by commas.
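
    As a rough illustration of the CSV idea, here is a minimal sketch. It assumes the word and character counts per line have already been computed; the class name CsvReport, the method names, and the header text are made up for this example and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: turns per-line counts into CSV text that Excel can open.
public static class CsvReport
{
    // wordCounts[i] and charCounts[i] describe line number i + 1.
    public static string Build(IList<int> wordCounts, IList<int> charCounts)
    {
        if (wordCounts.Count != charCounts.Count)
            throw new ArgumentException("Both lists must have one entry per line.");

        var csv = new StringBuilder();
        // Header row (column names are an assumption based on the question).
        csv.Append("LineNumber,NumberOfWords,NumberOfCharacters\r\n");

        for (int i = 0; i < wordCounts.Count; i++)
        {
            // Only integers are written, so no quoting or escaping is needed.
            csv.Append(i + 1).Append(',')
               .Append(wordCounts[i]).Append(',')
               .Append(charCounts[i]).Append("\r\n");
        }
        return csv.ToString();
    }

    // Writes the CSV text to disk; a .csv file opens directly in Excel.
    public static void Save(string path, IList<int> wordCounts, IList<int> charCounts)
    {
        File.WriteAllText(path, Build(wordCounts, charCounts), Encoding.UTF8);
    }
}
```

    With the numbers from the question, CsvReport.Build(new[] { 30, 40 }, new[] { 100, 200 }) produces a header row followed by the rows 1,30,100 and 2,40,200.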

    See this webpage for a C# project that uses the OCR component:

    http://www.codeproject.com/Articles/41709/How-To-Use-Office-2007-OCR-Using-C


    jdweng

    Wednesday, June 27, 2012 1:51 PM
  • You could use NPOI to create the Excel file.  It seems to work reasonably well.

    -cd Mark the best replies as answers!

    Wednesday, June 27, 2012 8:07 PM
    Moderator
  • We divided this application into two small modules:

    1. Extracting text from the image and storing the text in a Word document

    2. Counting the number of lines, and the number of words and characters in each line
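
    For readers following along, the counting in module 2 can be sketched without Interop, working on the extracted text as a plain string. This is only an illustration under stated assumptions, not the code we actually used: the names LineStat and LineCounter are hypothetical, a "word" is taken to be any run of non-whitespace characters, and the character count includes spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record holding the counts for one line of text.
public struct LineStat
{
    public int LineNumber;
    public int Words;
    public int Characters;
}

public static class LineCounter
{
    // Splits the text into lines and counts words and characters in each.
    public static List<LineStat> Count(string text)
    {
        var stats = new List<LineStat>();

        // Accept Windows (\r\n), Unix (\n), and bare \r line endings.
        // "\r\n" is listed first so it is treated as a single break.
        string[] lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            stats.Add(new LineStat
            {
                LineNumber = i + 1,
                // A null separator array means "split on any whitespace".
                Words = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length,
                // Assumption: spaces count as characters.
                Characters = line.Length
            });
        }
        return stats;
    }
}
```

    Note that text ending with a line break yields a final empty line with zero words and zero characters; trim the input first if that row is not wanted.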

    We finished the second module (counting the number of lines, and the number of words and characters in each line) using Microsoft.Office.Interop.Excel (and .Word). The problem now is with the first module (extracting text from the image and storing the text in a Word document), for which we used the Microsoft Office OCR components to read text from the image. These components work fine if the image's text is in a normal font. But if the image's text is in an italic font, the extracted text contains 80-90% bad symbols.

    Please suggest a third-party SDK for this, if one is available.

    Thanks.

    Monday, July 2, 2012 12:44 PM