Using MODI (Microsoft Office Document Imaging): how to get images and tables into a text file

    Question

  • Dear All.

    I have developed an application using MODI that runs OCR on scanned documents and converts them into text documents (text only). I also want to carry the images and tables from the scanned document over into the output document along with the text. How can I do this? I am struggling with it, so please help me as soon as possible.

    Laxman
    Thursday, July 28, 2011 8:43 AM

All replies

  • Hi Laxman,

    Thank you for posting.

    Microsoft Office Document Imaging (MODI) is a component of Microsoft Office. As far as I know, it was first introduced in Microsoft Office XP and was included through Office 2007, but it was removed in Office 2010. For more details, please refer to this article:

    http://en.wikipedia.org/wiki/Microsoft_Office_Document_Imaging

    If you use Office 2003, you can add a reference to the Microsoft Office Document Imaging 11.0 Type Library to your project. Please refer to the sample code in that article:

    Dim inputFile As String = "C:\test\multipage.tif"
    Dim strRecText As String = ""
    Dim Doc1 As MODI.Document

    Doc1 = New MODI.Document
    Doc1.Create(inputFile)
    Doc1.OCR() ' this will OCR all pages of a multi-page TIFF file
    Doc1.Save() ' this will save the deskewed, reoriented images and the OCR text back to inputFile

    For imageCounter As Integer = 0 To (Doc1.Images.Count - 1) ' work your way through each page of results
      strRecText &= Doc1.Images(imageCounter).Layout.Text  ' append this page's OCR text to the string
    Next

    ' File is fully qualified so the snippet does not depend on "Imports System.IO"
    System.IO.File.AppendAllText("C:\test\testmodi.txt", strRecText)   ' append the OCR text to a file on disk

    Doc1.Close() ' clean up
    Doc1 = Nothing

    For Office 2010, please refer to this KB article, which describes how to install MODI and lists alternative methods:

    http://support.microsoft.com/kb/982760

    I hope this information helps. Feel free to follow up after you have tried it.

    Best Regards,


    Bruce Song [MSFT]

    Tuesday, August 02, 2011 8:03 AM
  • Hi Bruce,

    Thanks for your reply.

    I do not have any issue with the MODI library; it is working fine. I am using a licensed version of Office 2007. My issue is that I need to extract (or read) the tables and images, along with the text, from the scanned document and put them into a Word document file together with the text.

    I have already managed to extract the text from the scanned document and store it in a text document.

    I hope you can provide a sample in .NET.

    Thanks

    Laxman
    Wednesday, August 03, 2011 9:00 AM