I have developed an application using MODI that runs OCR on scanned documents and converts them into a text document (text only). I want to carry the images and tables from the scanned document over into the output document along with the text. How can I do this? I am struggling with it; please help.
Microsoft Office Document Imaging (MODI) is a component of Microsoft Office. As far as I know, it was first introduced in Microsoft Office XP and is included in Office 2007. However, it was removed in Office 2010. For more details, please refer to this article:
If you use Office 2003, you can add a reference to the Microsoft Office Document Imaging 11.0 Type Library to your project. Please refer to the sample code in that article:
Dim inputFile As String = "C:\test\multipage.tif"
Dim strRecText As String = ""
Dim Doc1 As MODI.Document
Doc1 = New MODI.Document
Doc1.Create(inputFile) ' open the TIFF file before running OCR on it
Doc1.OCR() ' this will OCR all pages of a multi-page TIFF file
Doc1.Save() ' this will save the deskewed, reoriented images and the OCR text back to inputFile
For imageCounter As Integer = 0 To (Doc1.Images.Count - 1) ' work your way through each page of results
    strRecText &= Doc1.Images(imageCounter).Layout.Text ' this appends the OCR results for the page to a string
Next
System.IO.File.AppendAllText("C:\test\testmodi.txt", strRecText) ' write the OCR text out to disk
Doc1.Close() ' clean up
As for Office 2010, please refer to this KB article:
I do not have any issue with the MODI library; it is working fine. I am using a licensed copy of Office 2007. My issue is that I need to extract the tables and images, together with the text, from a scanned document and put them into a Word document.
I have already managed to extract the text from the scanned document and store it in a text document.
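One possible starting point, offered only as a sketch: besides Layout.Text, the MODI layout also exposes the individual recognized words with their line IDs and bounding rectangles. That positional data is what any table reconstruction would have to work from, because to my knowledge MODI does not report tables or picture regions as such. The module name below is hypothetical, the file path is the sample path used earlier in this thread, and grouping words into rows and columns by their coordinates is left as an assumption about how you might proceed, not something MODI does for you.

```vbnet
Imports System

Module ModiWordPositionsSketch ' hypothetical name, for illustration only
    Sub Main()
        Dim doc As New MODI.Document
        doc.Create("C:\test\multipage.tif") ' sample path from the code earlier in this thread
        doc.OCR()

        For i As Integer = 0 To doc.Images.Count - 1
            Dim page As MODI.Image = CType(doc.Images(i), MODI.Image)

            ' Each recognized word carries its text, the ID of the line it sits on,
            ' and one or more bounding rectangles in page pixel coordinates.
            For Each w As MODI.Word In page.Layout.Words
                For Each r As MODI.MiRect In w.Rects
                    Console.WriteLine("page {0}, line {1}: '{2}' left={3} top={4} right={5} bottom={6}", _
                                      i, w.LineId, w.Text, r.Left, r.Top, r.Right, r.Bottom)
                Next
            Next
        Next

        doc.Close()
    End Sub
End Module
```

Words that share a LineId and whose Left values line up across several consecutive lines are candidates for table columns. Areas of the page that contain no words at all are candidates for pictures, which would then have to be cropped from the page image by other means.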