Using MODI (Microsoft Office Document Imaging): how to get images and tables into a text file

  • Question

  • Dear All,

     

    Using the MODI Office library, how can I read images, tables and formatting together with the text and write them to a document file? So far I have this working for text only. I am developing the application in C#.

    My code is as follows:

      bool isConverted = false;
                System.Windows.Forms.RichTextBox rtboxConvertedText;
                //Creating an object of MODI Document.
                MODI.Document modiDoc = new MODI.Document();
                try
                {
                    rtboxConvertedText = new System.Windows.Forms.RichTextBox();
                    //this event reports the progress of the OCR 
                    modiDoc.OnOCRProgress += new MODI._IDocumentEvents_OnOCRProgressEventHandler(this.TrackProgress);
                    // The Create method loads the picture from disk and prepares it for OCR.
                    modiDoc.Create(fileName);
                    // Check whether the first page already carries OCR results.
                    MODI.Image image = (MODI.Image)modiDoc.Images[0];
                    int wordCount = 0;
                    try
                    {
                        wordCount = image.Layout.NumWords;
                    }
                    catch
                    {
                        wordCount = -1;
                    }
                    //==========
                    //if (wordCount == -1)
                    //{
                    modiDoc.OCR(MiLANGUAGES.miLANG_ENGLISH, true, true);
                    //modiDoc.Save();
                    //}
                    //Loop through the pages and collect the text of each.
                    for (int i = 0; i < modiDoc.Images.Count; i++)
                    {
                        image = (MODI.Image)modiDoc.Images[i];
                        MODI.Layout layout = (MODI.Layout)image.Layout;
                        //====Reading text word by word=====
                        //foreach (Word word in layout.Words)
                        //{
                        //    MessageBox.Show(word.Text);
                        //}
                        //========end==============
                        // Append rather than assign, so earlier pages are not overwritten.
                        rtboxConvertedText.Text += image.Layout.Text + " ";
                        // Build the output path with System.IO.Path; this also copes with file names that contain no dot.
                        string TargetFileName = Path.GetFileNameWithoutExtension(fileName);
                        string targetExtension = DocumentType.ToUpper().Trim() == "DOCUMENT FILE" ? ".Doc" : ".rtf";
                        TargetFileName = Path.Combine(DestinationFolder, TargetFileName + targetExtension);
                        File.WriteAllText(TargetFileName, rtboxConvertedText.Text, Encoding.UTF8);
                        isConverted = true;
                      }
                    return isConverted;    
                }
                catch (Exception ex)
                {
                    long errorCode = ex.Message.GetHashCode();
                    string LogFileName = string.Empty;

                    LogFileName = @Application.StartupPath + "\\ErrorLog " + DateTime.Today.Year + "-" + DateTime.Today.Month + "-" + DateTime.Today.Day + ".rtf";
                    StreamWriter sw = File.AppendText(LogFileName);
                    if (errorCode == -1095175215)
                        sw.WriteLine(DateTime.Now.ToShortTimeString() + " " + fileName + " Invalid Format.");
                    else
                        sw.WriteLine(DateTime.Now.ToShortTimeString() + " " + fileName + " " + ex.Message);
                    sw.Close();
                             

                }
                finally
                {
                    modiDoc = null;
                }
                return isConverted;  
            
            }

     

    It's very urgent.
    Laxman
    Monday, August 1, 2011 10:21 AM

All replies

  • Hi Laxman,

    Thanks for your post.

    Here is a sample project showing how to retrieve images from a document and then convert them to text using MODI and the Open XML SDK:

    http://www.codeproject.com/KB/office/OCRSampleApplication.aspx

    &

    This sample project demonstrates how to export, delete and replace the images in a document by using Open XML SDK

    http://code.msdn.microsoft.com/CSManipulateImagesInWordDoc-312da7ef
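
    As a rough illustration of the "export images" part, here is a minimal sketch that does not use the Open XML SDK at all. It relies only on the fact that a .docx file is a ZIP package in which Word normally stores embedded pictures under word/media/. The class name DocxMedia and the method name ExtractImages are made up for this example, it assumes .NET 4.5 or later (with references to System.IO.Compression and System.IO.Compression.FileSystem on .NET Framework), and it does not work for the older binary .doc format.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.IO.Compression;

    // Hypothetical helper, not part of MODI or the Open XML SDK.
    static class DocxMedia
    {
        // Copies every entry stored under word/media/ in a .docx package
        // to outputFolder and returns the paths that were written.
        public static List<string> ExtractImages(string docxPath, string outputFolder)
        {
            var written = new List<string>();
            Directory.CreateDirectory(outputFolder);

            using (ZipArchive archive = ZipFile.OpenRead(docxPath))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    // Assumption: pictures live under word/media/, which is where Word usually puts them.
                    if (!entry.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase))
                        continue;
                    if (entry.Name.Length == 0)
                        continue; // skip directory entries

                    string target = Path.Combine(outputFolder, entry.Name);
                    entry.ExtractToFile(target, true); // overwrite an existing file
                    written.Add(target);
                }
            }
            return written;
        }
    }
    ```

    Calling DocxMedia.ExtractImages(docxPath, outputFolder) with your own two paths would copy the embedded pictures into that folder. The SDK sample above is the more robust route, because it follows the package relationships instead of assuming a folder name.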

    I hope this helps. 


    Best Regards, Calvin Gao [MSFT]

    Thursday, August 4, 2011 1:07 PM
    Moderator
  • Hi Calvin,

    Thanks a lot for your reply.

    My issue is that when converting image files (scanned documents) into a Word document, I can only extract the text with the MODI Office library. I need to extract the tables and images along with the text and keep the formatting in the Word document. I have tried many ways but could not find a solution.

    Hope you will help me.

     

    Thanks

    Laxman
    Saturday, August 6, 2011 4:40 AM
  • Hi Laxman,

    I have read this thread.

    I am also writing an application that retrieves text as well as images (pictures of babies, children playing) from PDF images and other images. I have used the MODI library, but it retrieves only the text and not a single picture, the same problem you described.

    If you have solved that problem, please help me too with working code.

    I am also using C#.

    Please, it's urgent.

    riteshjaiswal1990@gmail.com   - this is my email ID . 

    Thanks in advance.

     

     

     

     


    Laxman

    Wednesday, January 25, 2012 6:45 AM
  • Dear Ritesh,

     

    I could not find any way to read images together with text from PDFs or scanned documents. If you really want to sort out the issue, you will have to use a third-party library that can handle it. MODI does not support images.
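
    For tables, a partial workaround may be possible with word positions alone. As far as I know, each MODI word also reports its bounding rectangles (Word.Rects), so you can at least guess rows and columns from the coordinates. Below is a minimal sketch of that idea, not a tested MODI solution: OcrWord and TableGuess are hypothetical types I made up, you would fill OcrWord from whatever positions your OCR engine gives you, and the two thresholds are assumptions you would have to tune for your scans.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    // Hypothetical plain record: one recognized word and its position in pixels.
    sealed class OcrWord
    {
        public string Text;
        public int Left;
        public int Top;
        public int Right;
    }

    // Hypothetical helper: guesses table rows and columns from word positions.
    static class TableGuess
    {
        // Words whose Top values lie within rowTolerance of the first word in a row
        // are treated as one row. Inside a row, a horizontal gap larger than
        // columnGap becomes a tab (new cell); smaller gaps become a space.
        public static string ToTabSeparated(IEnumerable<OcrWord> words, int rowTolerance, int columnGap)
        {
            // Cluster the words into rows from top to bottom.
            var rows = new List<List<OcrWord>>();
            foreach (OcrWord word in words.OrderBy(w => w.Top))
            {
                if (rows.Count == 0 || word.Top - rows[rows.Count - 1][0].Top > rowTolerance)
                    rows.Add(new List<OcrWord>());
                rows[rows.Count - 1].Add(word);
            }

            // Emit each row left to right, inserting tabs at wide gaps.
            var lines = new List<string>();
            foreach (List<OcrWord> row in rows)
            {
                var line = new StringBuilder();
                OcrWord previous = null;
                foreach (OcrWord word in row.OrderBy(w => w.Left))
                {
                    if (previous != null)
                        line.Append(word.Left - previous.Right > columnGap ? "\t" : " ");
                    line.Append(word.Text);
                    previous = word;
                }
                lines.Add(line.ToString());
            }
            return string.Join("\n", lines);
        }
    }
    ```

    The tab-separated output can be pasted into Word and converted to a table there. This only approximates simple grid tables; it does not recover pictures, merged cells or fonts, so a third-party library is still needed for those.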

     

     

    Thanks,
    Laxman

     


    Laxman
    Monday, January 30, 2012 5:50 AM
  • Dear Laxman,

    Can you tell me the name of that third-party tool, please?

    I need it urgently.

    I am also passing along a URL, http://www.onlineocr.net/Default.aspx; just go through it. You will find it does exactly what we want, but they do not provide the source code.

     

    Please, Laxman, tell me the name of that third-party tool. It's urgent for me.

    Thanks in advance.

    Monday, January 30, 2012 11:24 AM