MODI picture in window form c#

  • Question

  • Hello, I want to read the text from a picture. I added the picture to a PictureBox.

    I added a reference to MODI 12.0, and when I click the picture to put its text into the TextBox, I get an error:

    "System.Runtime.InteropServices.COMException"

    This is my code:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;
    
    namespace WindowsFormsApplication1
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }
    
            private void pictureBox1_Click(object sender, EventArgs e)
            {
                MODI.Document md = new MODI.Document();
                md.Create(Convert.ToString(pictureBox1.Image));
                md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
                MODI.Image image = (MODI.Image)md.Images[0];
                textBox1.Text = image.Layout.Text;
            }
        }
    }

    Thursday, July 17, 2014 12:46 PM

Answers

  • Hi Ahmed,

    I think I have a solution for you. As mentioned, I suggest using another library. If you use Office Document Imaging, you can run into trouble if a user does not have Office, or has a different version from the one you developed against. Remember that yesterday we noticed that, for example, Office Professional Plus 2013 does not ship the Imaging components, so I couldn't help you then. I think Microsoft wants to get people using OneNote...

    After that, I looked around on the web and found the Tesseract OCR engine. It was originally developed at HP Labs and is now developed by Google. You can find it here: Tesseract OCR

    After playing around with it a little, I was impressed by how easy it is to scan images for text. I think Tesseract provides all you need. If I remember correctly, you want to skip regions of an image when scanning, or scan only specific regions of an image? Tesseract is able to do that.

    I wrote a little demo app for you. You can clone it from GitHub here: https://github.com/robinsedlaczek/OcrDemoWithTesseract.git The screenshot below shows the demo app. You can select an image with the "..." button. Then you can scan the whole image or specific regions of the selected image using the buttons next to the "..." button. Below the buttons, the image is shown on the left, and on the right is a textbox that shows the scanned text.

    To demonstrate scanning regions, I defined 4 static, hard-coded regions of the image. In the screenshot below, I scanned the class nodes in the diagram separately (you can find the image in the repository in the "Test Images" folder), so the results are shown per region on the right side.

    Screenshot of demo application
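
    As a rough illustration of per-region scanning, here is a minimal sketch using the Tesseract .NET wrapper. It is not the code from the demo repository: the class name `RegionOcrSketch`, the method `ReadRegions`, and the region coordinates in the usage note are hypothetical, and it assumes a wrapper version that offers the `Process(Pix, Rect)` overload.

```csharp
using System.Collections.Generic;
using Tesseract;

static class RegionOcrSketch
{
    // Runs OCR once per region and returns the recognized text for each region,
    // in the same order as the regions were passed in.
    // enginePath is assumed to point at the folder the wrapper expects for its language data.
    public static List<string> ReadRegions(string enginePath, string imagePath, IEnumerable<Rect> regions)
    {
        var results = new List<string>();

        using (var engine = new TesseractEngine(enginePath, "eng", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        {
            foreach (var region in regions)
            {
                // The engine processes one page at a time, so each page is
                // disposed before the next region is processed.
                using (var page = engine.Process(image, region))
                {
                    results.Add(page.GetText());
                }
            }
        }

        return results;
    }
}
```

    A call might look like `RegionOcrSketch.ReadRegions(enginePath, "diagram.png", new[] { new Rect(10, 10, 200, 80), new Rect(250, 10, 200, 80) })`, where the file name and the x, y, width, height values are placeholders for your own image and regions.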

    To use Tesseract, just add the Tesseract wrapper NuGet package as shown below. You also need the Tesseract engine. You can download it from the web or find it in my repository in the "\Ocr\Tesseract - Backup\Tesseract-OCR" folder. The NuGet package will use this runtime to scan images.

    Tesseract NuGet package

    If you look at the code (the MainWindow.ScanImage method), you can see that the engine path must be specified, because the wrapper needs to find the engine.

    var enginePath = ""; /* Path to the Tesseract-OCR folder. */
    
    var api = new TesseractEngine(enginePath, "eng", EngineMode.Default);
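
    To show how the engine is typically used once it has been created, here is a minimal sketch that reads all text from one image file. The class name `OcrSketch` and the method `ReadText` are hypothetical and are not taken from the demo repository; the calls are those of the Tesseract .NET wrapper as I understand them, so check them against the package version you install.

```csharp
using Tesseract;

static class OcrSketch
{
    // Returns the text recognized in the whole image.
    // enginePath: path to the Tesseract-OCR folder, as in the snippet above.
    // imagePath:  path to an image file on disk (assumed to be a format the wrapper can load).
    public static string ReadText(string enginePath, string imagePath)
    {
        using (var engine = new TesseractEngine(enginePath, "eng", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(image))
        {
            return page.GetText();
        }
    }
}
```

    In the click handler from the question, this could be called as `textBox1.Text = OcrSketch.ReadText(enginePath, imagePath);`, where `imagePath` is the file that was loaded into the PictureBox.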
    

    Instead of dealing with different Office versions, you can simply ship the Tesseract engine with your application.

    OK, if you have any questions or problems with the code, please contact me on any channel!

    Hope that helps!


    ---------------------------------- Robin Sedlaczek @ Microsoft Forums

    • Marked as answer by AhmedWP Friday, July 18, 2014 2:04 PM
    Friday, July 18, 2014 10:22 AM

All replies

  • Hello AhmedWP,

    Based on some research, the MODI documentation gives some direction:

    http://msdn.microsoft.com/en-us/library/office/aa167607(v=office.11).aspx

    You can see that the Create method requires an MDI or TIF file: http://msdn.microsoft.com/en-us/library/office/aa202763(v=office.11).aspx

    A related case about this format issue is also listed here:

    http://www.dreamincode.net/forums/topic/226151-ocr-using-modi-microsoft-office-document-imaging/

    The error there is also System.Runtime.InteropServices.COMException.
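
    To make the file-format point concrete: `Convert.ToString(pictureBox1.Image)` returns the type name ("System.Drawing.Bitmap"), not a file path, so `Create` never receives a TIF file. A minimal sketch of one possible fix is to save the image to a temporary TIFF first. The class name `ModiOcrSketch` and its methods are hypothetical, and the sketch assumes the MODI 12.0 reference is installed and that the PictureBox image is not null.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static class ModiOcrSketch
{
    // Writes the image to a uniquely named .tif file in the temp folder and returns its path.
    public static string SaveAsTempTiff(Image image)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tif");
        image.Save(path, ImageFormat.Tiff);
        return path;
    }

    // Runs MODI OCR on the image by going through a temporary TIFF file.
    public static string ReadText(Image image)
    {
        string path = SaveAsTempTiff(image);
        try
        {
            var md = new MODI.Document();
            md.Create(path);
            md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            var page = (MODI.Image)md.Images[0];
            string text = page.Layout.Text;
            md.Close(false); // close without saving changes
            return text;
        }
        finally
        {
            // Best-effort cleanup; MODI may still hold a lock on the file.
            try { File.Delete(path); } catch (IOException) { }
        }
    }
}
```

    The click handler would then reduce to `textBox1.Text = ModiOcrSketch.ReadText(pictureBox1.Image);`. This only addresses the file-format cause; the COMException can have other causes, such as MODI not being installed.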

    However, MODI is not part of WinForms technology. If that is not the answer, or you have further questions about this issue, you will need to ask in the following forum:

    http://social.msdn.microsoft.com/Forums/office/en-US/home?forum=worddev

    That would be better for this kind of issue.

    PS: Another thing to note is that MODI is only part of Office 2010 and is no longer part of Office 2013: http://answers.microsoft.com/en-us/office/forum/office_2013_release-other_msftoffice_apps/office-2013-and-installing-ocr-for-documenting/ab7078a3-fd67-4199-a722-6a0596b838a0

    Regards,



    Barry


    • Edited by Barry Wang Friday, July 18, 2014 8:38 AM
    Friday, July 18, 2014 8:37 AM