MODI OCR - line break

  • Question

  • Hello,
    I'm trying to use the MODI component in a VB.NET project.

    This is the code I am running:

    Dim md As New MODI.Document()
    md.Create("c:\test\1.tiff")
    md.OCR(MODI.MiLANGUAGES.miLANG_ITALIAN, True, True)

    Dim image As MODI.Image = DirectCast(md.Images(0), MODI.Image)
    Dim layout As MODI.Layout = image.Layout
    Dim out As String = ""
    For j As Integer = 0 To layout.Words.Count - 1
        Dim word As MODI.Word = DirectCast(layout.Words(j), MODI.Word)
        out += " " & word.Text
    Next
    TextBox1.Text = out

    The text box shows all the text found in the image, but on a single line, without the line breaks of the original document.

    how can I get the same layout of the image?

    thanks


    Friday, January 4, 2013 3:34 PM

All replies

  • Another question: is it possible to run OCR on only part of the image by specifying the coordinates of a rectangle?

    thanks

    Friday, January 4, 2013 9:00 PM
  • Hi Androita,

    Thanks for posting in the MSDN Forum.

    I will bring some experts into your thread to see whether they can help you. There may be some delay; I appreciate your patience.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Monday, January 7, 2013 7:37 AM
    Moderator
  • Hello Tom,
    Thanks for the reply. No worries, I look forward to your response.

    Thanks

    Bye
    Tuesday, January 8, 2013 8:40 AM
  • Hello Androita,

    What do you mean by "how can I get the same layout of the image?" Could you please explain the question in detail?

    Thanks,

    Sreerenj G Nair

    Tuesday, January 8, 2013 9:24 PM
  • Hello Sreerenj,

    For example, I have an image with this layout:

    -----------------

    Hi,

    How are you?

    Bye

    ----------------

    When I do OCR the result is: "Hi, How are you? Bye"

    While I would like to get:

    Hi,

    How are you?

    Bye

    I do not get the line breaks.

    Thanks

    Wednesday, January 9, 2013 8:12 AM
  • Hello Androita,

    To get a similar layout, use Layout.Text instead of iterating through each word. This preserves the line feeds. However, tab spacing may not be reproduced exactly, and extra blank lines may be dropped by the OCR.

    Thanks,

    Sreerenj G Nair.

    Friday, January 11, 2013 12:39 AM
  • Hello Sreerenj,

    Thanks for the reply.
    Could I have an example of how to use Layout.Text in VB.NET code?

    And is it possible to run OCR on a region defined by the coordinates of a rectangle?

    thanks

    Androita

    Friday, January 11, 2013 10:49 AM
  • Hello,

    I tried Layout.Text and it works very well! :)

            Dim strFindText As String = ""
            Dim Doc1 As MODI.Document
    
            Doc1 = New MODI.Document
            Doc1.Create("C:\test\5.tif")
            Doc1.OCR()
            Doc1.Save()
    
            For imageCounter As Integer = 0 To (Doc1.Images.Count - 1)
                strFindText &= Doc1.Images(imageCounter).Layout.Text
            Next
    
            TextBox1.Text = strFindText

    Now I have just one last question...

    Is it possible to run OCR on a region defined by the coordinates of a rectangle?

    Thanks

    Bye

    Friday, January 11, 2013 9:35 PM
  • Hello Androita,

    It is not directly possible to run OCR on a rectangle defined by coordinates. MODI processes the image as a whole and offers no way to restrict recognition to a region of it.

    The workaround is to create a new image that contains only the region you want. Using the System.Drawing classes you can resize or crop the image and pass the new image to MODI.

    The following article gives an idea of how to crop an image using the System.Drawing classes.

    http://www.switchonthecode.com/tutorials/csharp-tutorial-image-editing-saving-cropping-and-resizing

    PS: The article is in C#, so you will need to convert the code to VB.NET.
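
    A minimal VB.NET sketch of the cropping workaround, not taken from the article: it assumes a single-page image, and the module name, the output path 5_crop.tif and the rectangle values are made up for illustration.

    ```vbnet
    Imports System
    Imports System.Drawing
    Imports System.Drawing.Imaging

    Module CropForOcr
        ' Copies the given rectangle out of the source image and saves it as a new TIFF.
        ' sourcePath and targetPath must differ: the source file stays open while it is read.
        Sub CropToTiff(sourcePath As String, targetPath As String, region As Rectangle)
            Using source As New Bitmap(sourcePath)
                ' Clip the requested region to the image bounds; Clone fails on out-of-range rectangles.
                Dim bounds As New Rectangle(0, 0, source.Width, source.Height)
                Dim clipped As Rectangle = Rectangle.Intersect(bounds, region)
                If clipped.Width <= 0 OrElse clipped.Height <= 0 Then
                    Throw New ArgumentException("The region lies outside the image.", "region")
                End If

                Using cropped As Bitmap = source.Clone(clipped, source.PixelFormat)
                    cropped.Save(targetPath, ImageFormat.Tiff)
                End Using
            End Using
        End Sub

        Sub Main()
            ' Hypothetical paths and coordinates (x, y, width, height in pixels).
            CropToTiff("C:\test\5.tif", "C:\test\5_crop.tif", New Rectangle(100, 50, 400, 200))
        End Sub
    End Module
    ```

    The cropped file can then be passed to Doc1.Create in place of the original image.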

    Thanks,

    Sreerenj G Nair

    Wednesday, January 16, 2013 10:34 PM
  • Hello Sreerenj,

    thanks for your answer!

    Ok,

    I suspected it was not possible to run OCR on a set of coordinates, and I had already tried using the System.Drawing classes to crop the image.

    Now I have everything I need to finish my program!

    Thanks again!

    Bye

    Androita

    Thursday, January 17, 2013 8:33 AM
  • Hi again,

    I have one problem...

    After executing the following code:

    Dim strFindText As String = ""
    Dim Doc1 As MODI.Document

    Doc1 = New MODI.Document
    Doc1.Create("C:\test\5.tif")
    Doc1.OCR()
    Doc1.Save()

    For imageCounter As Integer = 0 To (Doc1.Images.Count - 1)
        strFindText &= Doc1.Images(imageCounter).Layout.Text
    Next

    TextBox1.Text = strFindText

    if I try to delete the image 5.tif, I get an error.
    It says that the image is in use by another program and cannot be deleted.

    I tried calling Doc1.Close, but it does not work.

    After performing OCR, is there a way to release the image so that it can be deleted?

    thanks

    Androita

    Saturday, February 16, 2013 12:26 PM
  • Hi,

    Can nobody help me?

    Thanks

    Androita

    Saturday, February 23, 2013 10:37 AM
  • I don't know why you cannot delete 5.tif, sorry.

    If you don't need the text saved with the image, you could remove the line "Doc1.Save" and see whether you can then delete the file.

    Maybe there is a permissions issue...

    EDITED:

    Now I see that in VBA there is a "Doc1.Close"; try it.


    • Edited by rodecaterra Thursday, February 28, 2013 12:03 PM
    Thursday, February 28, 2013 11:43 AM
  • Hello Androita,

    It seems that even after you call the Doc1.Close() method, a ghost object remains in memory and holds a handle to the file.

    To work around this, you may try the following code.

    Doc1 = Nothing
    Marshal.ReleaseComObject(Doc1)

    If the above code does not work, add the following to force garbage collection:

    GC.WaitForPendingFinalizers()
    GC.Collect()
    GC.Collect() ' Call second time

    Let me know if this works,

    Thanks,

    Sreerenj G Nair

    Thursday, February 28, 2013 6:19 PM
  • Hi Sreerenj,

    Thanks for helping me.

    1) If I write
    Doc1 = Nothing
    Marshal.ReleaseComObject(Doc1)
    it does not work; I get the error "Object reference not set to an instance of an object."

    2) But if I write
    Marshal.ReleaseComObject(Doc1)
    Doc1 = Nothing

    it works very well :)))

    3) And if I write

    GC.WaitForPendingFinalizers()
    GC.Collect()
    GC.Collect() ' Call second time

    it also works very well :)))

    What are the differences, and which is the better solution, example 2 or example 3?

    Thanks

    Androita

    Saturday, March 2, 2013 10:17 AM
  • Hello Androita,

    While typing I put the lines in the wrong order. My bad. The second example is the correct one.

    When managed code calls a COM object, the .NET runtime automatically creates a Runtime Callable Wrapper (RCW). The RCW marshals calls between the .NET application and the COM object, and it keeps a reference count on the COM object. Therefore, until all references held by the RCW have been released, the COM object does not quit.

    Example 2 is the better choice, because calling GC.Collect() explicitly incurs a performance cost. However, if the COM server is still alive after calling the ReleaseComObject method, you can call GC.Collect() after setting Doc1 to Nothing.
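
    Putting the pieces together, the release could be arranged roughly as below. This is only a sketch: it assumes a project reference to the MODI COM library, and the module and function names are made up for illustration.

    ```vbnet
    Imports System
    Imports System.Runtime.InteropServices
    Imports System.Text

    Module ModiCleanup
        ' Hypothetical helper: runs OCR on the file, returns the text and releases the document.
        Function ReadTextAndRelease(path As String) As String
            Dim doc As MODI.Document = Nothing
            Dim text As New StringBuilder()
            Try
                doc = New MODI.Document()
                doc.Create(path)
                doc.OCR()
                For i As Integer = 0 To doc.Images.Count - 1
                    text.Append(DirectCast(doc.Images(i), MODI.Image).Layout.Text)
                Next
                doc.Close()
            Finally
                If doc IsNot Nothing Then
                    ' Release the RCW first, then drop the managed reference (example 2).
                    Marshal.ReleaseComObject(doc)
                    doc = Nothing
                End If
            End Try

            ' Fallback only if the file is still locked (example 3); it has a performance cost:
            ' GC.Collect()
            ' GC.WaitForPendingFinalizers()
            Return text.ToString()
        End Function
    End Module
    ```

    The Finally block ensures the wrapper is released even if Create or OCR throws. Intermediate objects such as doc.Images get their own wrappers, which is one reason the garbage-collection fallback may still be needed.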

    Thanks,

    Sreerenj G Nair


    Please remember to mark the replies as answers if it answered your question. Also, please click on Vote as Helpful if the post was helpful.

    Monday, March 4, 2013 6:39 PM