How to programmatically extract the text of the currently viewed page of an Office.Interop.Word.Document object

    Question

  • The primary purpose of my application is to allow viewing of MS Word documents.

    For project-related reasons, I need to know which text the user is currently looking at, regardless of the position of the selection or insertion point, because the user can navigate through the document without changing the selection (using the scroll bars, the mouse wheel, etc.).

    I've tried several solutions and even some hacks and workarounds. However, I couldn't get it to work.

    Any suggestions?

    Thanks in advance.
    Tuesday, August 19, 2008 3:48 PM

Answers

  • Hello Ahmedmhamid,

     

    The following code does what we want. It is written as a VB.NET Windows Forms application for illustration.

    Imports Word = Microsoft.Office.Interop.Word
    Imports System.Runtime.InteropServices

    Public Class Form1
        Dim app As Word.Application

        ' Win32 RECT: window bounds in screen pixels.
        <StructLayout(LayoutKind.Sequential)> _
        Public Structure tagRECT
            Public left As Integer
            Public top As Integer
            Public right As Integer
            Public bottom As Integer
        End Structure

        ' P/Invoke declarations for the user32.dll functions used below.
        Public Class NativeMethods
            <DllImport("user32.dll", EntryPoint:="FindWindowExW", CharSet:=CharSet.Unicode)> _
            Public Shared Function FindWindowExW(ByVal hWndParent As IntPtr, ByVal hWndChildAfter As IntPtr, _
                    ByVal lpszClass As String, ByVal lpszWindow As String) As IntPtr
            End Function

            <DllImport("user32.dll", EntryPoint:="GetWindowRect")> _
            Public Shared Function GetWindowRect(ByVal hWnd As IntPtr, <Out()> ByRef lpRect As tagRECT) As <MarshalAs(UnmanagedType.Bool)> Boolean
            End Function
        End Class

        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ' Walk Word's window hierarchy from the main window (OpusApp) down to
            ' the document pane (_WwG). The class names and the " - Microsoft Word"
            ' caption suffix are what this code relies on; they may differ between Word versions.
            Dim h As IntPtr = NativeMethods.FindWindowExW(IntPtr.Zero, IntPtr.Zero, "OpusApp", app.ActiveWindow.Caption + " - Microsoft Word")
            h = NativeMethods.FindWindowExW(h, IntPtr.Zero, "_WwF", "")
            h = NativeMethods.FindWindowExW(h, IntPtr.Zero, "_WwB", app.ActiveDocument.Name)
            h = NativeMethods.FindWindowExW(h, IntPtr.Zero, "_WwG", "Microsoft Word Document")
            If h = IntPtr.Zero Then
                MessageBox.Show("Could not find the Word document window.")
                Return
            End If

            ' Get the document pane's bounds in screen coordinates.
            Dim t As tagRECT = New tagRECT
            If Not NativeMethods.GetWindowRect(h, t) Then
                MessageBox.Show("Could not read the document window bounds.")
                Return
            End If

            ' Map the top-left and bottom-right corners to document ranges;
            ' the text between the two is what the user currently sees.
            Dim r1 As Word.Range = CType(app.ActiveWindow.RangeFromPoint(t.left, t.top), Word.Range)
            Dim r2 As Word.Range = CType(app.ActiveWindow.RangeFromPoint(t.right, t.bottom), Word.Range)
            Dim r As Word.Range = app.ActiveDocument.Range(r1.Start, r2.Start)
            MessageBox.Show(r.Text)
        End Sub

        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            ' Attach to the running Word instance.
            app = CType(Marshal.GetActiveObject("Word.Application"), Word.Application)
        End Sub
    End Class

     

    The idea is to call the Windows API function FindWindowEx to locate the document window inside the Word application. Calling GetWindowRect() then gives the location of that window in screen coordinates. With those coordinates, we can call Word.Window.RangeFromPoint to get the range currently presented to the user.

     

     

    Thanks,

    Ji

    Thursday, August 21, 2008 7:56 AM
    Moderator

All replies

  • Hi,

    As far as I know, VSTO does not cover this. However, I'd like to give you some ideas that may help you find a workaround.

    We could use Document.Bookmarks("\Page") to get the page where the insertion point is currently located.

    Here's the link.

    http://support.microsoft.com/kb/212555
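
    As a minimal sketch of the bookmark idea, assuming .NET Framework and a running Word instance with a document open (the module name PageTextDemo is made up for illustration), the predefined "\Page" bookmark could be read as follows. Note that it follows the selection or insertion point, not the scroll position.

```vb
Imports System
Imports System.Runtime.InteropServices
Imports Word = Microsoft.Office.Interop.Word

' Hypothetical demo module; not part of the original thread.
Module PageTextDemo
    Sub Main()
        ' Attach to the running Word instance (throws if Word is not running).
        Dim app As Word.Application = _
            CType(Marshal.GetActiveObject("Word.Application"), Word.Application)

        ' "\Page" is a predefined bookmark covering the page that contains
        ' the selection or insertion point.
        Dim bookmarkName As Object = "\Page"
        Dim page As Word.Range = app.ActiveDocument.Bookmarks.Item(bookmarkName).Range

        Console.WriteLine(page.Text)
    End Sub
End Module
```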

     

    Or you could use Application.Selection.GoTo to go to a specific page.

    Please see the following discussion:

    http://forums.microsoft.com/msdn/ShowPost.aspx?PostID=3751059&SiteID=1

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3654264&SiteID=1

     

    Thanks

    Thursday, August 21, 2008 8:19 AM
  • Thanks for the answer.

    In fact, I've already tried this solution before. It definitely works but there's one problem with it: it requires user interaction, i.e. the user must click a button so that the text he/she is looking at is retrieved. What I was looking for is a way to get the text he/she is viewing without requiring any interaction from him/her (not even a click on a button).

    Is there any way I can know that the user has scrolled to a different part of the document so that I start executing the code you provided to get the text he/she is currently looking at? I could find no scroll events in the Office.Interop.Word object model.

    Thanks again.

    Ahmed
    Thursday, August 21, 2008 11:37 AM
  • Hello Ahmed,

     

    Yes, you are right. No scroll-related events exist in the Word object model.

     

    Two workarounds here:

     

    1. If you are working on a VSTO project (an add-in or a document customization), the code runs inside the Word process, so we can subclass the scroll bar control's WndProc function. When we receive an SBM_SETSCROLLINFO message, we run the code above to get the text the user is looking at.

    2. If you are automating Word from a Windows Forms or console application, where we cannot subclass a window in another process, we need to use the Timer.Tick event instead: every second, we get the text the user is looking at.
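
    The second workaround could be sketched roughly as below. This is only an illustration: the class VisibleTextWatcher, its event, and its method names are made up, and it assumes the body of Button1_Click has been moved into a function that returns the visible text as a String. It polls once per second and raises an event only when the text differs from the previous poll.

```vb
Imports System

' Hypothetical helper; not part of the Word object model.
Public Class VisibleTextWatcher
    ' Function that returns the text currently visible in Word.
    Private ReadOnly _getText As Func(Of String)
    Private ReadOnly _timer As New System.Windows.Forms.Timer()
    Private _lastText As String = Nothing

    ' Raised when the visible text differs from the previous poll.
    Public Event VisibleTextChanged(ByVal newText As String)

    Public Sub New(ByVal getText As Func(Of String))
        _getText = getText
        _timer.Interval = 1000 ' milliseconds
        AddHandler _timer.Tick, AddressOf OnTick
    End Sub

    Public Sub StartWatching()
        _timer.Start()
    End Sub

    Public Sub StopWatching()
        _timer.Stop()
    End Sub

    Private Sub OnTick(ByVal sender As Object, ByVal e As EventArgs)
        Dim current As String = _getText()
        If Not String.Equals(current, _lastText) Then
            _lastText = current
            RaiseEvent VisibleTextChanged(current)
        End If
    End Sub
End Class
```

    In the form, this might be used as New VisibleTextWatcher(AddressOf GetVisibleText), where GetVisibleText is the assumed function mentioned above. System.Windows.Forms.Timer raises Tick on the UI thread, so it needs a running message loop; a console application would need a different timer.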

     

     

    Thanks,

    Ji

     

    Friday, August 22, 2008 2:42 AM
    Moderator