none
Makro help? — extracting a certain parts out of many documents. RRS feed

  • Question

  • I know I ask for much but does anyone have a Makro for me with that I can select a certain parts of a text and extract it into a new document (with the same name) and that for a bunch of documents?

     

    I have several hundred documents and need to extract a part out of it from each and every one of them. The part is always framed by a uniquely formatted headline and end with a certain set of letters.

    I now have to extract these parts.

     

    Can anyone provide me with a makro for it? I tried it myself but it just won’t work :(

     

    Many thanks in advance.

     

    S

    Monday, August 3, 2015 2:23 PM

Answers

  • You could use a macro like the following. This macro allows you to select both the folder you want to extract the data from and any other folder for the output. As coded, the macro assumes your 'uniquely formatted headline' is text you want to find. As such, it is represented in the code by:

              .Format = False
              .Text = "uniquely formatted headline text"

    If you want to perform a find based on a particular Style instead, you could replace that code with:

              .Format = True
              .Style = "Style Name"
    where 'Style Name' is the name of the Style concerned.

    Similarly, as per your specification, the end of the range to be extracted is defined by:

                .Text = "certain set of letters"

    Sub ExportData()
    Application.ScreenUpdating = False
    Dim strInFolder As String, strOutFolder As String
    Dim strFile As String, strDocNm As String
    Dim wdDocSrc As Document, wdDocTgt As Document, Rng As Range
    strDocNm = ActiveDocument.FullName
    strOutFolder = GetFolder("Choose the INPUT folder")
    If strOutFolder = "" Then Exit Sub
    strOutFolder = GetFolder("Choose the OUTPUT folder")
    If strOutFolder = "" Then Exit Sub
    If strInFolder = strOutFolder Then
      MsgBox "You cannot use the input folder for the output", vbExclamation
      Exit Sub
    End If
    strFile = Dir(strInFolder & "\*.doc", vbNormal)
    While strFile <> ""
      If strInFolder & "\" & strFile <> strDocNm Then
        Set wdDocSrc = Documents.Open(FileName:=strInFolder & "\" & strFile, _
          AddToRecentFiles:=False, ReadOnly:=True, Visible:=False)
        Set Rng = Nothing
        With wdDocSrc
          With .Range
            With .Find
              .ClearFormatting
              .Format = False
              .Text = "uniquely formatted headline text"
              .Forward = True
              .Wrap = wdFindStop
              .MatchCase = True
              .MatchWholeWord = True
              .MatchWildcards = False
              .MatchSoundsLike = False
              .MatchAllWordForms = False
              .Execute
            End With
            If .Find.Found = True Then
              Set Rng = .Duplicate
              .Start = .Duplicate.End
              With .Find
                .ClearFormatting
                .Text = "certain set of letters"
                .Format = False
                .Forward = True
                .Wrap = wdFindStop
                .MatchCase = True
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
              End With
              If .Find.Found = True Then
                Rng.End = .Duplicate.Paragraphs.Last.Range.End
                Set wdDocTgt = Documents.Add
                With wdDocTgt
                  .Range.FormattedText = Rng.FormattedText
                  .Characters.Last.Previous = vbNullString
                  .SaveAs2 FileName:=strOutFolder & "\" & wdDocSrc.Name, _
                    Fileformat:=wdDocSrc.SaveFormat, AddToRecentFiles:=False
                  .Close SaveChanges:=False
                End With
              End If
            End If
          End With
          .Close SaveChanges:=False
        End With
      End If
      strFile = Dir()
    Wend
    Set wdDocSrc = Nothing: Set wdDocTgt = Nothing: Set Rng = Nothing
    Application.ScreenUpdating = True
    End Sub

    Function GetFolder(StrMsg As String) As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, StrMsg, 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
    End Function


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Friday, August 7, 2015 6:25 AM

All replies

  • Hi S,

    Please provide the code you're using now.

    This is the forum to discuss questions and feedback for Office 2010 general discussions, I'll move your question to the MSDN forum for Word

    https://social.msdn.microsoft.com/Forums/en-US/home?category=officedev&filter=alllanguages

    The reason why we recommend posting appropriately is you will get the most qualified pool of respondents, and other partners who read the forums regularly can either share their knowledge or learn from your interaction with us. Thank you for your understanding.

    Regards,

    Emi Zhang
    TechNet Community Support


    It's recommended to download and install Configuration Analyzer Tool (OffCAT), which is developed by Microsoft Support teams. Once the tool is installed, you can run it at any time to scan for hundreds of known issues in Office programs. Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact tnmff@microsoft.com.

    Tuesday, August 4, 2015 5:14 AM
  • Hi Odradek

    I see a number of things we'd need to know more about:

    1. How to know which documents need to be processed?

    2. Does each document processed need to create another, new document? Or should the information from the bunch of documents all end up in the same one?

    3. It's not possible to give two documents exactly the same name (including the file path). Something needs to be different, such as saving to a different folder or changing something in the file name...

    4. What, exactly is the formatting at the one end and the set of characters at the other? Any chance the formatting has been applied by a style?

    Theoretically, Word's Range.Find functionality can be used with a pair of ranges, since Find can also search formatting. Once the Range is found, then you can use the Range.FormattedText property to copy the information to another document.


    Cindy Meister, VSTO/Word MVP, my blog

    Tuesday, August 4, 2015 12:39 PM
    Moderator
  • You could use a macro like the following. This macro allows you to select both the folder you want to extract the data from and any other folder for the output. As coded, the macro assumes your 'uniquely formatted headline' is text you want to find. As such, it is represented in the code by:

              .Format = False
              .Text = "uniquely formatted headline text"

    If you want to perform a find based on a particular Style instead, you could replace that code with:

              .Format = True
              .Style = "Style Name"
    where 'Style Name' is the name of the Style concerned.

    Similarly, as per your specification, the end of the range to be extracted is defined by:

                .Text = "certain set of letters"

    Sub ExportData()
    Application.ScreenUpdating = False
    Dim strInFolder As String, strOutFolder As String
    Dim strFile As String, strDocNm As String
    Dim wdDocSrc As Document, wdDocTgt As Document, Rng As Range
    strDocNm = ActiveDocument.FullName
    strOutFolder = GetFolder("Choose the INPUT folder")
    If strOutFolder = "" Then Exit Sub
    strOutFolder = GetFolder("Choose the OUTPUT folder")
    If strOutFolder = "" Then Exit Sub
    If strInFolder = strOutFolder Then
      MsgBox "You cannot use the input folder for the output", vbExclamation
      Exit Sub
    End If
    strFile = Dir(strInFolder & "\*.doc", vbNormal)
    While strFile <> ""
      If strInFolder & "\" & strFile <> strDocNm Then
        Set wdDocSrc = Documents.Open(FileName:=strInFolder & "\" & strFile, _
          AddToRecentFiles:=False, ReadOnly:=True, Visible:=False)
        Set Rng = Nothing
        With wdDocSrc
          With .Range
            With .Find
              .ClearFormatting
              .Format = False
              .Text = "uniquely formatted headline text"
              .Forward = True
              .Wrap = wdFindStop
              .MatchCase = True
              .MatchWholeWord = True
              .MatchWildcards = False
              .MatchSoundsLike = False
              .MatchAllWordForms = False
              .Execute
            End With
            If .Find.Found = True Then
              Set Rng = .Duplicate
              .Start = .Duplicate.End
              With .Find
                .ClearFormatting
                .Text = "certain set of letters"
                .Format = False
                .Forward = True
                .Wrap = wdFindStop
                .MatchCase = True
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
              End With
              If .Find.Found = True Then
                Rng.End = .Duplicate.Paragraphs.Last.Range.End
                Set wdDocTgt = Documents.Add
                With wdDocTgt
                  .Range.FormattedText = Rng.FormattedText
                  .Characters.Last.Previous = vbNullString
                  .SaveAs2 FileName:=strOutFolder & "\" & wdDocSrc.Name, _
                    Fileformat:=wdDocSrc.SaveFormat, AddToRecentFiles:=False
                  .Close SaveChanges:=False
                End With
              End If
            End If
          End With
          .Close SaveChanges:=False
        End With
      End If
      strFile = Dir()
    Wend
    Set wdDocSrc = Nothing: Set wdDocTgt = Nothing: Set Rng = Nothing
    Application.ScreenUpdating = True
    End Sub

    Function GetFolder(StrMsg As String) As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, StrMsg, 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
    End Function


    Cheers
    Paul Edstein
    [MS MVP - Word]

    Friday, August 7, 2015 6:25 AM