Splitting lines in Microsoft Word for captioning software

  • Question

  • Hello all,

    I have been working on a project that involves weekly processing of user-submitted transcriptions of YouTube videos, of the type:

    BOB: Hi there, I'm saying words that are being transcribed by somebody on the internet. I am saying all sorts of words and they are transcribing them into one big paragraph like this one right here.

    ALICE: That's nice, Bob.

    To convert these transcriptions into caption files, the software we use requires that line breaks within a caption are indicated with a single carriage return, whereas breaks between captions are indicated with a double carriage return. That much is easy enough to automate in VBA. What I am trying to automate now are the finer-scale changes that are made manually to each of these 250-page documents every week:

    1. Line lengths must be less than 50 characters.
    2. Each caption can have at most two (50-character) lines.
    3. If different people are speaking, the captions must be separated.
    4. Line-breaks should not split words.
    5. If a punctuation mark is fewer than 10 characters from the end of the caption, the punctuation mark should constitute the end of the caption.

    So the above example would convert to:

    BOB: Hi there, I'm saying words that are being

    transcribed by somebody on the internet.

    (two returns)

    I am saying all sorts of words and they are

    transcribing them into one big paragraph like this

    (two returns)

    one right here.

    (two returns)

    ALICE: That's nice, Bob.

    What I am asking, and I'm aware that this is a pretty tall order, is whether there is a way to automate this process through a macro that follows the steps delineated above.

    Many, many thanks for any suggestions you can provide.


    • Edited by mesomacro Monday, March 6, 2017 7:43 PM
    Monday, March 6, 2017 7:42 PM

Answers

  • Try this macro, which works on the active document and creates a new document with the text split out.

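    Sub TestParse()
        Dim parT As Paragraph
        Dim docT As Document
        Dim docS As Document
        Dim strP As String
        Dim iSpace As Long   ' Long, because InStrRev and Len return Long
        Dim iLine As Long

        Set docS = ActiveDocument
        ' Documents.Add makes the new document the active one, so the
        ' Selection calls below type into the new document.
        Set docT = Documents.Add
        Selection.EndKey Unit:=wdStory

        For Each parT In docS.Paragraphs
            strP = parT.Range.Text
            iSpace = 0
            iLine = 1
            If Len(strP) > 50 Then
                While Len(strP) > 0
                    ' Last space at or before character 51 (iSpace is 0 here)
                    iSpace = InStrRev(strP, " ", 51 - iSpace)
                    ' If a period (or, failing that, a comma) sits at position 40
                    ' or later in the candidate line, end the line there instead
                    If InStrRev(Left(strP, iSpace), ".") >= 40 Then
                        iSpace = InStrRev(Left(strP, iSpace), ".")
                    ElseIf InStrRev(Left(strP, iSpace), ",") >= 40 Then
                        iSpace = InStrRev(Left(strP, iSpace), ",")
                    End If

                    ' No break point found (for example, in the short final
                    ' piece of a paragraph): take everything that is left
                    If iSpace = 0 Then
                        iSpace = Len(strP)
                    End If

                    ' Write the line, followed by a single break
                    Selection.TypeText Text:=Trim(Left(strP, iSpace)) & Chr(10)
                    Selection.EndKey Unit:=wdStory

                    If iSpace < Len(strP) Then
                        ' Keep the remainder for the next pass
                        strP = Mid(strP, iSpace + 1, Len(strP))
                        iLine = iLine + 1
                        iSpace = 0
                    Else
                        ' Paragraph used up: add the second break that ends the caption
                        strP = ""
                        Selection.TypeText Text:=Chr(10)
                        Selection.EndKey Unit:=wdStory
                    End If

                    ' After two lines, add an extra break to start a new caption
                    If iLine = 3 Then
                        Selection.TypeText Text:=Chr(10)
                        Selection.EndKey Unit:=wdStory
                        iLine = 1
                    End If
                Wend
            Else
                ' Short paragraph: one caption, followed by a double break
                Selection.TypeText Text:=strP & Chr(10) & Chr(10)
                Selection.EndKey Unit:=wdStory
            End If

        Next parT

    End Sub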
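
If you want to tweak the limits, the break-point choice inside the loop can be pulled out into a function of its own. What follows is only a sketch under some assumptions: the names BreakPos, maxLen and window are made up for illustration and are not part of the macro above, and a run of text with no space in it is cut hard at maxLen here, whereas the macro above writes the whole remainder on one line.

```vba
' Hypothetical helper (not part of the macro above): returns how many
' characters of strP should go on the next caption line.
Function BreakPos(ByVal strP As String, _
                  Optional ByVal maxLen As Long = 50, _
                  Optional ByVal window As Long = 10) As Long
    Dim pos As Long
    Dim punct As Long

    ' A short remainder fits on one line: take all of it.
    If Len(strP) <= maxLen Then
        BreakPos = Len(strP)
        Exit Function
    End If

    ' Last space at or before position maxLen + 1, so words are not split.
    pos = InStrRev(strP, " ", maxLen + 1)
    If pos = 0 Then
        ' Assumption: no space at all, so cut hard at maxLen.
        BreakPos = maxLen
        Exit Function
    End If

    ' Prefer a period, then a comma, that falls within the last
    ' `window` characters of the line.
    punct = InStrRev(Left$(strP, pos), ".")
    If punct < maxLen - window Then punct = InStrRev(Left$(strP, pos), ",")
    If punct >= maxLen - window Then pos = punct

    BreakPos = pos
End Function
```

Inside the loop you would then write something like iSpace = BreakPos(strP) in place of the InStrRev lines, and change the two optional arguments if the caption software's limits change.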

    • Marked as answer by mesomacro Tuesday, March 7, 2017 6:12 AM
    Tuesday, March 7, 2017 3:01 AM

All replies

  • If the first complete sentence were longer, would you want:

    BOB: Hi there, I'm saying words that are being

    transcribed by somebody on the internet but

    (two returns)

    the first sentence is longer than 100 characters.

    (two returns)

    I am saying all sorts of words and they are

    transcribing them into one big paragraph like this

    (two returns)

    one right here.

    (two returns)

    ALICE: That's nice, Bob.

    OR would you want this?

    BOB: Hi there, I'm saying words that are being

    transcribed by somebody on the internet but

    (two returns)

    the first sentence is longer than 100 characters.

    I am saying all sorts of words and they are

    (two returns)

    transcribing them into one big paragraph like this

    one right here.

    (two returns)

    ALICE: That's nice, Bob.

    Monday, March 6, 2017 9:32 PM
  • The latter would be just fine! Thank you for clarifying.
    Monday, March 6, 2017 11:18 PM
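
The layout chosen here (lines keep filling two per caption, even across sentence boundaries) can be written as a small grouping step. This is a rough sketch only: JoinCaptions, arrLines, perCaption and brk are hypothetical names, it assumes the lines have already been wrapped to length, and it assumes vbCr is the carriage return the caption software expects.

```vba
' Hypothetical helper: joins already-wrapped lines into caption text,
' with one break inside a caption and two breaks between captions.
Function JoinCaptions(arrLines As Variant, _
                      Optional ByVal perCaption As Long = 2, _
                      Optional ByVal brk As String = vbCr) As String
    Dim i As Long
    Dim n As Long          ' lines written to the current caption so far
    Dim strOut As String

    n = 0
    For i = LBound(arrLines) To UBound(arrLines)
        If n = perCaption Then
            strOut = strOut & brk & brk    ' caption is full: start a new one
            n = 0
        ElseIf n > 0 Then
            strOut = strOut & brk          ' second line of the same caption
        End If
        strOut = strOut & arrLines(i)
        n = n + 1
    Next i

    JoinCaptions = strOut
End Function
```

For example, JoinCaptions(Array("a", "b", "c")) puts "a" and "b" in one caption and "c" in the next. Each speaker's paragraph would be passed in separately, so that different speakers never share a caption.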
  • Absolutely wonderful! Thank you so much. I spend all my programming time messing with arrays in Matlab and R, so this kind of manipulation has been a little outside my realm of expertise. This is really clear and easy to modify as needed.

    Thank you for such a prompt response!

    Tuesday, March 7, 2017 6:14 AM