Removing certain spaces in a text

  • Question

  • Hello,
    In my program, I load an RTF file through a RichTextBox and need to format it a bit: I need to remove certain spaces from certain lines.
    All of those lines are in this format: [*#] or [*] (where * is a character or a string, and # is a number). These tags appear on separate lines, and I need to remove any white space from those lines.
    I heard there is a way to do this using Regex, but since I've never used it I have no idea how...

    Any idea?
    Thanks,
    Ofir.
    Monday, November 9, 2009 12:08 PM

All replies

  • You are right. Regex is not really liked by most developers, but there is a dedicated forum for regular expressions:

    http://social.msdn.microsoft.com/Forums/en-US/regexp/threads


    Success
    Cor
    • Marked as answer by YiChun Chen Thursday, November 12, 2009 2:32 AM
    Monday, November 9, 2009 1:36 PM
  • Hi there,

    There is a method in the String class called Replace. Strings are immutable, so Replace returns a new string and you have to assign the result back to the text box:

    textbox1.Text = textbox1.Text.Replace("Old string value", "string value to replace with")

    Replace has two overloads: one takes two String arguments, the other takes two Char arguments.

    Hth,

    Stuart B

    Monday, November 9, 2009 2:00 PM
  • Do you have any idea other than Regex?

    Replace won't help much, since I do not know whether there are spaces before or after the tags, or how many there are...
    Monday, November 9, 2009 4:40 PM
  • Ok. No problem! I'm pretty good at strings and can probably help you out - but we need more information.

    1. Are you reading the contents of a file into the Text property of a RichTextBox?
    2. It would be very helpful to see a sample from your data file before and after it has been modified the way you intend.

    Cool?

    Monday, November 9, 2009 6:47 PM
  • I'm reading the file using the RichTextBox's LoadFile() function.
    The format:
    some-text
    ...
      [b1]
    more text
    ...
    [q1]  
    ....
    [a]

    The b and q tags are numbered sequentially starting at 1; the a tags have no numbers (I know how many q and a tags there are, and can find them).
    I tried this:
        ToSplit.Find("[b1]")

        'Clear spaces in the line
        If ToSplit.GetLineFromCharIndex(ToSplit.GetFirstCharIndexOfCurrentLine()) <= UBound(ToSplit.Lines) Then
            ToSplit.Lines(ToSplit.GetLineFromCharIndex(ToSplit.GetFirstCharIndexOfCurrentLine())) = _
                Replace(ToSplit.Lines(ToSplit.GetLineFromCharIndex(ToSplit.GetFirstCharIndexOfCurrentLine())), " ", "")
        End If
    but it didn't work. For some reason it didn't get the line number correctly.
    Another important requirement is that the text formatting must stay as it was.

    Thanks again,
    Ofir.
    Monday, November 9, 2009 8:20 PM
  • OK, so as I understand it, you have a file with text like:

    [a1] aaaaa aaaa
    bbbbbbb
    cccccc
    [d2] ddd ddd dddd
    eeeeeee

    and you need to remove spaces from lines "a" and "d"? If this is correct, then this code should do it:

    Public Class Form1

        Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            Me.RichTextBox1.LoadFile("file.rtf")

            RemoveSpaces(Me.RichTextBox1)
        End Sub

        Public Sub RemoveSpaces(ByVal rtb As RichTextBox)

            Dim lines() As String = rtb.Lines

            For i As Integer = 0 To lines.Count - 1
                If lines(i).Contains("[a1]") Or lines(i).Contains("[d2]") Then
                    lines(i) = lines(i).Replace(" ", "")
                End If
            Next

            rtb.Lines = lines

        End Sub
    End Class


    A few drawbacks of this approach, though:

    formatting is lost
    and you have to put every possible tag in the If condition

    Other than that, it works like a charm :)
     

    Cheers!
    Monday, November 9, 2009 10:08 PM
  • Well, I do not know how many tags there will be of each type, so writing Ifs for all the possibilities is not an option...

    Any other ideas?
    Tuesday, November 10, 2009 4:45 AM
  • Ofireps,

    I mostly try to avoid regular expressions. If the file is very small I use Split (because that solution is awfully slow), or, if it takes only about four lines of code, InStr or IndexOf. What you have described, though, I would do with Regex. The catch is that a Regex usually needs a good deal of testing and fine-tuning, and that is something I don't want to do for you.

    Success
    Cor
    Tuesday, November 10, 2009 4:51 AM
  • So can you simply say:

    Trim all leading and trailing spaces from the line. Then, if the line starts with "[" and ends with "]", remove all whitespace in the line.

    If that's what you need to do, then it can be done without Regex.

    You would use the Trim function to delete leading and trailing spaces, then the StartsWith and EndsWith functions to detect a line that needs its spaces removed, and the Replace function to replace all spaces with nothing.
    Tuesday, November 10, 2009 5:15 AM
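
The Trim / StartsWith / EndsWith / Replace approach described in the reply above could be sketched roughly as follows. This is only a minimal sketch: the module name TagLineCleaner and the function name CleanTagLine are hypothetical, it works on plain strings rather than on the RichTextBox itself, and it removes space characters only (not tabs inside the tag).

```vbnet
Imports System

Module TagLineCleaner
    ' Hypothetical helper (not from the thread): if the trimmed line starts
    ' with "[" and ends with "]", return it with every space removed;
    ' otherwise return the line unchanged.
    Public Function CleanTagLine(ByVal line As String) As String
        ' Trim is used only to look at the first and last non-space characters.
        Dim trimmed As String = line.Trim()
        If trimmed.StartsWith("[") AndAlso trimmed.EndsWith("]") Then
            Return trimmed.Replace(" ", "")
        End If
        Return line
    End Function

    Sub Main()
        Console.WriteLine(CleanTagLine("   [b1]  "))   ' prints [b1]
        Console.WriteLine(CleanTagLine("[ q 2 ]"))     ' prints [q2]
        Console.WriteLine(CleanTagLine("some text"))   ' prints some text
    End Sub
End Module
```

Because the function only returns a new string, the caller still has to decide how to write the result back; assigning to the control's Text or Lines would discard RTF formatting.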
  • Hi there, one way to do that is the following:

    Dim str As String = "[d1] Example line of text"
    Dim index As Integer = 0
    If str.StartsWith("[") Then
        str = str.Remove(0, 1)          'drop the opening "["
        index = str.IndexOf("]")
        str = str.Remove(0, index + 1)  'drop the rest of the tag, up to and including "]"
        str = str.Trim()                'Trim returns a new string, so assign it; otherwise the leading space turns into a leading "--"
        str = str.Replace(" ", "--")
        MsgBox(str)                     'shows Example--line--of--text
    End If

    I hope this answers your question,

     Stu
    • Edited by Stuart89 Tuesday, November 10, 2009 9:34 AM formatting corrected
    Tuesday, November 10, 2009 9:32 AM
  • As you suggested, I tried this:
                'Clear extra spaces from tag lines
                For j = 0 To UBound(ToSplit.Lines)
                    If ToSplit.Lines(j).Trim(" ").StartsWith("[") And ToSplit.Lines(j).Trim(" ").EndsWith("]") Then
                        ToSplit.Lines(j) = ToSplit.Lines(j).Trim(" ")
                    End If
                Next
    
    But for some reason the line
                        ToSplit.Lines(j) = ToSplit.Lines(j).Trim(" ")
    does not change anything. Execution reaches this line, and the trimmed string is correct, but it does not replace the original...
    Tuesday, November 10, 2009 8:11 PM
  • The Trim() is only needed to make sure the StartsWith and EndsWith tests are testing the first and last non-space characters.  The actual processing of the line should use the Replace function:

    ToSplit.Lines(j) = ToSplit.Lines(j).Replace(" ", "")

    • Proposed as answer by Shariq Ayaz Wednesday, November 11, 2009 7:57 PM
    Tuesday, November 10, 2009 8:35 PM
  • It doesn't work either... Still, the actual line isn't changed..
    Tuesday, November 10, 2009 8:56 PM
  • What sort of thing is ToSplit? If it's a text box, then the Lines array is a read-only copy by default: changing one of its elements does not change the control.

    The array you are processing should be an array of strings. You may need to copy the lines of the RTB to an array, or just use the string Split method on the RTB text to create the array of strings. Then use Join to re-create the textbox text from the array.

    Dim tempArray() As String = ToSplit.Lines

    For j As Integer = 0 To UBound(tempArray)
        'Trim is only used for the test; Replace does the actual work
        If tempArray(j).Trim().StartsWith("[") And tempArray(j).Trim().EndsWith("]") Then
            tempArray(j) = tempArray(j).Replace(" ", "")
        End If
    Next
    ToSplit.Text = String.Join(vbCrLf, tempArray)

     

    • Proposed as answer by Shariq Ayaz Wednesday, November 11, 2009 7:57 PM
    • Unproposed as answer by ofireps Wednesday, November 11, 2009 8:40 PM
    Tuesday, November 10, 2009 9:15 PM
  • You're just going to have to treat RTF as RTF; it isn't plain text. Do you mean actual lines or word-wrapped lines?
    Tuesday, November 10, 2009 9:47 PM
  • Here's how I would solve your problem.

    For Each line In Me.RichTextBox1.Lines
        If Regex.IsMatch(line, "\[[a-z]\d\]") Then
            Me.RichTextBox1.Text = Me.RichTextBox1.Text.Replace(line, line.Replace(" ", ""))
        End If
    Next
    • Proposed as answer by Derek Belanger Tuesday, November 10, 2009 10:00 PM
    • Unproposed as answer by ofireps Wednesday, November 11, 2009 5:01 AM
    Tuesday, November 10, 2009 10:00 PM
  • Derek, it partially works: it doesn't handle the situation where there is no number in the tag, just a letter.

    John, do you have any idea how I should do that?
    Wednesday, November 11, 2009 5:02 AM
  • Ah! Sorry, I didn’t realize that the digit was optional. This revised code will solve that situation too!

    For Each line In Me.RichTextBox1.Lines
        If Regex.IsMatch(line, "\[[a-z]\d?\]") Then
            Me.RichTextBox1.Text = Me.RichTextBox1.Text.Replace(line, line.Replace(" ", ""))
        End If
    Next
    • Proposed as answer by Derek Belanger Wednesday, November 11, 2009 1:14 PM
    • Edited by Derek Belanger Wednesday, November 11, 2009 1:16 PM gramma
    • Unproposed as answer by ofireps Wednesday, November 11, 2009 7:31 PM
    • Marked as answer by ofireps Wednesday, November 11, 2009 7:34 PM
    Wednesday, November 11, 2009 1:13 PM
  • Derek, thanks for the reply. It works for plain text, but when the RTF is formatted, it loses all its formatting (and embedded objects), and I do not want that to happen...

    Edit:
    I found an answer. This is what I finally did, and it works:
                For Each line In ToSplit.Lines
                    If Regex.IsMatch(line, "\[[a-z]\d?\]") Then
                        ToSplit.Find(line)
                        ToSplit.SelectedText = line.Replace(" ", "")
                    End If
                Next

    Thanks again,
    Ofir.
    • Marked as answer by ofireps Wednesday, November 11, 2009 7:35 PM
    Wednesday, November 11, 2009 7:32 PM
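
The pattern used in the replies above, "\[[a-z]\d?\]", matches a single lowercase letter followed by at most one digit, and it is not anchored, so a line such as "see [a1] for details" also matches. The question describes tags whose name may be a string and whose number may have several digits, so a broader, anchored pattern might look like the sketch below. The exact tag shape (letters only, then optional digits) is an assumption, and the names TagPattern and IsTagLine are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TagPattern
    ' Assumed tag shape (not confirmed by the thread): optional whitespace,
    ' "[", one or more letters, optional digits, "]", optional whitespace,
    ' and nothing else on the line. Spaces inside the brackets are allowed
    ' so that lines still needing cleanup are recognised.
    Private ReadOnly TagLine As New Regex("^\s*\[\s*[A-Za-z]+\s*\d*\s*\]\s*$")

    ' Hypothetical helper: True when the whole line is a single tag.
    Public Function IsTagLine(ByVal line As String) As Boolean
        Return TagLine.IsMatch(line)
    End Function

    Sub Main()
        Console.WriteLine(IsTagLine("  [b1]"))                ' True
        Console.WriteLine(IsTagLine("[a]"))                   ' True
        Console.WriteLine(IsTagLine("[ans 12 ]"))             ' True
        Console.WriteLine(IsTagLine("see [a1] for details"))  ' False
    End Sub
End Module
```

A check like IsTagLine(line) could stand in for the Regex.IsMatch call in the loops above; the Find and SelectedText steps would stay as they are.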