Slow Code

  • Question

  • Hi,

    I have finally managed to complete the syntax highlighting code for my text editor; it produces the results below:

    Syntax Highlighting Example

    This is good, but the code is very, very slow; for example, on larger files it just crashes VB. I can't understand why, so I'm posting my code here to see if any more experienced programmers/VB programmers can give me any pointers on why it could be slow and how to optimize it. :-)

    The Code:

     
    Code Block

       Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
            Dim i As Integer
            Dim start_tag As Integer
            Dim end_tag As Integer
            Dim start_count As Boolean
            Dim length As Byte
            For i = 1 To Len(txtmain.Text)
                If Mid(txtmain.Text, i, 1) = "<" Then
                    start_tag = i - 1
                    start_count = True
                End If

                If start_count = True Then
                    length = length + 1
                End If

                If Mid(txtmain.Text, i, 1) = ">" Or Mid(txtmain.Text, i, 1) = "=" Then
                    start_count = False
                    end_tag = i
                    txtmain.SelectionStart = start_tag
                    txtmain.SelectionLength = length
                    txtmain.SelectionColor = Color.Blue

                    txtmain.SelectionLength = 0
                    end_tag = 0
                    start_tag = 0
                    length = 0
                End If
                txtmain.SelectionColor = Color.Black
            Next
            txtmain.SelectionColor = Color.Black
            i = 0

            Dim t As Integer
            Dim start_tag2 As Integer
            Dim start_count2 As Boolean
            Dim length2 As Integer

            '// Purple Properties " to "
            For t = 1 To Len(txtmain.Text)
                If Mid(txtmain.Text, t, 2) = "=" & Chr(34) Then
                    start_tag2 = t - 1
                    start_count2 = True
                End If

                If start_count2 = True Then
                    length2 = length2 + 1
                End If

                If length2 > 2 Then
                    If Mid(txtmain.Text, t, 1) = Chr(34) Then
                        start_count2 = False
                        txtmain.SelectionStart = start_tag2
                        txtmain.SelectionLength = length2
                        txtmain.SelectionColor = Color.Purple

                        txtmain.SelectionLength = 0
                        start_tag2 = 0
                        length2 = 0
                    End If
                End If
                txtmain.SelectionColor = Color.Black
            Next
            txtmain.SelectionColor = Color.Black
        End Sub


    Some quick explanations: start_tag is the position of the "<" character, length is the length of the string from the "<" to the next ">" (or "="), and end_tag isn't used; I just haven't removed it yet.

    Also, the first loop is for the blue highlighting (< to >) and the second for the purple (=" to ").

    Thanks, Alex.

    Saturday, October 20, 2007 9:57 AM


  • Hi,

     

    Nice One!!

     

    I do see a problem, and I might have an alternative approach for you that might make a lot of your code redundant (which you'll either love or hate me for).

     

    I think the problem is that you're looping over each character, twice in this case: once for the < > and then again for the =. So for large files you're going over a very large number of characters twice, when really you could scan the text once and apply both the < > and = colouring.
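    To illustrate the "scan the text once" idea, here is a minimal sketch (not the code from the question) that finds both kinds of span in a single pass: "<" up to the next ">" or "=" (blue), and =" up to the closing quote (purple), mirroring the two loops above. It works on an ordinary String, on the assumption that the caller reads txtmain.Text into a local variable once. The original loop calls txtmain.Text several times per character, and each read probably copies the whole text out of the control, which would add a lot on large files. The module, enum and structure names are hypothetical.

```vbnet
Imports System.Collections.Generic

' Sketch only: finds the same two kinds of span as the two loops in the
' question, in one pass over a String. SpanKind, HighlightSpan and
' FindSpans are hypothetical names.
Public Module SinglePassHighlighter

    Public Enum SpanKind
        Tag             ' "<" up to and including the next ">" or "="
        AttributeValue  ' =" up to and including the closing quote
    End Enum

    Public Structure HighlightSpan
        Public Start As Integer
        Public Length As Integer
        Public Kind As SpanKind

        Public Sub New(ByVal startIndex As Integer, ByVal count As Integer, ByVal spanType As SpanKind)
            Start = startIndex
            Length = count
            Kind = spanType
        End Sub
    End Structure

    Public Function FindSpans(ByVal text As String) As List(Of HighlightSpan)
        Dim spans As New List(Of HighlightSpan)
        Dim tagStart As Integer = -1    ' index of the open "<", or -1 if none
        Dim valueStart As Integer = -1  ' index of the "=" that begins =", or -1 if none

        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)

            ' Blue span: "<" up to the next ">" or "=".
            If c = "<"c Then
                tagStart = i
            ElseIf tagStart >= 0 AndAlso (c = ">"c OrElse c = "="c) Then
                spans.Add(New HighlightSpan(tagStart, i - tagStart + 1, SpanKind.Tag))
                tagStart = -1
            End If

            ' Purple span: =" up to the next double quote.
            If c = "="c AndAlso i + 1 < text.Length AndAlso text(i + 1) = """"c Then
                valueStart = i
            ElseIf c = """"c AndAlso valueStart >= 0 AndAlso i > valueStart + 1 Then
                spans.Add(New HighlightSpan(valueStart, i - valueStart + 1, SpanKind.AttributeValue))
                valueStart = -1
            End If
        Next

        Return spans
    End Function

End Module
```

    To apply the result you would read txtmain.Text once, pass it to FindSpans and, for each span in list order, set SelectionStart, SelectionLength and SelectionColor (Color.Blue for Tag, Color.Purple for AttributeValue). Applying them in order lets the purple span win over the "=" at the end of a blue span, as the second loop did in the question.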

     

    I'd imagine it might crash because you're using integers, which have a maximum value of approx 32,000 I think it is, so if a large file has more than this number of characters then your program will crash. Don't know if that is happening, but it's a possibility.

     

    Combine the two loops into one, or use the following alternative: regular expressions.

     

    These are a way to parse text looking for certain patterns within it. For example, you can search for text that starts with a < and ends with a >, or text that starts with </ and ends with >. They are extremely quick but a little bit cryptic. Let me give you an example:

     

    input text = "<hello>World</hello>"

    match = want to match the start and end tag

    patterns

    start tag = "<\w*>"

    end tag = "</\w*>"

     

    Run this code in a new console application and you will see the results:

     

    Imports System.Text.RegularExpressions

    Module Module1

        Sub Main()
            Dim input As String = "<hello>world</hello>"
            Dim startTag As String = "<\w*>"
            Dim endTag As String = "</\w*>"

            Dim startTagMatch As Match = Regex.Match(input, startTag)
            Dim endTagMatch As Match = Regex.Match(input, endTag)

            Console.WriteLine("Matched: {0}, Index: {1}, Length: {2}", startTagMatch.Value, startTagMatch.Index, startTagMatch.Length)
            Console.WriteLine("Matched: {0}, Index: {1}, Length: {2}", endTagMatch.Value, endTagMatch.Index, endTagMatch.Length)
            Console.ReadLine()
        End Sub

    End Module

     

    Saturday, October 20, 2007 1:41 PM
  • Hi again,

     

    Here is another example of how you can match many tags and loop over each one; this is along the same lines as what you are doing now. Just let me point out that startTag and endTag are patterns that are matched within the input text. Learning how to write your own patterns is the tricky part, but luckily there is an MSDN forum you can get help in if you need to write your own.

     

    ' Goes inside Sub Main of the console module above (which imports System.Text.RegularExpressions).
    Dim input As String = "<html><body><p>Hello World</p></body></html>"
    Dim startTag As String = "<\w*>"
    Dim endTag As String = "</\w*>"

    Dim startTagsMatches As MatchCollection = Regex.Matches(input, startTag)
    Dim endTagsMatches As MatchCollection = Regex.Matches(input, endTag)

    ' Print every start tag, then every end tag, with its position and length.
    For Each startTagMatch As Match In startTagsMatches
        Console.WriteLine("Matched: {0}, Index: {1}, Length: {2}", startTagMatch.Value, startTagMatch.Index, startTagMatch.Length)
    Next

    For Each endTagMatch As Match In endTagsMatches
        Console.WriteLine("Matched: {0}, Index: {1}, Length: {2}", endTagMatch.Value, endTagMatch.Index, endTagMatch.Length)
    Next

    Console.ReadLine()

     

    Saturday, October 20, 2007 1:47 PM
  • Wow!

    WOW and WOW! :-) (No, I'm not a Vista salesperson...)

    Thank you, your two posts are amazing. I knew that something like this existed (I have been researching syntax highlighting for ages and have wanted to do it for years, but only recently has my VB knowledge been up to actually completing it!)

    I didn't know VB was powerful enough to use regular expressions. One question, however: these regular expressions are amazing, but do you think it's the way I'm selecting the text and then colouring it that could also be adding to the slowness?

    I was thinking about manipulating the code of an RTF document to do the colouring instead, using a string replace function etc., but this could prove to be even slower!

    What control would you recommend for this colouring, or should I just stick with the current RichTextBox? The only problems with the RichTextBox at the moment are that pasting any text that isn't plain text ruins the formatting, and if you open a file it shows the formatting instead of the plain text.

    Thank you for your help anyway, it's been a bit of a revelation, hehe :-) (No more dreaded CPU-eating For/Next loops!)

    Thanks, Alex.

    Saturday, October 20, 2007 2:23 PM
  • Hi,

     

    Glad you took that well enough.

     

    Yeah, the way you're colouring the text could also be slowing things down, you know; your approach is very methodical. Scan each character of the text and find a character, continue scanning until you find another character, move the caret to the start position of the character, select the length, and continue scanning. So yeah, it's adding a bit more than if you were to build a colour-coded replacement text, because you have to deal with moving the position of the caret. How much of a performance hit this would be is difficult to say. The big hit comes from looping over each character twice, maybe more.

     

    My knowledge of RTF is a bit slack and I have no real desire to learn RTF codes, so I always tend towards using HTML for formatting. I'm not sure how I'd approach this, to be honest. Part of me is thinking of building a custom web browser that accepts user input to make the formatting easier, while another part of me is thinking colouring text using RTF codes might not be that difficult.

     

    Are you familiar with RTF? There is a document on MSDN about the codes you can use to format the text, but to be honest, if you save a simple RTF document in WordPad you can open it in Notepad to see the codes. If you wrapped your text directly with the RTF codes, that would avoid the performance hit you take in your current approach.

     

    Also, you should look at the StringBuilder class. Strings in .NET are immutable, so every concatenation or replace creates a new string; when you do a lot of string work you fill up memory with these temporary strings, garbage collection runs a lot, and performance can dip. The StringBuilder class is designed to avoid that. Do some searches for immutable strings and StringBuilder for more information.

     

    From a button click, I would take the plain text from the RTF control and either use Regex.Replace on it to add the RTF codes (the StringBuilder's own Replace method only does plain string replacement, not regular expressions), or else use the results of the matches in the code above to build the text up in a StringBuilder, inserting the RTF codes as you go. Then I would replace the complete text in the RTF control. That's from a button click; I'm not sure how I would do it for on-the-fly colour coding.
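    The match-based route might look something like this minimal sketch, assuming a single colour (blue, \cf1) for everything the pattern matches. It walks the regex matches and appends the text between them to a StringBuilder, so no chain of intermediate strings is created. Plain text also has to be escaped for RTF: \, { and } are special characters and need a backslash in front of them. The module and function names, the Courier New font and treating each LF as a \par are assumptions for illustration.

```vbnet
Imports Microsoft.VisualBasic
Imports System.Text
Imports System.Text.RegularExpressions

' Sketch: builds a complete RTF document from plain text, colouring every
' match of the pattern with colour 1 (blue). RtfBuilder and BuildRtf are
' hypothetical names.
Public Module RtfBuilder

    ' Appends plain text, escaping the characters RTF treats specially.
    Private Sub AppendEscaped(ByVal sb As StringBuilder, ByVal text As String)
        For Each c As Char In text
            Select Case c
                Case "\"c, "{"c, "}"c
                    sb.Append("\"c).Append(c)
                Case ControlChars.Lf
                    ' The space ends the \par control word.
                    sb.Append("\par ")
                Case ControlChars.Cr
                    ' Dropped: assumes LF (or CRLF) line endings.
                Case Else
                    sb.Append(c)
            End Select
        Next
    End Sub

    Public Function BuildRtf(ByVal plainText As String, ByVal pattern As String) As String
        Dim sb As New StringBuilder()
        ' Minimal header: one font, and one colour (blue) in the colour table.
        sb.Append("{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}")
        sb.Append("{\colortbl ;\red0\green0\blue255;}\f0 ")

        Dim position As Integer = 0
        For Each m As Match In Regex.Matches(plainText, pattern)
            ' Text between the previous match and this one keeps the default colour.
            AppendEscaped(sb, plainText.Substring(position, m.Index - position))
            sb.Append("\cf1 ")
            AppendEscaped(sb, m.Value)
            sb.Append("\cf0 ")
            position = m.Index + m.Length
        Next

        AppendEscaped(sb, plainText.Substring(position))
        sb.Append("}")
        Return sb.ToString()
    End Function

End Module
```

    Something like txtmain.Rtf = RtfBuilder.BuildRtf(txtmain.Text, "</?\w+>") would then replace the whole content in a single assignment instead of one selection per match.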

     

    When you open your file, check the file's extension; this will determine when to apply formatting.
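    A small sketch of that check, using Path.GetExtension from System.IO. The helper name and the list of extensions treated as HTML are assumptions; adjust them to the file types your editor should colour.

```vbnet
Imports System
Imports System.IO

' Hypothetical helper: decides from the file extension whether to run the
' HTML highlighter.
Public Module HighlightRules

    ' Extensions treated as HTML (an assumption; adjust as needed).
    Private ReadOnly HtmlExtensions As String() = {".htm", ".html"}

    Public Function ShouldHighlight(ByVal fileName As String) As Boolean
        ' Path.GetExtension returns the extension including the dot, e.g. ".html".
        Dim extension As String = Path.GetExtension(fileName)
        For Each candidate As String In HtmlExtensions
            If String.Equals(extension, candidate, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
        Next
        Return False
    End Function

End Module
```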

     

    Smile

    Saturday, October 20, 2007 3:24 PM
  •  

    I'd imagine it might crash because you're using integers which have a maximum value of approx 32,000 I think it is,

     

    You're quite a bit out Derek!

     

    Integers are 32 bit in .Net, and can store a maximum value of 2,147,483,647.   (2^31 - 1)

     

    The VB6 Integer and the .Net Short use 16 bits and store a maximum value of 32,767.  (2^15 - 1)
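    The limits are easy to check from code. This small console sketch (the module name is arbitrary) prints the MaxValue fields of the types mentioned here. It also prints Byte.MaxValue, which may matter for the code in the question: its length variable is declared As Byte, so with VB.NET's default overflow checking, counting 256 or more characters before the next ">" or "=" would probably throw an OverflowException. That could be a more likely cause of the crash than the Integer limit.

```vbnet
Imports System

Module IntegerRanges

    Sub Main()
        ' Integer is System.Int32 and Short is System.Int16 in VB.NET.
        Console.WriteLine(Integer.MaxValue)   ' 2147483647 (2^31 - 1)
        Console.WriteLine(Short.MaxValue)     ' 32767 (2^15 - 1)
        Console.WriteLine(UInteger.MaxValue)  ' 4294967295 (2^32 - 1)
        ' Byte is the type of the question's "length" variable.
        Console.WriteLine(Byte.MaxValue)      ' 255
    End Sub

End Module
```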

    Saturday, October 20, 2007 7:00 PM
  • Smile... oh well, pobodys nerfect.

     

    I kind of want to say that only unsigned integers have a max of 2,147,483,647 and that integers have a maximum of half that, but I wouldn't want it to come across as egotistical. 

     

    Thanks jo0ls for keeping me right. I do appreciate it.

    Saturday, October 20, 2007 7:44 PM
  •  

    Hi All,

     

    Derek, your code looked nice. I wouldn't use regex myself, out of personal preference, but that was nice code.

     

    DBAlex, I was looking at your code and thought it suffered from VB6 hangovers. In making the transition to .NET, what I've done is label any impulse to use Mid and InStr as a retrogression. They are one-based, and the methods in the String class are more powerful. If you used them correctly, I don't think you'd have to do any looping at all.

     

    I was just looking at some recent code where I was extracting an IP from an HTML document on the net. It looks like this:

     

    Dim HTML As String = reader.ReadLine.Substring(My.Settings.ExtIPTrimLocation) ' Extract IP string

    m_sExternalIP = HTML.Substring(0, HTML.IndexOf(My.Settings.ExtIPEndTrimChar)) ' from HTML

     

    It's a slightly different problem from yours, but essentially it strips the HTML, leaving just an IP address. Instead of Mid you use .Substring (which is zero-based), and .IndexOf for searching. It's very powerful too.
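    Applied to the tag problem, the same String methods might look like this sketch. It still loops, but once per tag rather than once per character, letting IndexOf do the scanning; IndexOf and Substring are zero-based, unlike Mid and InStr. The module and function names are hypothetical, and it only collects the tag text; the openIndex found for each tag is what you would pass to SelectionStart if you wanted to colour it.

```vbnet
Imports System.Collections.Generic

' Sketch only: collects every "<...>" tag using String.IndexOf and
' Substring instead of Mid. TagFinder/FindTags are hypothetical names.
Public Module TagFinder

    Public Function FindTags(ByVal text As String) As List(Of String)
        Dim tags As New List(Of String)
        Dim position As Integer = 0

        Do
            ' Find the next "<" at or after the current position.
            Dim openIndex As Integer = text.IndexOf("<"c, position)
            If openIndex < 0 Then Exit Do

            ' Find the ">" that closes it; stop if the tag is never closed.
            Dim closeIndex As Integer = text.IndexOf(">"c, openIndex + 1)
            If closeIndex < 0 Then Exit Do

            tags.Add(text.Substring(openIndex, closeIndex - openIndex + 1))
            position = closeIndex + 1
        Loop

        Return tags
    End Function

End Module
```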

    Saturday, October 20, 2007 8:21 PM
  • Unsigned integers have a max of 2^32-1.  Signed integers have a max of 2^31-1.

     

    Saturday, October 20, 2007 8:41 PM
  • Hi,

     

    First of all, sorry DBAlex and jo0ls for getting the Integer data type mixed up; I don't know where my head was at. Thanks JohnWein and jo0ls for pointing it out. I don't mind being wrong and really do appreciate your pointing it out.

     

    Thanks for the compliment, ReneeC. You can usually tell when someone with a VB background posts some code. The only problem with the string-handling methods is the number of strings that are created, because strings are immutable. I usually decide between regex and string methods based on the size of the text and how complex the parsing is going to be.

     

    DBAlex, since I got the data type completely wrong, it got me thinking about what else I may have got wrong in my suggestions, so I decided to re-create your application using the regex approach. Here it is:

     

    Imports System.Text
    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Sub cmdColourCode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdColourCode.Click
            Me.ColourCode()
        End Sub

        Private Sub ColourCode()
            Dim code As String = Me.rtfCode.Text

            'http://regexlib.com/
            Dim tags As String = "</?[a-z][a-z0-9]*[^<>]*>"
            ' Whitespace, an attribute name, = and a single- or double-quoted value.
            Dim attributes As String = "\s\w*=['""].*['""]"
            Dim comments As String = "<!--.*-->"

            Me.ColourCode(tags, code, Color.Blue)
            Me.ColourCode(attributes, code, Color.Maroon)
            Me.ColourCode(comments, code, Color.Green)
        End Sub

        Private Sub ColourCode(ByVal pattern As String, ByVal code As String, ByVal colour As Color)
            ' Select each match in the RichTextBox and colour it.
            Dim matches As MatchCollection = Regex.Matches(code, pattern, RegexOptions.IgnoreCase)
            For Each match As Match In matches
                Me.rtfCode.SelectionStart = match.Index
                Me.rtfCode.SelectionLength = match.Length
                Me.rtfCode.SelectionColor = colour
            Next
        End Sub

    End Class

     

    Create a RichTextBox called rtfCode and a button called cmdColourCode (oops, English spelling).

     

    As you can see, I decided to stick with editing the text in the RTF control. This is because working with the RTF codes directly is a bit of a nightmare, and that's an understatement. See if this has any impact on performance.

     

    Also notice I don't use a StringBuilder, which was another incorrect recommendation I made. Just be wary that a lot of string handling can create many temporary strings and fill memory; where possible, use a StringBuilder to prevent it.

    Sunday, October 21, 2007 1:15 PM
  • Hey,

     

    Forgot to reply to this. Thanks Derek, that code works great!

     

    The only problem I now have is that the RichTextBox flickers a lot. Is there a way to stop this?

     

    BTW, thanks all for your help, especially Derek; the regular expressions definitely sped up the routine.

     

    Thanks, Alex

    Monday, October 22, 2007 3:57 PM
  • Hey Alex,

     

    No worries about the help. I've noticed that the process is still a bit slow when the text gets rather large; for example, running it on the source of all the posts here takes a bit of time (and the result is not 100% exact). Maybe you can use the BackgroundWorker component so that the form stays responsive while it runs.
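    A BackgroundWorker could be wired up roughly as in this sketch. Only the regex matching runs on the worker thread: DoWork must not touch the RichTextBox, so it receives a copy of the text, and the colouring happens in RunWorkerCompleted, which runs back on the UI thread when the worker was started from it. Because the selection-based colouring still happens on the UI thread, this keeps the form responsive during matching rather than making the colouring itself faster. The class name, the single tag pattern and the blue colour are assumptions, and if the user edits the text while the worker runs, the match positions may no longer line up.

```vbnet
Imports System.Collections.Generic
Imports System.ComponentModel
Imports System.Drawing
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

' Sketch: regex matching on a BackgroundWorker, colouring on the UI thread.
' BackgroundHighlighter is a hypothetical class name.
Public Class BackgroundHighlighter

    Private ReadOnly box As RichTextBox
    Private WithEvents worker As New BackgroundWorker()

    Public Sub New(ByVal target As RichTextBox)
        box = target
    End Sub

    ' Call from the UI thread, e.g. from a button click.
    Public Sub Start()
        ' Hand the worker a copy of the text; it must not touch the control itself.
        If Not worker.IsBusy Then worker.RunWorkerAsync(box.Text)
    End Sub

    ' Pure matching step; copying into a List forces all matching to happen here.
    Public Shared Function FindTagRanges(ByVal text As String) As List(Of Match)
        Dim result As New List(Of Match)
        For Each m As Match In Regex.Matches(text, "</?[a-z][a-z0-9]*[^<>]*>", RegexOptions.IgnoreCase)
            result.Add(m)
        Next
        Return result
    End Function

    Private Sub worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles worker.DoWork
        ' Runs on a thread-pool thread.
        e.Result = FindTagRanges(CStr(e.Argument))
    End Sub

    Private Sub worker_RunWorkerCompleted(ByVal sender As Object, ByVal e As RunWorkerCompletedEventArgs) Handles worker.RunWorkerCompleted
        ' Runs on the UI thread, so the control can be updated here.
        If e.Error IsNot Nothing Then Return
        For Each m As Match In CType(e.Result, List(Of Match))
            box.SelectionStart = m.Index
            box.SelectionLength = m.Length
            box.SelectionColor = Color.Blue
        Next
    End Sub

End Class
```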

     

    As for the flicker, you can get around it by cheating: use a behind-the-scenes RichTextBox to do the formatting and then use it to update the original in a oner.

     

    Private Sub ColourCode()
        ' Format a hidden copy so the visible box is only updated once, which avoids flicker.
        Using rtfCopy As New RichTextBox
            rtfCopy.Text = Me.rtfCode.Text

            'http://regexlib.com/
            Dim tags As String = "</?[a-z][a-z0-9]*[^<>]*>"
            Dim attributes As String = "\s\w*=['""].*['""]"
            Dim comments As String = "<!--.*-->"

            Me.ColourCode(tags, rtfCopy, Color.Blue)
            Me.ColourCode(attributes, rtfCopy, Color.Maroon)
            Me.ColourCode(comments, rtfCopy, Color.Green)

            Me.rtfCode.Rtf = rtfCopy.Rtf
        End Using
    End Sub

    Private Sub ColourCode(ByVal pattern As String, ByVal source As RichTextBox, ByVal color As Color)
        Dim matches As MatchCollection = Regex.Matches(source.Text, pattern, RegexOptions.IgnoreCase)
        For Each match As Match In matches
            source.SelectionStart = match.Index
            source.SelectionLength = match.Length
            source.SelectionColor = color
        Next
    End Sub

     

    See if that helps or if it causes any other side effects.

    Monday, October 22, 2007 4:34 PM
  • Hi again,

     

    That works, and I think it's slightly faster too; however, it is still quite slow (when taking your suggestion of trying the whole source of this forum thread).

     

    I was thinking, is there a way to quickly index the words we have already coloured? However, I suppose this wouldn't really help, as it would still have to look at each word to see whether it had been coloured or not.

     

    I don't think there's any way to make this faster than it is! Unless I try the RTF method I was trying to implement... although even that could be laggy with the constant saving and opening of the file.

     

    Heh, seems like a bit of a hopeless cause... I may have to just leave it out...! Real-time syntax highlighting is just too complex I think.

     

    Thanks for all the help though... I just can't really have high system requirements for a text editor, hehe! And it's already lagging on my "OK" machine (Athlon 64 3000, 1.25 GB RAM).

     

     

    If you have any more suggestions that would be great, Derek, but you've already helped loads!

     

    (I'm going to try to download the source of a few open-source Windows text editors and see how they implement this.)

     

    EDIT: I looked at the BackgroundWorker class too, and it looks like exactly what I need; I just don't understand how I would implement it. Once I have put the component on my form, do I just give it the name of the sub I want to execute, then wait for the results and update the textbox?

     

    Thanks, Alex.

    Monday, October 22, 2007 6:24 PM
  • Hey man,

     

    The performance problem is caused by setting the formatting through the RichTextBox:

     

    source.SelectionStart = match.Index

    source.SelectionLength = match.Length

    source.SelectionColor = color

     

    Timing the code on the source of this post, the matching alone, without setting any formatting, takes about 300-500 milliseconds. With the formatting code above it comes out at about 10 seconds. So that is where the performance problem is.

     

    It could be that you write the RTF codes directly, either by building up a replacement string or by doing some find-and-replaces, instead of using the RTF control for the formatting. I have no idea which would be quicker: find and replace (perhaps, as the Regexes are already there) or just building a formatted replacement.

     

    Good luck. If you find a suitable way, post again, as I'd be interested in seeing what approach improves the performance; that sort of information comes in handy.

     

    Monday, October 22, 2007 6:54 PM
  • Hi again, actually I do have another idea. The Regex class has a Replace method that replaces every occurrence of the found pattern, so rather than using the matches collection, this might be used to replace all occurrences of a tag with its colour-coded replacement. There is a feature of regular expressions called back references, I think that's what it's called (in a .NET replacement string the equivalent is a substitution such as $0 or ${0}), that lets you refer to the value that was matched. Thinking out loud here, but that might be needed to reference what was found so you can reinsert it formatted. Anyway, that's another idea you could look into.

     

    Monday, October 22, 2007 6:59 PM
  • Thanks for the help again,

     

    Is there any way to select just the text that's in view for the ColourCode function?

     

    This could speed it up potentially.

     

    Thanks, Alex.

    Monday, October 22, 2007 8:31 PM
  • Hi Alex,

     

    Some good news and some bad news

     

    Good news: I found a way to format the HTML that is VERY fast. Use the Regex.Replace method. Instead of creating matches and then looping over each match, just replace what the pattern matches with a colour-formatted replica. Here is an example:

     

    Regex.Replace(code, pattern, rtfColourFormat & " ${0}")

     

    Code is the HTML, pattern is the regex pattern, rtfColourFormat is the RTF code used to colour the text, and the ${0} in " ${0}" stands for the text matched by the pattern (the space in front of it ends the RTF control word). So with "\cf1" as the colour code, a match of "<html>" is replaced with "\cf1 <html>".
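    A quick way to see what ${0} does is this small console sketch. The Mark function name is hypothetical, and unlike the code below it also appends \cf0 after each match to switch back to the default colour.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo

    ' Puts \cf1 before and \cf0 after every tag. In the replacement string,
    ' ${0} is the whole match; the space after \cf1 and \cf0 ends each RTF
    ' control word so it does not run into the following text.
    Function Mark(ByVal html As String) As String
        Return Regex.Replace(html, "</?\w+>", "\cf1 ${0}\cf0 ")
    End Function

    Sub Main()
        ' Prints \cf1 <p>\cf0 Hi\cf1 </p>\cf0 followed by a trailing space.
        Console.WriteLine(Mark("<p>Hi</p>"))
    End Sub

End Module
```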

     

    Doing the source for this posting takes a second. I'll post the whole code.

     

    The bad news is... these RTF format codes are doing my head in. It's such a cryptic format; technology was different then and I appreciate that, but it's not the easiest thing in the world to get my head around. So the example doesn't format the text, it just puts in placeholder markers indicating where formatting would happen. You'll need to cover this yourself, but it is fast: it takes a second to mark everything.

     

    Imports System.Text
    Imports System.Text.RegularExpressions
    Imports System.ComponentModel
    Imports System.Diagnostics

    Public Class Form1

        Private Sub cmdColourCode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdColourCode.Click
            Dim watch As New Stopwatch
            watch.Reset()
            watch.Start()
            Me.ColourCode()
            watch.Stop()
            MessageBox.Show(String.Format("Complete: {0} seconds {1} milliseconds", watch.Elapsed.Seconds, watch.Elapsed.Milliseconds))
        End Sub

        Private Sub ColourCode()
            Dim code As String = Me.rtfCode.Text

            'http://regexlib.com/
            Dim stags As String = "<.*>" '"</?[a-z][a-z0-9]*[^<>]*>"
            Dim etags As String = "</.*>" '"</?[a-z][a-z0-9]*[^<>]*>"
            Dim attributes As String = "\s\w*=['""].*['""]"
            Dim comments As String = "<!--.*-->"

            ' Insert placeholder colour codes. No RTF header is added yet,
            ' so the codes show up as plain text in the box.
            code = Me.ColourCode(stags, code, "\cf1")
            code = Me.ColourCode(etags, code, "\cf1")
            code = Me.ColourCode(attributes, code, "\cf2")
            code = Me.ColourCode(comments, code, "\cf3")

            Me.rtfCode.Text = code
        End Sub

        Private Function ColourCode(ByVal pattern As String, ByVal code As String, ByVal rtfColourFormat As String) As String
            ' Put the colour code and a space in front of every match; ${0} is the matched text.
            Return Regex.Replace(code, pattern, rtfColourFormat & " ${0}")
        End Function

    End Class

    Tuesday, October 23, 2007 5:39 PM
  • Hi again,

    No problem. I have actually looked into this myself (I was adapting the code last night); the problem is not finding and formatting the string but, as you say, the way an RTF file is formatted...

    BTW, here is a sample I made to show how the formatting could/should look:


    Code Block

    {\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}{\f1\fswiss\fcharset0 Arial;}}
    {\colortbl ;\red0\green0\blue255;\red255\green0\blue255;\red0\green255\blue0;}
    {\*\generator Msftedit 5.41.21.2507;}\viewkind4\uc1\pard\cf1\f0\fs20 <html>\cf0\par
    \cf1 <head>\cf0\line\cf1 <title> \cf0 unformatted text\cf1  </title>\par
    <font colour=\cf2 "attribute in pink"\cf1 >\par
    \cf3 <!-- comment //-->\cf0\f1\par
    }
     
     


    (that might help but you probably know how it works)

    The basic format is this:

    for tags \cf1 <tag>\cf0 my_string_here\cf1 </end tag>\par <--(cf0 = change back to standard)
    for attributes \cf2 my_attribute_here\cf1\ (cf1 because of blue > tag at end)
    for comments \cf3 my_comment_here\cf0\par

    and \par is a line break

    [Hope that helps a bit...]

    This all depends on the colortbl in the header of the file, though. For example, for mine:

    {\colortbl ;\red0\green0\blue255;\red255\green0\blue255;\red0\green255\blue0;}
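    Rather than hand-editing the header for each set of colours, the colour table could be generated. This is a sketch with a hypothetical helper name: the empty entry before the first ";" is the default ("auto") colour, so the first colour passed in becomes \cf1, the second \cf2, and so on.

```vbnet
Imports System.Drawing
Imports System.Text

' Hypothetical helper: builds an RTF \colortbl group from a list of colours.
Public Module RtfColourTable

    Public Function BuildColourTable(ByVal ParamArray colours As Color()) As String
        ' The leading ";" is the auto colour, which \cf0 refers to.
        Dim sb As New StringBuilder("{\colortbl ;")
        For Each c As Color In colours
            sb.AppendFormat("\red{0}\green{1}\blue{2};", c.R, c.G, c.B)
        Next
        sb.Append("}")
        Return sb.ToString()
    End Function

End Module
```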


    e.g. for what I/we want to do the file header would have to look something like this:


    Code Block

    {\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fmodern\fprq1\fcharset0 Courier New;}}
    {\colortbl ;\red0\green0\blue255;\red255\green255\blue255;\red255\green0\blue255;\red0\green128\blue0;}
    {\*\generator Msftedit 5.41.21.2507;}\viewkind4\uc1\pard\cf1\f0\fs20 text here

    and then down here?


    }
     


    The only problem I then have is this: the text has to go on the 3rd line first and then down to the fourth for the formatting to work properly...

    I can open the file with rtfCopy.Text = System.IO.File.ReadAllText(path) to read the source plain text instead of the formatted RTF, and then load in the formatted version afterwards with rtfCopy.LoadFile(), but then it's setting the cursor location. Plus this will have to change for different colours, as the source RTF file will need different/longer colour tables (colortbl).

    It's one gigantic headache for me, which is why I have semi-given up at this stage. I would still love to see any solution you come up with, though, as the ones you have produced so far are far better than my efforts.

    Thanks again, Alex.

    Tuesday, October 23, 2007 6:01 PM
  • What you could do is base the code highlighting on the RTF box's Text property, which returns the plain text without RTF codes. Prepend it with your RTF formatting header, inject all your colour codes using the replace, and append the final } character before setting the RTF box's Rtf property. You're almost there, I think; the Regex.Replace() method looked fast, so really it's just the final stages. I can appreciate the RTF format is annoying, but it's just the detail now. You know, it's times like this I'm glad it's not my project. Stick with it, it's almost there.

    Wednesday, October 24, 2007 5:28 PM
  • Derek,

    Well, I finally got it working!

    All it took was a formatted RTF document done in WordPad and a few changes to your code, and it's working perfectly, and fast! It takes just 276 milliseconds to highlight the source of this post now!

    Screenshot

    Thanks for all your help and support. Oh, and here's the final code:


    Code Block

    Imports System.Diagnostics
    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Sub cmdColourCode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdColourCode.Click
            ' Time the highlighting pass.
            Dim watch As Stopwatch = Stopwatch.StartNew()
            Me.ColourCode()
            watch.Stop()
            MessageBox.Show(String.Format("Complete: {0} seconds {1} milliseconds", watch.Elapsed.Seconds, watch.Elapsed.Milliseconds))
        End Sub

        Private Sub ColourCode()
            ' Start from the plain text (no RTF codes) and rebuild the RTF from scratch.
            Dim code As String = Me.rtfCode.Text

            ' Escape the characters RTF treats specially; the backslash must come first.
            code = code.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")

            'http://regexlib.com/
            Dim stags As String = "<.*>" '"</?[a-z][a-z0-9]*[^<>]*>"
            Dim etags As String = "</.*>" '"</?[a-z][a-z0-9]*[^<>]*>"
            Dim attributes As String = "\s\w*=['|""].*['|""]"
            Dim comments As String = "<!--.*-->"

            ' \cfN selects entry N of the \colortbl in the header below:
            ' \cf1 = blue (tags), \cf2 = maroon (attributes), \cf3 = green (comments).
            code = Me.ColourCode(stags, code, "\cf1")
            code = Me.ColourCode(etags, code, "\cf1")
            code = Me.ColourCode(attributes, code, "\cf2")
            code = Me.ColourCode(comments, code, "\cf3")

            ' Turn each line break (CRLF, CR or LF) into a single \par. The trailing
            ' space ends the control word so the next line's first letters are kept.
            code = code.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf).Replace(vbLf, "\par ")

            ' The space after \fs20 likewise separates the control word from the text.
            Me.rtfCode.Rtf = "{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fmodern\fprq1\fcharset0 Consolas;}}{\colortbl ;\red0\green0\blue255;\red128\green0\blue0;\red0\green255\blue0;}{\*\generator Msftedit 5.41.21.2507;}\viewkind4\uc1\pard\f0\fs20 " & code & "}"
        End Sub

        Private Function ColourCode(ByVal pattern As String, ByVal code As String, ByVal rtfColourFormat As String) As String
            ' Put the colour code in front of every match and switch back to the
            ' default colour (\cf0) after it.
            Return Regex.Replace(code, pattern, rtfColourFormat & " ${0}\cf0 ")
        End Function

    End Class





    Thursday, October 25, 2007 1:56 PM
  • And yeah... there are a few bugs (with scripts) that I'm trying to iron out...

    Alex.
    Thursday, October 25, 2007 2:09 PM
  • Thanks, brilliant, Alex. Nice one. It works a treat apart from the bugs, but those will get ironed out with time. I'm chuffed it's working at the speed it is; all good knowledge sharing. See you around the forums.
    Thursday, October 25, 2007 8:44 PM
  • Hello,

    I am doing something almost exactly like this; it looks very well done and cool. I was trying the code you posted 3-4 posts above this one and got a few errors in my project. I am using C#, so I ran the code through an online translator, and now I get errors inside ColourCode:

    C# Code
    code = code.Replace(Strings.Chr(13), "\\par");

     

    thats the C# version and here it is in VB:

    VB Code
    code = code.Replace(Chr(13), "\par")

     

    In the C# version there is an error with Strings. Where, or what, is Strings.Chr(13) (or Chr(13))?

    My error is:

    Error 1 The name 'Strings' does not exist in the current context

    The same happens with Constants.

    Also, did you fix those few bugs in your code? If so, would you mind posting the new code so that I can use it in my project?

    Thanks,

    Programmer01

     

    EDIT:

    Hello, never mind what I said before; those errors are all fixed now. The code seems to be working great, except that it relines everything. Was that the bug you needed to fix? Also, in here:

    Code Block
    Me.rtfCode.Text = "{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fmodern\fprq1\fcharset0 Consolas;}}{\colortbl ;\red0\green0\blue255;\red128\green0\blue0;\red0\green255\blue0;}{\*\generator Msftedit 5.41.21.2507;}\viewkind4\uc1\pard\f0\fs20" & code & vbCrLf & "}"

     

    Would you mind telling me which of those colors is used for which kind of markup, so that I can set the colors the way I would like them for my project?


    Thanks,

    Programmer01


    Friday, November 9, 2007 12:19 AM
  • Hello,

     

    Considering that many developers in this forum ask how to implement syntax highlighting in a RichTextBox control, my team has created a code sample for this frequently asked programming task in Microsoft All-In-One Code Framework. You can download the code samples at:

     

    CSRichTextBoxSyntaxHighlighting

    http://bit.ly/CSRichTextBoxSyntaxHighlighting

     

    With these code samples, we hope to reduce developers' efforts in solving the frequently asked programming tasks. If you have any feedback or suggestions for the code samples, please email us: onecode@microsoft.com.

    ------------

    The Microsoft All-In-One Code Framework (http://1code.codeplex.com) is a free, centralized code sample library driven by developers' needs. Our goal is to provide typical code samples for all Microsoft development technologies, and reduce developers' efforts in solving typical programming tasks.

    Our team listens to developers’ pains in MSDN forums, social media and various developer communities. We write code samples based on developers’ frequently asked programming tasks, and allow developers to download them with a short code sample publishing cycle. Additionally, our team offers a free code sample request service. This service is a proactive way for our developer community to obtain code samples for certain programming tasks directly from Microsoft.

    Thanks

    Microsoft All-In-One Code Framework

    Thursday, March 24, 2011 1:36 AM