none
Faster way to insert characters into a String or Array RRS feed

  • Question

  • I need to take a String of around 200,000 characters and insert around 40,000 special characters into it in random places, then turn it into a Byte Array. The first code block below converts the string into an array, the array to a ListOf, inserts the characters, then converts the ListOf back to an array. RNG is a "New Random"

            Dim MyArray() As Byte = Encoding.Default.GetBytes(MyString)
            Dim ArrayList As New List(Of Byte)
            ArrayList.AddRange(MyArray)
            Dim Difference As Integer = ArrayList.Count / 5
            If Difference > 0 Then
                For FillIndex As Integer = 0 To Difference - 1
                    ArrayList.Insert(RNG.Next(0, ArrayList.Count), 0)
                Next
            End If
            MyArray = ArrayList.ToArray

    The second way was to try inserting while it's still a string

        Function InsertInto(Target As String, where As Integer, what As String) As String
            Return Target.Substring(0, where) & what & Target.Substring(where)
        End Function
    
    

    The first one takes around 10 seconds and the second one is a tad slower. I'm hoping for a much faster way


    Sunday, September 24, 2017 10:41 PM

Answers

  • Also check a solution that uses StringBuilder and Append instead of Insert:

    For i As Integer = 0 To 8000
        MyString &= "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Next
    
    Dim SB As New StringBuilder
    
    TextBox1.AppendText("String Length = " & MyString.Length & vbNewLine)
    
    For test As Integer = 0 To 9
    
        sw.Reset()
        sw.Start()
    
        Dim Difference As Integer = MyString.Length \ 5
        Dim indices(Difference - 1) As Integer
    
        For i As Integer = 0 To Difference - 1
            indices(i) = rng.Next(0, MyString.Length + i)
        Next
    
        Array.Sort(indices)
    
        SB.Clear()
    
        Dim j = -1
        Dim f = 0
        For i As Integer = 0 To Difference - 1
            Dim count = indices(i) - j - i - 1
            If count < 0 Then count = 0 '
            SB.Append(MyString, f, count).Append(vbNullChar)
            f += count
            j = indices(i)
        Next
        SB.Append(MyString, f, MyString.Length - f - 1)
    
        Dim MyArray() As Byte = Encoding.Default.GetBytes(SB.ToString)
        sw.Stop()
    
        TextBox1.AppendText("Elapsed = " & sw.ElapsedMilliseconds & " ms." & vbNewLine)
    
    Next




    • Edited by Viorel_MVP Tuesday, September 26, 2017 4:42 PM
    • Marked as answer by Devon_Nullman Wednesday, September 27, 2017 12:02 PM
    Tuesday, September 26, 2017 4:32 PM

All replies

  • Have you tried using a StringBuilder object rather than a String to do the
    insertions? It is usually faster than modifying a String as it is not
    immutable like a String object.

    StringBuilder Class
    https://msdn.microsoft.com/en-us/library/system.text.stringbuilder(v=vs.110).aspx?cs-save-lang=1&cs-lang=vb#code-snippet-1

    Using the StringBuilder Class in .NET
    https://docs.microsoft.com/en-us/dotnet/standard/base-types/stringbuilder

    StringBuilder.Insert Method
    https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder.insert?view=netframework-4.7

    - Wayne

    Sunday, September 24, 2017 10:59 PM
  • Hi

    Using the code below, I got a very consistent 255ms average over several hundred tests. Tried using a StringBuilder and no improvement seen.

    Wonder why you are getting closer to 10sec?

    Option Strict On
    Option Explicit On
    Imports System.Text
    
    Public Class Form1
        Dim rng As New Random
        Dim MyString As String = Nothing
        Dim sw As New Stopwatch
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles Me.Load
            For i As Integer = 0 To 8000
                MyString &= "ABCDEFGHIJKLMNOPQRSTU"
            Next
            sw.Start()
            For test As Integer = 0 To 99
                Dim MyArray() As Byte = Encoding.Default.GetBytes(MyString)
                Dim ArrayList As New List(Of Byte)
                ArrayList.AddRange(MyArray)
            Dim Difference As Integer = ArrayList.Count \ 5
            If Difference > 0 Then
                    For FillIndex As Integer = 0 To Difference - 1
                        Dim r As Integer = rng.Next(0, ArrayList.Count)
                        ArrayList.Insert(r, 0)
                    Next
                End If
                MyArray = ArrayList.ToArray
            Next
            sw.Stop()
            'average over several hundred tests = 255ms per test
        End Sub
    End Class
    


    Regards Les, Livingston, Scotland

    Sunday, September 24, 2017 11:21 PM
  • The first one takes around 10 seconds and the second one is a tad slower. I'm hoping for a much faster way

    If you are getting similar time for those two examples that suggests that both the substring expression and the array insert expression require copying all existing content into a new instance. The InsertRange is described as O(n+m) which implies that a copy is required.

    The other option is to create a destination array of the required size and use array copy to copy portions of the existing array into that new array.  The array copy method allows you to define a starting position for the destination and source, and the length to copy.  As that operation does not involve creating a new destination array each time, it should be faster. Array.Copy is described as O(n).   https://msdn.microsoft.com/en-us/library/z50k9bft(v=vs.110).aspx

    Sunday, September 24, 2017 11:25 PM
  • Using a Quad-Core Q9550, I get :

    String Length = 208026
    Elapsed = 2883 ms.
    Elapsed = 2872 ms.
    Elapsed = 2881 ms.
    Elapsed = 2886 ms.
    Elapsed = 2882 ms.
    Elapsed = 2894 ms.
    Elapsed = 2876 ms.
    Elapsed = 2883 ms.
    Elapsed = 2878 ms.
    Elapsed = 2893 ms.
    Elapsed = 2889 ms.
    Elapsed = 2894 ms.
    Elapsed = 2878 ms.
    Elapsed = 2866 ms.
    Elapsed = 2872 ms.
    Elapsed = 2905 ms.
    Elapsed = 2893 ms.
    Elapsed = 2869 ms.
    Elapsed = 2894 ms.
    Elapsed = 2886 ms.

    I made the string a little bigger by adding "VWXYZ" to each addition.

    Gonna restart the computer as I have been having some issues - Thanks Windows 10. This may be the fourth time I have to re-do the Start Menu.

    Update:

    Edit - tried using StringBuilder and got virtually the same times.

    Using StringBuilder:
    String Length = 208026
    Elapsed = 2335 ms.
    Elapsed = 2356 ms.
    Elapsed = 2310 ms.
    Elapsed = 2351 ms.
    Elapsed = 2383 ms.
    Elapsed = 2355 ms.
    Elapsed = 2325 ms.
    Elapsed = 2326 ms.
    Elapsed = 2324 ms.
    Elapsed = 2325 ms.

            TextBox1.Clear()
            For i As Integer = 0 To 8000
                MyString &= "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            Next
            Dim SB As New StringBuilder
            TextBox1.AppendText("String Length = " & MyString.Length.ToString & vbNewLine)
            For test As Integer = 0 To 9
                SB.Clear()
                SB.Append(MyString)
                sw.Reset()
                sw.Start()

                Dim Difference As Integer = SB.Length \ 5
                For FillIndex As Integer = 0 To Difference - 1
                    Dim r As Integer = rng.Next(0, SB.Length)
                    SB.Insert(r, Chr(0))
                Next
                Dim MyArray() As Byte = Encoding.Default.GetBytes(SB.ToString)
                sw.Stop()
                TextBox1.AppendText("Elapsed = " & sw.ElapsedMilliseconds.ToString & " ms." & vbNewLine)
            Next

    • Edited by Devon_Nullman Monday, September 25, 2017 1:55 AM Added More Info
    Monday, September 25, 2017 12:27 AM
  • Hi

    You managed a little faster than my tests, probably down to processor. In any case, with times like that, I doubt that there could be huge improvement possible (though I bet someone could come up with a faster version).


    Regards Les, Livingston, Scotland

    Monday, September 25, 2017 11:48 AM
  • Hi

    Using the code below, I got a very consistent 255ms average over several hundred tests. Tried using a StringBuilder and no improvement seen.

    Wonder why you are getting closer to 10sec?



    Regards Les, Livingston, Scotland

    I should have said 10 sec including all the other processing. In any case my times for the same code is 10 times slower than yours. Not sure why. Acamar's suggestion to use array copy is blazingly fast but I just cannot figure out yet how to insert the zeroes randomly as well as somewhat evenly distributed through the whole destination array.
    Monday, September 25, 2017 11:39 PM
  • randomly as well as somewhat evenly distributed through the whole destination array.

    That would usually be I = n * (count \ div) + (Rnd(0,rng * 2) - rng) where I is the calculated position in the array, n indicates which insertion this is (loop counter), div is the number of insertions in total (loop limit), and rng is the  maximum variation for each insertion point.

    Tuesday, September 26, 2017 6:35 AM
  • Hi Devon

    Wonder if you could try out this code as a stand alone just to give comparitive results. I misread one of your earlier posts and wrongly concluded you were getting marginally faster results than I had.

    The one thing that stand out is that when you increased the length of the starting string with the addition of 'VWXYZ', there was a large impact on the timings (to be expectedof course), so much that I am wondering if we are comparing similar conditions (hence this request)

    For a start string of 168021 characters I get an average of about 255ms for 100 tests. Adding the 'VWXYZ'  and increasing the start string to 208026 characters, I get an average of about 400ms for 100 tests

    Option Strict On
    Option Explicit On
    Imports System.Text
    
    Public Class Form1
        Dim rng As New Random
        Dim MyString As String = Nothing
        Dim sw As New Stopwatch
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles Me.Shown
            For i As Integer = 0 To 8000
                MyString &= "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            Next
            Dim MyArray() As Byte = Encoding.Default.GetBytes(MyString)
            Dim ArrayList As New List(Of Byte)
            TextBox1.AppendText("String start length = " & MyString.Length.ToString & vbNewLine)
            Dim total As Long = 0
            Dim tests As Integer = 100
            For test As Integer = 1 To tests
                ArrayList.Clear()
                ArrayList.AddRange(MyArray)
                Dim Difference As Integer = ArrayList.Count \ 5
                If Difference > 0 Then
                    sw.Start()
                    For FillIndex As Integer = 0 To Difference - 1
                        Dim r As Integer = rng.Next(0, ArrayList.Count)
                        ArrayList.Insert(r, 0)
                    Next
                    TextBox1.AppendText(test.ToString("0000 ") & "(length = " & ArrayList.Count.ToString & ") Elapsed = " & sw.ElapsedMilliseconds.ToString & " ms." & vbNewLine)
                End If
                total += sw.ElapsedMilliseconds
                sw.Reset()
            Next
            TextBox1.AppendText("Average = " & (total \ tests).ToString & " ms")
    
            ' using the start string constrained to upper limit 'U'
            ' 100 tests averaged 250ms ~ 255ms
    
            ' using the start string with 'VWXYZ' added
            ' 100 tests averaged 395ms ~ 405ms
    
        End Sub
    End Class

    .


    Regards Les, Livingston, Scotland


    • Edited by leshay Tuesday, September 26, 2017 3:22 PM
    Tuesday, September 26, 2017 3:21 PM
  • Tried the code in your original post and it wasn't near 10 seconds.

        Private Shared prng As New Random
        Private stpw As New Stopwatch
        Private Sub InsRandChs(MyString As String)
            stpw.Restart()
            Dim MyArray() As Byte = System.Text.Encoding.Default.GetBytes(MyString)
            Dim ArrayList As New List(Of Byte)
            ArrayList.AddRange(MyArray)
            Dim Difference As Integer = ArrayList.Count \ 5
            If Difference > 0 Then
                For FillIndex As Integer = 0 To Difference - 1
                    Dim loc As Integer = prng.Next(0, ArrayList.Count)
                    ArrayList.Insert(loc, 0)
                Next
            End If
            MyArray = ArrayList.ToArray
            stpw.Stop()
            Debug.Write("'  ")
            Debug.WriteLine("Len = {0} took {1} secs.", MyString.Length, stpw.Elapsed)
            '
            'results
            '
            '  Len = 250040 took 00:00:00.5160177 secs.
            '  Len = 290040 took 00:00:00.6796244 secs.
            '  Len = 330040 took 00:00:00.9040922 secs.
            '  Len = 370040 took 00:00:01.1612514 secs.
            '  Len = 410040 took 00:00:01.4327856 secs.
            '  Len = 450040 took 00:00:01.7101566 secs.
    
        End Sub
    


    "Those who use Application.DoEvents() have no idea what it does and those who know what it does never use it." - MSDN User JohnWein    Multics - An OS ahead of its time.

    Tuesday, September 26, 2017 3:56 PM
  • Also check a solution that uses StringBuilder and Append instead of Insert:

    For i As Integer = 0 To 8000
        MyString &= "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Next
    
    Dim SB As New StringBuilder
    
    TextBox1.AppendText("String Length = " & MyString.Length & vbNewLine)
    
    For test As Integer = 0 To 9
    
        sw.Reset()
        sw.Start()
    
        Dim Difference As Integer = MyString.Length \ 5
        Dim indices(Difference - 1) As Integer
    
        For i As Integer = 0 To Difference - 1
            indices(i) = rng.Next(0, MyString.Length + i)
        Next
    
        Array.Sort(indices)
    
        SB.Clear()
    
        Dim j = -1
        Dim f = 0
        For i As Integer = 0 To Difference - 1
            Dim count = indices(i) - j - i - 1
            If count < 0 Then count = 0 '
            SB.Append(MyString, f, count).Append(vbNullChar)
            f += count
            j = indices(i)
        Next
        SB.Append(MyString, f, MyString.Length - f - 1)
    
        Dim MyArray() As Byte = Encoding.Default.GetBytes(SB.ToString)
        sw.Stop()
    
        TextBox1.AppendText("Elapsed = " & sw.ElapsedMilliseconds & " ms." & vbNewLine)
    
    Next




    • Edited by Viorel_MVP Tuesday, September 26, 2017 4:42 PM
    • Marked as answer by Devon_Nullman Wednesday, September 27, 2017 12:02 PM
    Tuesday, September 26, 2017 4:32 PM
  • Thanks Viorel_

    I wound up using parts of your code to go along with Array.copy - This sped things up by a factor of more than 10 and distributes the null chars quite evenly and randomly.

    Wednesday, September 27, 2017 12:04 PM