Array.Sort performance issue RRS feed

  • Question


    Using Array.Sort with a Comparison(of T) delegate, I noticed some impossible equality outcomes. In other words, I knew the input array contained no pair of elements that compare equally. So, I made an array of distinct integers a() with a(i)=i for i ranging 0..ubound(a). I randomly shuffled the array and called Array.Sort with the array and my comparer. Upon completion, I verified that the array was back in its original unshuffled form with a(i)=i. Next, in the comparer, I debug.asserted that input arguments were different, and found that Array.Sort often called the comparer passing the same value in arg 1 and arg 2. Since all values in a() were distinct integers, Array.Sort must be comparing the ith to the ith element. Looking further, I found that the same pair of distinct elements were often compared multiple times.

    When sorting 1000 elements, 10% of all compares were comparing an item to itself, and 20% were comparing the same distinct pair multiple times. So approximately 30% of all compares called by Array.Sort are suspicious. For smaller array sizes, the percentages go up, and vice versa.

    To me, this appears inefficient at best and buggy at worst. I don't think buggy because Array.Sort has never failed for me. I can understand some of the 30% if Array.Sort implements QuickSort using so-called guard elements to eliminate bounds checking during an iterative operation. But 30%? I wrote a naive median-1 QuickSort, used the same comparer and saw no such anomalies. An Item was never compared itself, and different items were compared at most once.

    In addition, when sorting n distinct elements initially in random order, the theoretical minimum number of comparisons required is log2(n!). The total number of comparisons actually made divided by log2(n!) is thus an efficiency measure, lower and approaching 1 being better. A median-5 QuickSort scores about 1.1, and a naive median-1 QuickSort scores 1.3 to 1.4. But Array.Sort scores 1.5 to 1.7.

    So, what's up with Array.Sort? Can anyone shed any light?

    My VB test class ArraySortIssue is enclosed. Instantiate z as a New ArraySortIssue object and look at the string returned by z.Test. Comment in the assert statement in the comparer to see an issue as it occurs (in this case, life is easier in the debugger with a very small array).

    Public Class ArraySortIssue ' Array.Sort performance test
      ' prove Array.Sort compares elements to themselves
      ' prove Array.Sort compares the same pair of elements too many times
      Public a() As Integer ' array to be sorted, distinct integers 0..ubound(a) in random order
      Public t(,) As UShort ' tallies of values compared
      Public Function CompareInt(ByVal x As Integer, ByVal y As Integer) As Integer
        ' for overload 'Public Shared Sub Sort(Of T)(array As T(), comparison As Comparison(Of T))'
        t(x, y) = CUShort(t(x, y) + 1) ' passing this means 0<=i<=ubound and 0<=j<=ubound
        'Debug.Assert((x <> y) And (t(x, y) <= 1)) ' comment this in to see each compare oddity
        Return x - y
      End Function
      Public Sub Swap(ByVal i As Integer, ByVal j As Integer)
        ' swap a(i) with a(j)
        Dim k As Integer = a(i) : a(i) = a(j) : a(j) = k
      End Sub
      Public Sub NaiveQuickSort(ByVal f As Integer, ByVal l As Integer)
        ' Naive Median-1 Quicksort a(f..l), yields no anomalous comparisons
        If f < l Then
          Dim i As Integer = f + 1
          Dim j As Integer = l
          While i <= j
            If CompareInt(a(f), a(i)) < 0 Then Swap(i, j) : j -= 1 Else i += 1
          End While
          Swap(f, j)
          NaiveQuickSort(f, j - 1)
          NaiveQuickSort(j + 1, l)
        End If
      End Sub
      Public Function Test() As String
        ' test driver
        Dim MyRandom As New System.Random ' random numbers to generate a random permutation
        Dim n As Integer = MyRandom.Next(2, 9)
        n = 100
        Dim u As Integer = n - 1
        ReDim a(u)
        ReDim t(u, u) ' don't make n too big
        Dim i, j, k As Integer
        For i = 0 To u : a(i) = i : Next ' integers 0..u
        For i = u To 1 Step -1 ' shuffle
          j = MyRandom.Next(0, i + 1) : k = a(i) : a(i) = a(j) : a(j) = k
        If True Then ' do one or the other only
          Array.Sort(Of Integer)(a, AddressOf CompareInt) ' sort using Comparison(of Integer)
          NaiveQuickSort(0, u) ' sort using home grown with same comparer
        End If
        For i = 0 To u : Debug.Assert(a(i) = i) : Next ' verify correctness
        ' reduce t(,) to three scalars
        Dim nGood As Integer = 0 ' i<>j compares counting each at most once
        Dim nEqual As Integer = 0 ' all i=j compares
        Dim nRepeat As Integer = 0 ' excess i<>j compares
        For i = 0 To u
          nEqual += t(i, i) ' all on the diagonal are suspicious
          For j = 0 To i - 1
            k = t(i, j) + t(j, i) ' take i vs j and j vs i together
            If k > 0 Then nGood += 1 : k -= 1 ' one alone or one of many is ok,
            nRepeat += k ' but the remainder are suspicious
        ' output string
        Dim nTotal As Integer = nGood + nEqual + nRepeat
        Dim s As String = "Sort Test" & _
          ", n=" & n & _
          ", nTotal=" & nTotal & _
          ", nGood=" & nGood & " " & CInt(nGood * 100 / nTotal) & "%" & _
          ", nEqual=" & nEqual & " " & CInt(nEqual * 100 / nTotal) & "%" & _
          ", nRepeat=" & nRepeat & " " & CInt(nRepeat * 100 / nTotal) & "%"
    #If False Then
        ' depends on the author's Dec() and Log2Factorial() functions
        s &= ", ceff=" & Dec(nTotal / CInt(Log2Factorial(n)), , 6)
    #End If
        Return s
      End Function
    End Class

    Monday, August 25, 2014 11:30 PM


  • I think that you should switch to .NET Framework 4.5, where nGood is 97–98%.

    • Edited by Viorel_MVP Tuesday, August 26, 2014 6:50 AM
    • Marked as answer by amercer Tuesday, August 26, 2014 7:50 AM
    Tuesday, August 26, 2014 6:50 AM

All replies