Answered by:
Array.Sort performance issue
Question

Using Array.Sort with a Comparison(of T) delegate, I noticed some impossible equality outcomes. In other words, I knew the input array contained no pair of elements that compare equally. So, I made an array of distinct integers a() with a(i)=i for i ranging 0..ubound(a). I randomly shuffled the array and called Array.Sort with the array and my comparer. Upon completion, I verified that the array was back in its original unshuffled form with a(i)=i. Next, in the comparer, I debug.asserted that input arguments were different, and found that Array.Sort often called the comparer passing the same value in arg 1 and arg 2. Since all values in a() were distinct integers, Array.Sort must be comparing the ith to the ith element. Looking further, I found that the same pair of distinct elements were often compared multiple times.
When sorting 1000 elements, 10% of all compares were comparing an item to itself, and 20% were comparing the same distinct pair multiple times. So approximately 30% of all compares called by Array.Sort are suspicious. For smaller array sizes, the percentages go up, and vice versa.
To me, this appears inefficient at best and buggy at worst. I don't think buggy because Array.Sort has never failed for me. I can understand some of the 30% if Array.Sort implements QuickSort using socalled guard elements to eliminate bounds checking during an iterative operation. But 30%? I wrote a naive median1 QuickSort, used the same comparer and saw no such anomalies. An Item was never compared itself, and different items were compared at most once.
In addition, when sorting n distinct elements initially in random order, the theoretical minimum number of comparisons required is log2(n!). The total number of comparisons actually made divided by log2(n!) is thus an efficiency measure, lower and approaching 1 being better. A median5 QuickSort scores about 1.1, and a naive median1 QuickSort scores 1.3 to 1.4. But Array.Sort scores 1.5 to 1.7.
So, what's up with Array.Sort? Can anyone shed any light?
My VB test class ArraySortIssue is enclosed. Instantiate z as a New ArraySortIssue object and look at the string returned by z.Test. Comment in the assert statement in the comparer to see an issue as it occurs (in this case, life is easier in the debugger with a very small array).
Public Class ArraySortIssue ' Array.Sort performance test ' prove Array.Sort compares elements to themselves ' prove Array.Sort compares the same pair of elements too many times Public a() As Integer ' array to be sorted, distinct integers 0..ubound(a) in random order Public t(,) As UShort ' tallies of values compared Public Function CompareInt(ByVal x As Integer, ByVal y As Integer) As Integer ' for overload 'Public Shared Sub Sort(Of T)(array As T(), comparison As Comparison(Of T))' t(x, y) = CUShort(t(x, y) + 1) ' passing this means 0<=i<=ubound and 0<=j<=ubound 'Debug.Assert((x <> y) And (t(x, y) <= 1)) ' comment this in to see each compare oddity Return x  y End Function Public Sub Swap(ByVal i As Integer, ByVal j As Integer) ' swap a(i) with a(j) Dim k As Integer = a(i) : a(i) = a(j) : a(j) = k End Sub Public Sub NaiveQuickSort(ByVal f As Integer, ByVal l As Integer) ' Naive Median1 Quicksort a(f..l), yields no anomalous comparisons If f < l Then Dim i As Integer = f + 1 Dim j As Integer = l While i <= j If CompareInt(a(f), a(i)) < 0 Then Swap(i, j) : j = 1 Else i += 1 End While Swap(f, j) NaiveQuickSort(f, j  1) NaiveQuickSort(j + 1, l) End If End Sub Public Function Test() As String ' test driver Dim MyRandom As New System.Random ' random numbers to generate a random permutation Dim n As Integer = MyRandom.Next(2, 9) n = 100 Dim u As Integer = n  1 ReDim a(u) ReDim t(u, u) ' don't make n too big Dim i, j, k As Integer For i = 0 To u : a(i) = i : Next ' integers 0..u For i = u To 1 Step 1 ' shuffle j = MyRandom.Next(0, i + 1) : k = a(i) : a(i) = a(j) : a(j) = k Next If True Then ' do one or the other only Array.Sort(Of Integer)(a, AddressOf CompareInt) ' sort using Comparison(of Integer) Else NaiveQuickSort(0, u) ' sort using home grown with same comparer End If For i = 0 To u : Debug.Assert(a(i) = i) : Next ' verify correctness ' reduce t(,) to three scalars Dim nGood As Integer = 0 ' i<>j compares counting each at most once Dim nEqual As Integer = 0 ' all i=j compares Dim nRepeat As Integer = 0 ' excess i<>j compares For i = 0 To u nEqual += t(i, i) ' all on the diagonal are suspicious For j = 0 To i  1 k = t(i, j) + t(j, i) ' take i vs j and j vs i together If k > 0 Then nGood += 1 : k = 1 ' one alone or one of many is ok, nRepeat += k ' but the remainder are suspicious Next Next ' output string Dim nTotal As Integer = nGood + nEqual + nRepeat Dim s As String = "Sort Test" & _ ", n=" & n & _ ", nTotal=" & nTotal & _ ", nGood=" & nGood & " " & CInt(nGood * 100 / nTotal) & "%" & _ ", nEqual=" & nEqual & " " & CInt(nEqual * 100 / nTotal) & "%" & _ ", nRepeat=" & nRepeat & " " & CInt(nRepeat * 100 / nTotal) & "%" #If False Then ' depends on the author's Dec() and Log2Factorial() functions s &= ", ceff=" & Dec(nTotal / CInt(Log2Factorial(n)), , 6) #End If Return s End Function End Class
Monday, August 25, 2014 11:30 PM
Answers

I think that you should switch to .NET Framework 4.5, where nGood is 97–98%.
 Edited by Viorel_MVP Tuesday, August 26, 2014 6:50 AM
 Marked as answer by amercer Tuesday, August 26, 2014 7:50 AM
Tuesday, August 26, 2014 6:50 AM
All replies

I think that you should switch to .NET Framework 4.5, where nGood is 97–98%.
 Edited by Viorel_MVP Tuesday, August 26, 2014 6:50 AM
 Marked as answer by amercer Tuesday, August 26, 2014 7:50 AM
Tuesday, August 26, 2014 6:50 AM 
Right you are, I am at 4.0. ThanksTuesday, August 26, 2014 7:51 AM