none
Fastest way to search object in List(of T) RRS feed

  • Question

  • Is there anything faster than this?

    List (of customclass)

    customclass has public property ID as long

    Find(Function(x) x.ID = id)

    I have some lists as large as 7000 custom class items.  And I know this can get bigger when the users' get their hands on it.  So I need to make sure I am doing the absolute fasted method of finding an item in the list (by ID).


    Jamie V Johnson

    Tuesday, August 15, 2017 8:12 PM

Answers


  • The machines we are talking about are desktop and laptop pcs (no servers at this time).  Modern generation, healthy builds, windows 10 os, some awesomely powered, some not so.  Even with all the power my CAD workstation 6core Xeon with 32gb ddr4, and 8gb graphics has, I see enough drag to know it must be addressed.  Personally I don't know how the game programmers can handle such large quantities of particles moving around all at once, I am impressed.


    Jamie V Johnson

    Parallelization is also something that you might consider.

    PLINQ can be impressively fast for some things; it can also be a big disappointment for other situations.

    I agree that testing, and testing, and more testing is the only way to really prove out any of it.

    ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Tuesday, August 15, 2017 11:09 PM
  • I have successfully cut the load time of the entire process in half by utilizing some of the techniques listed here.  Mostly the effect of parallel computing.  I have used Parallel.foreach, Parallel.For, and attempted Parallel.Invoke.  Where I placed invoke turns out to be inappropriate, so I had to back it down to inline, so imagine one of those scenes where the scientist blows himself up, both funny and sad at the same time.

    Now to put some code where my mouth is...

    Parallel.ForEach - used to cycle through a list where order does not matter:

     System.Threading.Tasks.Parallel.ForEach(Of Transition)(Me, Sub(tr) tr.LoadRelationships())


    Parallel.For - used in place of FindByID where we must abort when the object is found:

     Dim transition As Transition = Nothing
            Dim result = System.Threading.Tasks.Parallel.For(0, Me.Count, Sub(i, state)
                                                                              If Me(i).ID = id Then
                                                                                  transition = Me(i)
                                                                                  state.Stop()
                                                                              End If
                                                                              If state.ShouldExitCurrentIteration Then
                                                                                  Return
                                                                              End If
                                                                          End Sub)
            Return transition

    And the aborted Parallel.Invoke - placed here for those whom it may help:

     'Dim actions() As Action = {Sub() Cycles.LoadRelationships(),
            '           Sub() Cycles.LoadRelationships(),
            '           Sub() Transitions.LoadRelationships(),
            '           Sub() TransitRoutes.LoadRelationships(),
            '           Sub() TransitSchedules.LoadRelationships(),
            '           Sub() ScheduleItems.LoadRelationships(),
            '           Sub() ScheduledItemGroups.LoadRelationships(),
            '           Sub() ProductDefs.LoadRelationships(),
            '           Sub() SecurementSetDefs.LoadRelationships(),
            '           Sub() Locations.LoadRelationships(),
            '           Sub() Products.LoadRelationships(),
            '           Sub() TransportPool.LoadRelationships(),
            '           Sub() Trains.LoadRelationships()}
            'System.Threading.Tasks.Parallel.Invoke(actions)


    Jamie V Johnson

    Friday, August 18, 2017 3:28 PM

All replies

  • What time do you think it takes 5Ms or 10Ms?

    Try to avoid pre optimizing, it serves mostly nothing. If your program is to slow you can always look at places to optimize. 

    Be aware that painting 1 pixel from your program on the screen takes probably more time than this. 



    Success
    Cor


    Tuesday, August 15, 2017 8:22 PM
  • 2000 find loops cost 828ms

    6750 find loops costs 9635ms

    entire loop:

            If _ListIDs IsNot Nothing Then
                For Each id As Long In _ListIDs
                    Dim tr As Transition = MasterSchedule.Instance.Transitions.FindByID(id)
                    If tr IsNot Nothing Then
                        If Me.Contains(tr) = False Then Me.Add(tr)
                    End If
                Next
            End If

    master schedule.transititions.findbyid line uses the find function in question.

    As the list gets larger the find gets slower, the iteration of finds get dynamically slower.

    Naturally the contains gets to play a part in this as well, because the stored list gets larger too.


    Jamie V Johnson


    Tuesday, August 15, 2017 8:30 PM
  • So the find in your first question takes less than 0,3ms

    Maybe you think why he even replies. That is to warn others that they can give any answer. 

    Probably one will be like you wish and you mark that as answer. 

    Try to tell direct what the problem is, otherwise you get replies like mine. 


    Success
    Cor

    Tuesday, August 15, 2017 8:45 PM
  • Jamie,

    Have you considered using a Binary Search? Your class will need to implement the IComparable interface so you can sort it, then you can use it.

    What makes it so fast is that because it's sorted (based on ID, in this case), it cuts the collection down incrementally; it's not looking at each one of them - if it's not in this half, go to the other half, cut that in half ... and so on:

    https://en.wikipedia.org/wiki/Binary_search_algorithm

    (The fastest way, bar none, would be a hashtable or something based on a hashtable like a Dictionary)...


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Tuesday, August 15, 2017 8:49 PM
  • Hi

    A different approach. A ComboBox with autocomplete which finds an ID in negative time :)

    'Form1 with Combobox1
    Option Strict On
    Option Explicit On
    Public Class Form1
        ' just for some random data
        Dim r As New Random
        Dim s1 As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        Dim s As String = s1 & s1.ToLower
        Dim collec As New List(Of MyClass1)
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            ' setup the random test data
            For i As Integer = 0 To 9999
                Dim nc As New MyClass1
                Dim st As Integer = r.Next(0, s.Length - 5)
                nc.String1 = s.Substring(st, 6)
                nc.Number1 = i
                st = r.Next(0, s.Length - 6)
                nc.String2 = s.Substring(st, 5)
                collec.Add(nc)
            Next
            With ComboBox1
                .DataSource = collec
                .DisplayMember = "Number1"
                .ValueMember = "String2"
                .DropDownStyle = ComboBoxStyle.DropDown
                .AutoCompleteMode = AutoCompleteMode.SuggestAppend
                .AutoCompleteSource = AutoCompleteSource.ListItems
            End With
        End Sub
    
        Class MyClass1
            Property String1 As String
            Property Number1 As Integer
            Property String2 As String
        End Class
    End Class


    Regards Les, Livingston, Scotland

    Tuesday, August 15, 2017 8:50 PM
  • Create a helper collection:

       Dim d As New Dictionary(Of Long, CustomClass)

    Then add all of the objects using instructions like

       d.Add(myObject.ID, myObject)

    In order to find the object by ID:

       Dim foundObject As CustomClass

       If d.TryGetValue(someID, foundObject)) Then

          ...

       End If

     

    Compare the performance and also try SortedDictionary, SortedList and Hashtable collections.

    • Proposed as answer by Frank L. Smith Tuesday, August 15, 2017 8:54 PM
    • Edited by Viorel_MVP Tuesday, August 15, 2017 8:55 PM
    Tuesday, August 15, 2017 8:52 PM
  • @Viorel

    Yea likewise I showed I thought DBasnett some months ago. 

    However, the problem is, the ID has to be unique. 

    Can be but the OP did not tell that. 


    Success
    Cor


    Tuesday, August 15, 2017 8:54 PM
  • I'm old enough to deal with the occasional smart a$$..

    Jamie V Johnson

    Tuesday, August 15, 2017 9:50 PM
  • Well this is a loop for a binary loader that puts relationships together after loading the data.  I definitely want to set the rule that you can't enter the same object into the collection twice (hence the if contains rules).  If that offers me a more efficient way, then I'm all over it.  I will put these ideas to the test over the next week to see what works best for my dataset.  Then get back with some numbers.

    Jamie V Johnson

    Tuesday, August 15, 2017 9:53 PM
  • Well this is a loop for a binary loader that puts relationships together after loading the data.  I definitely want to set the rule that you can't enter the same object into the collection twice (hence the if contains rules).  If that offers me a more efficient way, then I'm all over it.  I will put these ideas to the test over the next week to see what works best for my dataset.  Then get back with some numbers.

    Jamie V Johnson


    Please make it obvious who you're addressing; we don't all see this forum the same way.

    "A problem well stated is a problem half solved.” - Charles F. Kettering

    Tuesday, August 15, 2017 9:56 PM
  • Is there anything faster than this?

    This question is answerable, and you have some suggestions.

    So I need to make sure I am doing the absolute fasted method of finding an item in the list (by ID).

    This is not possible without a complete analysis of your data, the way that you are creating the collection, the way that you are using it, and the hardware the app is running on. You should indicate which part of your task should be addressed.

    Tuesday, August 15, 2017 10:05 PM
  • Is there anything faster than this?

    This question is answerable, and you have some suggestions.

    So I need to make sure I am doing the absolute fasted method of finding an item in the list (by ID).

    This is not possible without a complete analysis of your data, the way that you are creating the collection, the way that you are using it, and the hardware the app is running on. You should indicate which part of your task should be addressed.

    ...and if the data has to be persisted and if so, how.

    If it's to use binary serialization - the Dictionary(Of T) will be a problem but not an insurmountable one (but if that's to be considered in the time test, it absolutely should be stated).


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Tuesday, August 15, 2017 10:13 PM
  • This program is a time and space management program, so... not a game, not super neat graphics, altho I did create my own time bar and modify some existing controls, mostly automatiing what comes out of the .Net/WPF box.

    This first displayed find process is POST serialization, after I have properly loaded the file back into class objects.  These class objects have relationships to each other, and it requires the master class to use its sub collection classes (all inherited list(of T)) to return the class instance from its ID (long).  As for the scope, it is way bigger than just serializing and deserializing.  Anytime a new UI (WPF) is loaded, these kinds of searches may take place again, as it loads control objects, filters data and runs calculation processes.  I've done my best to reduce excessive looping (could probably squeeze out more), by storing live references to related objects throughout the model.  Pretty much a similar use of how Microsoft operates the DataSet/DataTable/Relationship/DataRow model structure (only without data tables).  The processes that run this FindByID command would happen during add/remove objects, based on user creating or deleting many things, but let us take it one fundamental step at a time. 

    Let us make sure that for a list(of T), find.(function(x) x.id=id) is the fastest method vs compare to.  Then let us swap/mutate/change list(of T) into a sorted, keyed list of values and check that again.  Let us compare the overhead of morphing the list with small datasets (500 items) vs large datasets(20,000 items), just so we can find faster.  Once the fastest method is determined, then lets determine the most robust collection object (list(of T), sorted list(of K, of T), dictionary, hash table..etc.)  If I had my way there would be 1 type of collection that was mind-numbingly fast, robust, and easy to work with, dare to dream.

    The machines we are talking about are desktop and laptop pcs (no servers at this time).  Modern generation, healthy builds, windows 10 os, some awesomely powered, some not so.  Even with all the power my CAD workstation 6core Xeon with 32gb ddr4, and 8gb graphics has, I see enough drag to know it must be addressed.  Personally I don't know how the game programmers can handle such large quantities of particles moving around all at once, I am impressed.


    Jamie V Johnson

    Tuesday, August 15, 2017 11:02 PM

  • The machines we are talking about are desktop and laptop pcs (no servers at this time).  Modern generation, healthy builds, windows 10 os, some awesomely powered, some not so.  Even with all the power my CAD workstation 6core Xeon with 32gb ddr4, and 8gb graphics has, I see enough drag to know it must be addressed.  Personally I don't know how the game programmers can handle such large quantities of particles moving around all at once, I am impressed.


    Jamie V Johnson

    Parallelization is also something that you might consider.

    PLINQ can be impressively fast for some things; it can also be a big disappointment for other situations.

    I agree that testing, and testing, and more testing is the only way to really prove out any of it.

    ;-)


    "A problem well stated is a problem half solved.” - Charles F. Kettering


    Tuesday, August 15, 2017 11:09 PM
  • OK first test in, focusing on using the Debug.Profiler, I captured a FindByID event that took average 303 ms to perform.  I changed the code to add a lambda sorting on the list (by ID) right after deserializing, and now it runs that same FindByID in 279 seconds.  So sorting in advance does help some, and I can sort once and forget it.

    I see the list(of t).binarysearch, but it requires an object.  I don't have an object, only the ID property of an object, I want the object.  Unless I implement my own direct binary search function I don't know if this method will work for me.

    I did a bit of digging into the datatable's rowcollection find by id tool, it needs the primary key, so I'm betting it works on a very similar method to a sorted dictionary.  Changing my list to a sortedlist(of key, of object) will cause me a lot of rewrite, and I don't know if that is necessary yet.  It might still prove out to be faster, I'll have to see.

    Next I'll look into this Parallelization and Plinq thing.


    Jamie V Johnson

    Thursday, August 17, 2017 7:29 PM
  • Parallel computing helps even more.  Many of the for each loops can be placed into system.threading.task.parallel.foreach(of object)(objects, sub(x) x.loadrelationships) instead of the traditional for each object in objects, and I saved 5 seconds out of 20 for my load time (post processing).  Unfortunately some threads MUST be performed in order or KABOOM!!!.  Thanks for the ideas, I will continue to refine this down to minimalist processing.  And keep this thread updated.

    Jamie V Johnson

    Thursday, August 17, 2017 10:18 PM
  • I have successfully cut the load time of the entire process in half by utilizing some of the techniques listed here.  Mostly the effect of parallel computing.  I have used Parallel.foreach, Parallel.For, and attempted Parallel.Invoke.  Where I placed invoke turns out to be inappropriate, so I had to back it down to inline, so imagine one of those scenes where the scientist blows himself up, both funny and sad at the same time.

    Now to put some code where my mouth is...

    Parallel.ForEach - used to cycle through a list where order does not matter:

     System.Threading.Tasks.Parallel.ForEach(Of Transition)(Me, Sub(tr) tr.LoadRelationships())


    Parallel.For - used in place of FindByID where we must abort when the object is found:

     Dim transition As Transition = Nothing
            Dim result = System.Threading.Tasks.Parallel.For(0, Me.Count, Sub(i, state)
                                                                              If Me(i).ID = id Then
                                                                                  transition = Me(i)
                                                                                  state.Stop()
                                                                              End If
                                                                              If state.ShouldExitCurrentIteration Then
                                                                                  Return
                                                                              End If
                                                                          End Sub)
            Return transition

    And the aborted Parallel.Invoke - placed here for those whom it may help:

     'Dim actions() As Action = {Sub() Cycles.LoadRelationships(),
            '           Sub() Cycles.LoadRelationships(),
            '           Sub() Transitions.LoadRelationships(),
            '           Sub() TransitRoutes.LoadRelationships(),
            '           Sub() TransitSchedules.LoadRelationships(),
            '           Sub() ScheduleItems.LoadRelationships(),
            '           Sub() ScheduledItemGroups.LoadRelationships(),
            '           Sub() ProductDefs.LoadRelationships(),
            '           Sub() SecurementSetDefs.LoadRelationships(),
            '           Sub() Locations.LoadRelationships(),
            '           Sub() Products.LoadRelationships(),
            '           Sub() TransportPool.LoadRelationships(),
            '           Sub() Trains.LoadRelationships()}
            'System.Threading.Tasks.Parallel.Invoke(actions)


    Jamie V Johnson

    Friday, August 18, 2017 3:28 PM