locked
Fastest way to iterate through an IEnumerable<T> RRS feed

  • Question

  • Hi there, 

    I am looking for a fast way to iterate through a IEnumerable object list. Is there any way to achieve a better performance in speed for that kind of iterating? 

    Thanks

    Wednesday, August 23, 2017 11:08 AM

Answers

  • The problem is with those 2 calls to SQLDB. The "get" methods are probably retrieving all the data from the DB. You then enumerate all the objects and try to find the corresponding mappings. All this is being done in memory so it is about as fast as you're going to get but it potentially could take a while to load up the data before you get to that first foreach. 

    Given your code it looks like you need all the data. If so then there isn't much you can do to improve that aspect but most of the time you don't need all the data at once. For example if this stuff is so you can render something in the UI then don't render stuff that isn't visible.

    The other optimization that may help is combining your queries into 1. If you always need to get the objects and then the mapping data then have your query join the data at the database and then return you the combined data. It'll eliminate the second query and the need for your where clause altogether (at least in this code).

    • Marked as answer by sajithrw Thursday, August 24, 2017 5:56 AM
    Thursday, August 24, 2017 4:51 AM

All replies

  • Wednesday, August 23, 2017 11:16 AM
  • I am looking for an IEnumerable object list iterator which means I cannot use for or foreach directly on that data set.
    • Edited by sajithrw Wednesday, August 23, 2017 11:19 AM
    Wednesday, August 23, 2017 11:19 AM
  • I am looking for an IEnumerable object list iterator which means I cannot use for or foreach directly on that data set.

    Dataset? What are you talking about? You are going to have to show proof that a foreach or for loop can't be used on IEnumerable.
    Wednesday, August 23, 2017 3:18 PM
  • Here is an idea but I have not measured the effect in practice. If you know that the IEnumerable<T> is usually T[] or List<T>, then you can add special cases for those:

    IEnumerable<T> enumerable = …;
    
    var array = enumerable as T[];
    if (array != null)
    {
        foreach (T item in array)
        {
           …
        }
    }
    else
    {
        foreach (T item in enumerable)
        {
        }
    }

    That way, the compiler would generate different code for the known types and avoid the virtual method calls of IEnumerator<T>.get_Current and IEnumerator<T>.MoveNext. However, the type check at the start would cost some time.

    Wednesday, August 23, 2017 3:38 PM
  • "I am looking for a fast way to iterate through a IEnumerable object list"

    The speed of iteration has little to do with your code and everything to do with the backend code that actually does the enumeration. There really is no difference in performance between calling ToList vs foreach (for example) beyond the list allocation. The bulk of the time is going to be in the implementation of GetNext which is specific to the implementation that you're referring to (List<T>, iterator method, etc).

    Why are you concerned about this? Do you have perf numbers that seem to indicate that a foreach is slow? If you're concerned about the speed of enumeration then you should probably consider enumerating once and caching the results thereafter in an array to avoid the additional iterations. But that is rarely going to cause any serious performance increase unless the enumeration implementation is inefficient.

    In general I'd recommend that you simply use a foreach loop. Ultimately everything is going to boil down to that code. You could also implement it via an explicit while loop but the performance isn't really going to change and it is going to be harder to maintain anyway.

    "I am looking for an IEnumerable object list iterator which means I cannot use for or foreach directly on that data set."

    You can use IEnumerable<T> on any "legacy" IEnumerable implementation by using the OfType<T> or Cast<T> methods that are exposed on IEnumerable. In code that does this a lot I'd recommend using extension methods. That's how I solve the issue. Provides a very clean, consistent interface with the rest of the code.

    Michael Taylor
    http://www.michaeltaylorp3.net

    Wednesday, August 23, 2017 3:51 PM
  • Sorry, I have mistakenly typed foreach. Yes, I cannot use for loop directly on an IEnumerable<T> type list. Because it's a non-populated. For the foreach it could take about 1.2ms for an loop. 


    • Edited by sajithrw Wednesday, August 23, 2017 9:48 PM
    Wednesday, August 23, 2017 9:43 PM
  • I don't understand. Foreach is the defacto and correct way to enumerate an enumerable object. I cannot think of a single case where you wouldn't do it this way. What exactly is the issue with using foreach? It is going to be the fastest way to enumerate anything.
    Wednesday, August 23, 2017 9:47 PM
  • Yes, I am concerned because if I have to iterate many times (images 100000 times) over a IEnumerable<T> list it decreases performance. But I could see that a For or Foreach for a generic List doesn't take much time as IEnumerable<T> list. But in my case I have to go with IEnumerable<T> list. I will try OfType<T> and casting and let you know the result.
    Wednesday, August 23, 2017 9:54 PM
  • Consider this example,

    public class MyClass{
        public int ID {get; set;};
    
        public string Name;
    
        public int age;
    }
    
    public class IteratorTest{
        public void iterateOverObjList(){
            List<MyClass> listObj = new List<MyClass>();
    
            listObj.Add(new MyClass());
            listObj.Add(new MyClass());
            listObj.Add(new MyClass());
    
            IEnumerabble<MyClass> enumerableList = listObj.AsEnumerable();
            
            // foreach No.1 for IEnumerable data set
            foreach(MyClass item in enumerableList)
            {
               //do something
            }
            
            // foreach No.2 for List
            foreach(MyClass listItem in litsObj)
            {
              // do something
            }
        }
    }
    I am saying that there is a significant performance difference between No.1 and No.2 Foreachs. No.1 takes around 1.2ms for one loop while No.1 take nano seconds. Just wanna know if I could reduce No.1 time. 


    Wednesday, August 23, 2017 10:09 PM
  • I think you're comparing apples to oranges here. This has nothing to do with the usage of foreach. That approach will take the same amount of time irrelevant of what you're storing things in. It simply doesn't matter. In fact a for loop will end up being a little slower because you're using indexing (which most collections don't support).

    I suspect that maybe you're either using LINQ or you're working with a pre-populated list. It is important to understand that how the data is populated may impact your perception of how fast (or slow) enumeration is but that isn't correct. Let's take a simple example. If you look at Directory it has 2 methods on it - GetFiles and EnumerateFiles. The former returns an array (could have been a List<T> too) and the latter returns IEnumerable<T>. In order to return a list/array the collection must first be populated. These collections do not support enumeration while the collection is changing. So when you call GetFiles that call blocks until all the files are retrieved and placed into the collection. At that point you can then enumerate the collection. The cost of the actual enumeration is simply array indexing. But the total cost of enumeration the first time would need to include the time it took to load all the files.

    With EnumerateFiles it works differently. Since it doesn't care about how many files to return it returns a simple type that contains nothing. When you start enumerating then it starts fetching the available files one by one (or perhaps in batch). As a result the method call itself is very fast but you will notice the enumeration is a little slower. That is because it is periodically having to fetch the next set of data. This is very noticeable on large directories. But at the end of the day both approaches will take about the same amount of time. If you always need all the files then GetFiles is technically faster. But if you may or may not need all the files then EnumerateFiles would allow you to potentially eliminate files that you don't care about.

    The process of waiting until you enumerate something before it actually does the enumeration is known as deferred execution. LINQ does this implicitly but many APIs that expose IEnumerable<T> do as well. It is an optimization to handle the case where you ask to enumerate but never do, or start to enumerate but never finish.

    So, back to your situation. In order to load that List<T> with 100K images you have to allocate the memory and load them 1 by 1. If you never need all of them then you just wasted a lot of time and memory. However the enumeration itself will be pretty fast. However if you're using LINQ or something that returns IEnumerable<T> then you may notice the first time through the loop it is really slow. That is because of deferred execution. It has nothing to do with the foreach loop at all. You can confirm if this is behavior you are seeing by taking that IEnumerable<T> object and calling ToList or ToArray on it. Both of these methods would trigger enumeration. Alternatively use FirstOrDefault which triggers the initial enumeration but nothing else. Alternatively you can provide the code here you're using to get that list (along with any helper information) and we can try to identify it for you.

    Wednesday, August 23, 2017 10:13 PM
  • You have to get really, really large elements to notice any difference between those 2 loops. I upped it to 1 mil items before there was a difference of 1ms between the 2 blocks.
    Wednesday, August 23, 2017 10:18 PM
  • Yes, I am not assessing Foreach performance here. Just finding a better way with more speed to iterate through IEnumerable<T>. You are correct on the process of looping through IEnumerable<T> and usual loop. I use LINQ to filter out a small amount of data which I get IEnumerable<T> as a result. And then I iterate through that IEnumerable<T>. But the case is I do this in a outer loop, it iterate 100k times. So this is affecting total performance of the code. If there is nothing to do with iterating speed, I have to go with another solution other than filtering data with LINQ. BTW great explanation. 
    Wednesday, August 23, 2017 10:28 PM
  • It really depends on what LINQ provider you're using as to the impact of the filter. As an example, if you are using LINQ to EF and you apply a where clause before you trigger the enumeration (ToList, FirstOrDefault, etc) then the filter happens at the DB level. It isn't going to get much faster than that. However if the where happens after then it has to pull all the data before applying the where so you end up retrieving way more data than you need. For other providers the implementation can vary. If the where is being applied to in memory objects then you'd take the hit of eval'ing the clause each time you iterate. In this case it is better to filter and store the result as an array or list and then you can continually enumerate without the where clause hit. It really depends upon your code though.

    Thursday, August 24, 2017 2:01 AM
  • public class MyClass
    {
        public int ID {get; set;};
    
        public string Name;
    
        public int age;
    }
    
    public class MyTest
    {
       // here I am fetching 100k objects directly from database using my own sql connection class
       List<MyClass> listObj = SQLDB.getListObj();
    
       // here I am fetching a mapping for MyClass ID, this rows contain ID and a few properties that I need to use in my code
       List<MyClassMap> idMappingList = SQLDB.getIdMapping();
    
       // then I find each properties for each MyClass as bellow
       foreach(MyClass obj in listObj)
       {
           var propertySets = idMappingList.Where(p=> p.ID == obj.ID);
    
           // then I need to iterate through filtered result
           foreach(MyClassMap ListpropertySet in propertySets )
           {
             // do something
           }
           // I need to speed up this loop mechanism. 
       }
    }
    You can take a clear idea about my situation with this code piece. I can provide more details if you want some. I know that IEnumerable<T> is not much faster option if someone need to iterate over a collection. But using LINQ for a faster data filtering is ended up with the IEnumerbale<T> list. So I need to keep same performance on the data filtering and speed up the iteration. I am open to any other solutions to do this if there is any. :-)
    Thursday, August 24, 2017 4:02 AM
  • The problem is with those 2 calls to SQLDB. The "get" methods are probably retrieving all the data from the DB. You then enumerate all the objects and try to find the corresponding mappings. All this is being done in memory so it is about as fast as you're going to get but it potentially could take a while to load up the data before you get to that first foreach. 

    Given your code it looks like you need all the data. If so then there isn't much you can do to improve that aspect but most of the time you don't need all the data at once. For example if this stuff is so you can render something in the UI then don't render stuff that isn't visible.

    The other optimization that may help is combining your queries into 1. If you always need to get the objects and then the mapping data then have your query join the data at the database and then return you the combined data. It'll eliminate the second query and the need for your where clause altogether (at least in this code).

    • Marked as answer by sajithrw Thursday, August 24, 2017 5:56 AM
    Thursday, August 24, 2017 4:51 AM
  • Yes, I fetch all data. But without iterating trough LINQ result (Just imagine I commented out second Foreach) it just takes only 2 second foe the whole execution. But with second Foreach it takes 2 minutes in total. I guess you are right about the iterator, may be it has some external resources still connected some where. I need to make sure it doesn't if I am going to another solution. Is there any way to check that out?  
    Thursday, August 24, 2017 4:59 AM
  • I think you should try to eliminate the LINQ. That LINQ will do its own looping, so in effect you have three loops when you really should need just two.

    public class MyTest
    {
       // here I am fetching 100k objects directly from database using my own sql connection class
       List<MyClass> listObj = SQLDB.getListObj();
    
       // here I am fetching a mapping for MyClass ID, this rows contain ID and a few properties that I need to use in my code
       List<MyClassMap> idMappingList = SQLDB.getIdMapping();
    
       // then I find each properties for each MyClass as bellow
       foreach(MyClass obj in listObj)
       {
     
           foreach(MyClassMap set in idMappingList)
           {
               if(set.ID == obj.ID)
               {
                   // Do something.
               }
           }
       }
    }

    Thursday, August 24, 2017 5:14 AM
  • I have tried that already. But the issue is here if I don't use something like LINQ to filter out object list it will always loop through every element in mapping list, which will lead to slow down my whole process more than this. It took 3 minutes with such solution. 
    Thursday, August 24, 2017 5:20 AM
  • The LINQ itself will search through every element in the mapping list. So the LINQ searches through the mapping list, then the next foreach searches through what the LINQ passed back. This is instead of just having one search through the mapping list.

    If you say you have tried it and using the LINQ is faster, I suppose I have to take your word for it. I just find it hard to believe.


    Thursday, August 24, 2017 5:26 AM
  • No, you are correct. May be because of initial database connection it took 3 minutes. I run it multiple times and the average became 2 minutes, which means there is no different in LINQ Where vs Foreach loop. But it doesn't matter because it still takes the same amount of time to complete. 2 minutes.
    Thursday, August 24, 2017 5:40 AM
  • That makes sense. I am sure removing the LINQ is faster, but if what it returns is only a small list - just a few entries with the right ID - the improvement will probably be in milliseconds, so too small to be significant. If the LINQ returned a few thousand entries, you would probably notice the difference.

    Oh, well. Sorry I could not be of more help.

    Thursday, August 24, 2017 5:56 AM
  • Yes indeed. Your explanation was very helpful. Thanks for the time you invested on this matter. Really appreciate.
    • Edited by sajithrw Thursday, August 24, 2017 6:26 AM
    Thursday, August 24, 2017 6:26 AM