Issue with retrieving items from IEnumerable<T> using Enumerator

  • Question

  • I have a class with 11 string properties, plus an IEnumerable<T> property (initialised to a generic List<T>) whose items each have an int property, a long property and three nullable int (int?) properties. This sub-list typically holds 4-6 items.

    I have an IEnumerable containing ~700K of these, and I'm breaking them out into batches of 1000 using the following code:

    // Get an enumerator and position it on the first item before batching begins.
    var enumerator = items.GetEnumerator();
    enumerator.MoveNext();

    List<object> entityBatch = new List<object>();

    long startTicks = DateTime.Now.Ticks;   // Ticks is a long (units of 100 ns)

    // Copy batchSize items into the current batch (assumes enough items remain).
    for (int iterStep = 0; iterStep < batchSize; iterStep++)
    {
        entityBatch.Add(enumerator.Current);
        enumerator.MoveNext();
    }

    long endTicks = DateTime.Now.Ticks;

    However, this loop takes around 12 seconds to process (as measured from the start / end tick counts). That in itself is bad, but the real question is: can anybody shed any light on why, if I remove the list property, the same data is processed in 0.25 seconds - nearly 50 times more quickly?

    I realise there will be some casting going on from objects to my actual types, but the difference makes me think I'm missing something pretty significant.
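
    For reference, the shape being described is roughly the following (all type and property names here are hypothetical stand-ins, not my actual code):

    public class SubItem
    {
        public int Id { get; set; }
        public long Reference { get; set; }
        public int? ValueA { get; set; }
        public int? ValueB { get; set; }
        public int? ValueC { get; set; }
    }

    public class Entity
    {
        // 11 string properties in the real class; two shown here for brevity.
        public string Name { get; set; }
        public string Description { get; set; }

        // Initialised to a generic List, typically holding 4-6 items.
        public IEnumerable<SubItem> SubItems { get; set; } = new List<SubItem>();
    }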

    Wednesday, February 24, 2016 2:30 PM

Answers

  • Not sure about the list thing.  But treating a list in batches is just index arithmetic.  You don't actually have to do any work.  Just treat the nth item as part of the (n/1000)th batch.

    But if you absolutely had to, making batches is easy.  Here's one way to partition items into groups:

    const int maxItemsPerGroup = 1000;

    // Tag each item with a group index (running index / group size), then group on it.
    var groups = things
        .Select((obj, index) => new { obj, groupIndex = index / maxItemsPerGroup })
        .GroupBy(x => x.groupIndex, x => x.obj);
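
    Consuming those groups as concrete batches might look something like this (ProcessBatch here is just a hypothetical placeholder for whatever you do with each batch):

    foreach (var group in groups)
    {
        // Each group is an IGrouping<int, TItem> of up to maxItemsPerGroup items.
        var batch = group.ToList();
        ProcessBatch(batch);   // hypothetical placeholder
    }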
    


    FYI: Parallel.For has a nice partitioner; you don't have to make your own batches if the idea is just to process them in parallel.
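
    As a rough sketch of that approach (assuming the items are in a List<T> called allItems; the range size and the Process call are placeholders):

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    // Partitioner.Create(from, to, rangeSize) yields index ranges of up to 1000 items;
    // Parallel.ForEach processes the ranges concurrently - no manual batching needed.
    Parallel.ForEach(Partitioner.Create(0, allItems.Count, 1000), range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            Process(allItems[i]);   // hypothetical placeholder
        }
    });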

    Wednesday, February 24, 2016 4:45 PM

All replies

  • Is your "class" actually a "struct", by chance?  Because that could be the reason.  Can't tell without seeing the actual class definition.
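
    For what it's worth, the difference being asked about looks like this (hypothetical types, not the poster's actual code):

    // A class is a reference type: adding one to a List<object> copies only a reference.
    public class ItemClass { /* 11 string properties, sub-list property, ... */ }

    // A struct is a value type: adding one to a List<object> boxes it, which allocates
    // and copies every field on each Add - expensive when repeated ~700K times.
    public struct ItemStruct { /* same members */ }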
    Wednesday, February 24, 2016 4:47 PM