none
SortedDictionary v.s. Dictionary

    Question

  • Hello everyone,



    Here is my understanding of pros and cons SortedDictionary v.s. Dictionary, am I correct? Or anything important missing? What is your practices of when to use SortedDictionary and when to use Dictionary?

    1. Dictionary is faster for insertion and removal, but slower for query value by key compared with SortedDictionary;
    2. Dictionary is using a linear array to store key/value pair, but SortedDictionary is using binary search tree to store key/value pair, so for query, SortedDictionary is faster.



    thanks in advance,
    George

    Monday, June 09, 2008 8:45 AM

Answers

  • Hi!
    I made a small research on the topic, and here is what I've found:

    1) Generally, Dictionary works faster for adding items (tested on 50000 items):
        
    DateTime StartTime = new DateTime();  
    DateTime EndTime = new DateTime();  
     
    StartTime = DateTime.Now;  
     
    for (int i = 1; i < 50000; i++)  
    {  
    dict.Add(i, i);  
    }  
     
    EndTime = DateTime.Now;  
     
    long Time;  
     
    Time = EndTime.Ticks - StartTime.Ticks;  
    this.Text = Time.ToString; 

    The result was: 400576 ticks.

    2) Compared to SortedDictionary with the same code (changed Dictionary to SortedDictionary):
    The result was: 1201728 ticks.

    3) Speaking about getting items from Dictionary:

    DateTime GetStart = new DateTime();  
    DateTime GetEnd = new DateTime();  
    GetStart = DateTime.Now;  
     
    foreach (int a in dict.Keys)  
    {  
    listBox1.Items.Add(dict[a]);  
    }  
     
    GetStop = DateTime.Now;  
     
    long Result;  
    Result = GetStop.Ticks - GetStart.Ticks;  
    this.Text = Result.ToString(); 

    The result was: 717531760 ticks.

    4) Compared to SortedDictionary with the same code (changed Dictionary to SortedDictionary):
    The result was: 704412896 ticks.

    I used this link for general reference:
    http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx

    Of course the results depend on the computer configuration (CPU, current memory load, etc.) but I think that the results really compare SortedDictionary and Dictionary. The usage already depends on the purpose of the application (if needed high query speed - use SortedDictionary, if high adding speed - use Dictionary).

    Also, consider that this is my opinion.
    • Marked as answer by Hanzheng Zou Thursday, June 12, 2008 7:45 AM
    • Edited by ProfileUser2 Sunday, June 15, 2008 7:17 PM Edit
    Monday, June 09, 2008 10:12 AM
  • Yes, Dictionary<,> will be quicker when fetching things by key. And in general we *don't* need the data sorted by key.

    There are, however, some scenarios when having the data sorted is also useful. For example, if the key is a DateTime, we might want to fetch all the values in a given range (say: Jun 2008 to Jul 2008). We can do this (among other ways) by finding the index of the first key, and then reading data until we find the next item is outside the range. We've done one "lookup" using the dictionary, the necessary number plus one of "fetch" operations, and nothing else. If the data is unsorted we'd need to check every single item separately, which is painful if the list is long.

    Sorted data also lends itself to things like efficient joining (where you can iterate a left and right pair in parallel), and hunting for data using binary search (halving the range each time) - even if the value we want isn't in the list (picking the next or last node instead). Dictionary<,> cannot give any information about values not in the list, where-as with a sorted list you can say "it isn't there, but if it was, it would be around index 34".

    Of course, it is also useful if you know that you want to display the data in order at a later point in time, and don't want to have to sort it more than once...


    Marc
    • Marked as answer by Hanzheng Zou Thursday, June 12, 2008 7:45 AM
    Monday, June 09, 2008 11:56 AM
  • 1: correct
    2: other way around; I'm saying that it is preferable to implement IComparable / IComparable<T> if there is a sensible order for ordering items, only using IComparer / IComparer<T> as a fallback to provide specific ordering.

    For why - let me illustrate with generics and a few layers of abstraction. Down at SomeOtherGenericMethod, it has no way of knowing about any specific comparer to use. We could arguably pass one in as an argument, but then we need to pass that argument through every layer in the call stack, which could be several layers and not very convenient. But by using the default comparer (i.e. IComparer / IComparer<T>) it just works:

    using System;  
    using System.Linq;  
    using System.Collections.Generic;  
    class Foo : IComparable<Foo>, IComparable  
    {  
        public int Key { getset; }  
        public int CompareTo(Foo foo)  
        {  
            return foo == null ? -1 : Key.CompareTo(foo.Key);  
        }  
        int IComparable.CompareTo(object obj)  
        {  
            return CompareTo(obj as Foo);  
        }  
        public override string ToString()  
        {  
            return "Foo " + Key;  
        }  
    }  
    static class Program  
    {  
        static void Main()  
        {  
            List<Foo> foos = new List<Foo>   
            {  
                new Foo { Key = 3}, new Foo {Key = 9},  
                new Foo { Key = 1}, new Foo {Key = 4}  
            };  
            SomeGenericMethod(foos);  
        }  
        static void SomeGenericMethod<T>(IEnumerable<T> data)  
        {  
            // .. do some work  
            // .. do some more work  
            SomeOtherGenericMethod<T>(data);  
        }  
        static void SomeOtherGenericMethod<T>(IEnumerable<T> data)  
        {  
            Console.WriteLine("Min: {0}", data.Min());  
            Console.WriteLine("Max: {0}", data.Max());          
            List<T> items = data.ToList();  
            items.Sort();  
            foreach (T item in items)  
            {  
                Console.WriteLine(item);  
            }          
        }  
     

    Marc
    • Marked as answer by George2 Monday, June 16, 2008 3:02 AM
    Thursday, June 12, 2008 1:38 PM
  • 1: foreach is the route; either on the dictionary itself (for KeyValuePair<,>), or on Keys or Values for just the keys/values.

    2: Both SortedList<,> and SortedDictionary<,> accept an IComparer<TKey>, which you can construct for a custom sort. Otherwise it uses Comparer<TKey>.Default, which will look for at TKey for IComparable<TKey>, or just IComparable as a fallback. If TKey supports neither of these and no custom comparer is provided, it will throw. Similarly, Dictionary<,> accepts an IEqualityComparer<TKey>, otherwise using EqualityComparer<TKey>.Default, which uses IEquatable<TKey> (if found), defaulting to the inbuilt object.Equals / object.GetHashCode() otherwise.

    So if you don't provide anything yourself, the ordinal sort for the keys will be used.

    3:

                SortedList<stringstring> records = new SortedList<string,string>();
                // could be added in any order... this is just easier
                records.Add("c""cello");  
                records.Add("f""flute");  
                records.Add("g""guitar");  
                records.Add("h""harmonica");  
                records.Add("v""violin");  
     
                // find everything between f and k  
                int index = records.IndexOfKey("f");  
                string key;  
                while (records.Comparer.Compare((key = records.Keys[index++]), "k") <= 0)  
                {   
                    Console.WriteLine("{0}={1}", key, records[key]);  
                } 

    4: binary search takes a while to write - but to illustrate:

    pick a number between 1 & 100; pretend you chose 72, but I don't know that...

    so; between 1 and 100; take a midpoint (50); is it 50, <50 or >50? ahh... >50
    so; between 50 and 100; take a midpoint (75); is it 75, <75 or >75? ahh... <75
    so; betweeb 50 and 75; take a midpoint (63); is it 63, <63 or >63? ahh >63
    so; between 63 and 74; take a midpoint (69).... ahh > 69
    so; ... 69 and 74 - try midpoint 72 - found your number!

    So I did that in 5 iterations, instead of trying "is it 1? is it 2? is it 3?...", but it only works if the data is ordered and uniformly distributed.
    Marc
    Tuesday, June 10, 2008 6:58 AM

All replies

  • Hi!
    I made a small research on the topic, and here is what I've found:

    1) Generally, Dictionary works faster for adding items (tested on 50000 items):
        
    DateTime StartTime = new DateTime();  
    DateTime EndTime = new DateTime();  
     
    StartTime = DateTime.Now;  
     
    for (int i = 1; i < 50000; i++)  
    {  
    dict.Add(i, i);  
    }  
     
    EndTime = DateTime.Now;  
     
    long Time;  
     
    Time = EndTime.Ticks - StartTime.Ticks;  
    this.Text = Time.ToString; 

    The result was: 400576 ticks.

    2) Compared to SortedDictionary with the same code (changed Dictionary to SortedDictionary):
    The result was: 1201728 ticks.

    3) Speaking about getting items from Dictionary:

    DateTime GetStart = new DateTime();  
    DateTime GetEnd = new DateTime();  
    GetStart = DateTime.Now;  
     
    foreach (int a in dict.Keys)  
    {  
    listBox1.Items.Add(dict[a]);  
    }  
     
    GetStop = DateTime.Now;  
     
    long Result;  
    Result = GetStop.Ticks - GetStart.Ticks;  
    this.Text = Result.ToString(); 

    The result was: 717531760 ticks.

    4) Compared to SortedDictionary with the same code (changed Dictionary to SortedDictionary):
    The result was: 704412896 ticks.

    I used this link for general reference:
    http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx

    Of course the results depend on the computer configuration (CPU, current memory load, etc.) but I think that the results really compare SortedDictionary and Dictionary. The usage already depends on the purpose of the application (if needed high query speed - use SortedDictionary, if high adding speed - use Dictionary).

    Also, consider that this is my opinion.
    • Marked as answer by Hanzheng Zou Thursday, June 12, 2008 7:45 AM
    • Edited by ProfileUser2 Sunday, June 15, 2008 7:17 PM Edit
    Monday, June 09, 2008 10:12 AM
  • The primary difference is that with a SortedDictionary<,> the items are ordered by the keys. In a regular Dictionary<,> there is no defined order of the keys (not even insertion order is guaranteed). Dictionary<,> should be as fast as any others for by-key (equality) query.


    A SortedDictionary<,> has overheads compared to a Dictionary<,>, making it slower. A comparison between SortedList<,> and SortedDictionary<,> would be far more meaningful - and yes, then the difference is in insertion time vs query time.


    Marc
    • Edited by Marc GravellMVP Monday, June 09, 2008 10:27 AM added equality query
    Monday, June 09, 2008 10:23 AM
  • Thanks Delimarschi Denis,


    Mentioned here,

    http://msdn.microsoft.com/en-us/library/xfhwa508(VS.80).aspx

    "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table." Looks like the query time for Dictionary is O(1), but the query time for SortedDictionary is O(logn) -- "The SortedDictionary generic class is a binary search tree with O(log n) retrieval, where n is the number of elements in the dictionary. In this respect, it is similar to the SortedList generic class. The two classes have similar object models, and both have O(log n) retrieval. Where the two classes differ is in memory use and speed of insertion and removal."

    http://msdn.microsoft.com/en-us/library/f7fta44c(VS.80).aspx

    I am very confused, looks like Dictionary query is faster than SortedDictionary?


    regards,
    George
    Monday, June 09, 2008 10:28 AM
  • Hi Marc!


    Thanks. I agree. I read some further documents about the time complexity compared with query in Dictionary and query in SortedDictionary, looks like query in SortedDictionary is slower? (please refer to my post above.) Any ideas or comments?


    regards,
    George
    Monday, June 09, 2008 10:29 AM
  • Well, I'd agree ;-p

    I would expect Dictionary<,> to be the most lightweight, and fastest at both retrieval and insertion.

    If you don't require the data to be sorted (for whatever reason), then stick to Dictionary<,>.

    Of the other two; SortedList<,> uses less memory; SortedDictionary<,> has faster removal and insertion for unsorted data - but if the data is pre-sorted then SortedList<,> has faster insertion.

    But this only applies if you need it sorted! Since you are comparing to Dictionary<,> I can only assume you *don't* fundamentally need the data sorted in any particular way; in which case use Dictionary<,>
    Marc
    Monday, June 09, 2008 11:24 AM
  • Great Marc!


    So,

    1. For insert/delete/query (just query a value by an input key), comparing SortedDictionary and Dictionary, Dictionary is always faster?

    2. I do not need the elements to be sorted, but I am interested to learn from Guru like you about why do we need the elements to be sorted by key field? Could you show me a scenario please? :-)


    regards,
    George
    Monday, June 09, 2008 11:40 AM
  • Yes, Dictionary<,> will be quicker when fetching things by key. And in general we *don't* need the data sorted by key.

    There are, however, some scenarios when having the data sorted is also useful. For example, if the key is a DateTime, we might want to fetch all the values in a given range (say: Jun 2008 to Jul 2008). We can do this (among other ways) by finding the index of the first key, and then reading data until we find the next item is outside the range. We've done one "lookup" using the dictionary, the necessary number plus one of "fetch" operations, and nothing else. If the data is unsorted we'd need to check every single item separately, which is painful if the list is long.

    Sorted data also lends itself to things like efficient joining (where you can iterate a left and right pair in parallel), and hunting for data using binary search (halving the range each time) - even if the value we want isn't in the list (picking the next or last node instead). Dictionary<,> cannot give any information about values not in the list, where-as with a sorted list you can say "it isn't there, but if it was, it would be around index 34".

    Of course, it is also useful if you know that you want to display the data in order at a later point in time, and don't want to have to sort it more than once...


    Marc
    • Marked as answer by Hanzheng Zou Thursday, June 12, 2008 7:45 AM
    Monday, June 09, 2008 11:56 AM
  • Thanks Marc,


    I did some test cases and find some articles to read, especially, here.

    http://www.c-sharpcorner.com/UploadFile/mgold/SortedDictionarySample06202007142815PM/SortedDictionarySample.aspx

    A couple more comments,

    1.

    In order to access the sorted output of the SortedDictionary, besides using foreach, are there any other approaches?

    2.

    How to control sorting is important. Using IComparer or IComparable are good ways to control the sort. I do not know what is the default sorting approach if we do not use the 2 compare solutions? (I am not sure the default sorting will use some binary magic fuzzy value, which we can not control order)?

    3.

    --------------------
    We can do this (among other ways) by finding the index of the first key, and then reading data until we find the next item is outside the range. We've done one "lookup" using the dictionary, the necessary number plus one of "fetch" operations, and nothing else.
    --------------------

    Could you show some pseudo code please?

    4.

    --------------------
    hunting for data using binary search
    --------------------

    Could you show some pseudo code please?


    regards,
    George
    Tuesday, June 10, 2008 1:53 AM
  • 1: foreach is the route; either on the dictionary itself (for KeyValuePair<,>), or on Keys or Values for just the keys/values.

    2: Both SortedList<,> and SortedDictionary<,> accept an IComparer<TKey>, which you can construct for a custom sort. Otherwise it uses Comparer<TKey>.Default, which will look for at TKey for IComparable<TKey>, or just IComparable as a fallback. If TKey supports neither of these and no custom comparer is provided, it will throw. Similarly, Dictionary<,> accepts an IEqualityComparer<TKey>, otherwise using EqualityComparer<TKey>.Default, which uses IEquatable<TKey> (if found), defaulting to the inbuilt object.Equals / object.GetHashCode() otherwise.

    So if you don't provide anything yourself, the ordinal sort for the keys will be used.

    3:

                SortedList<stringstring> records = new SortedList<string,string>();
                // could be added in any order... this is just easier
                records.Add("c""cello");  
                records.Add("f""flute");  
                records.Add("g""guitar");  
                records.Add("h""harmonica");  
                records.Add("v""violin");  
     
                // find everything between f and k  
                int index = records.IndexOfKey("f");  
                string key;  
                while (records.Comparer.Compare((key = records.Keys[index++]), "k") <= 0)  
                {   
                    Console.WriteLine("{0}={1}", key, records[key]);  
                } 

    4: binary search takes a while to write - but to illustrate:

    pick a number between 1 & 100; pretend you chose 72, but I don't know that...

    so; between 1 and 100; take a midpoint (50); is it 50, <50 or >50? ahh... >50
    so; between 50 and 100; take a midpoint (75); is it 75, <75 or >75? ahh... <75
    so; betweeb 50 and 75; take a midpoint (63); is it 63, <63 or >63? ahh >63
    so; between 63 and 74; take a midpoint (69).... ahh > 69
    so; ... 69 and 74 - try midpoint 72 - found your number!

    So I did that in 5 iterations, instead of trying "is it 1? is it 2? is it 3?...", but it only works if the data is ordered and uniformly distributed.
    Marc
    Tuesday, June 10, 2008 6:58 AM
  • Hi Marc!


    You are so knowledgeable!

    A few more comments,

    1.

    Good to learn Keys array and Values array is automatically sorted! Cool!!

    2.

    I think it is better to provide a customized comparer, since default one is always fuzzy and may sorted in an undesired way, agree?

    4.

    I am confused about the binary search sample. What are the keys (and type) and values (and type) of your sample? 72 is the key or value or index to find?


    regards,
    George
    Wednesday, June 11, 2008 1:05 PM
  • Re 2: I'd disagree. In most typical cases the ordinal order (i.e. how the type itself orders itself via CompareTo) is fine and expected (not fuzzy at all). The main occasion I use a non-default comparer is to provide specific string comparers - case insersitive, invariant, etc.

    4: I'm talking about keys throughout this; and this isn't a typical scenario - but it is possible... but don't lose sleep trying to understand binary search in the context of a (sorted) dictionary/list.
    Marc
    Wednesday, June 11, 2008 2:25 PM
  • Thanks Marc!


    1.

    For string and int as key, using default comparer is fine, but if we use a customized type as a key, I think the customized comparer is needed since the default comparer of a customized type is to compare the reference (pointer) value?

    2.

    In your binary search sample, I think you want to choose a key whose value (of the key itself, not the value of the key/value pair) is 72?


    regards,
    George

    Thursday, June 12, 2008 8:27 AM
  • 1: The default *equality* comparison (used in things like List<>, Dictionary<,>, etc) is to use the object's own Equals() and GetHashCode(), which uses reference equality. This is usually fine, and can be changed simply by tweaking Equals()/GetHashCode(), or [better] implementing IEquatable<T> on the instance.
    For SortedList<,> etc, the *inequality* comparison is used; this never uses anything to do with references, and a default simply isn't defined unless your type implements IComparable / IComparable<T> - which you can verify as so:

    class SomeType {  
        static void Main()  
        {  
            IComparer<SomeType> defComparer = Comparer<SomeType>.Default;  
            SomeType a = new SomeType(), b = new SomeType();  
            // ArgumentException: At least one object must implement IComparable.  
            int c = defComparer.Compare(a, b);  
        }  

    The advantage of properly implementing IComparable / IComparable<T> (rather than just supplying a custom IComparer / IComparer<T>) is that it will work automatically in all situations, such as sorting a list (perhaps in a UI) etc. Another example: LINQ provides Min/Max extension methods that rely (by default) on the default comparer - sure, you can provide your own, but this gets very tricky if your code is at the bottom of a pile of generics - how do you know what custom comparer to supply if all your code knows is "T"? But it is trivial to use the default comparer.

    2: I wouldn't get too concerned about the specific example - the point is that there are some things that only work on sorted data, and this is one of them. Again - I'm talking about keys. If I just wanted to know about the item with key 72, then it would be "TValue val = dict[72];" - the point about binary search (in this example) is that it would tell me where it would be *if* it existed (even if it doesn't exist). But maybe binary search is a bad example here... honestly: don't stress over this!


    Marc
    Thursday, June 12, 2008 8:59 AM
  • Thanks Marc,


    1. *inequality* comparison -- mean less than compare, greater than compare in order to find insert location for SortedList?

    2. Could you show me a sample why IComparable / IComparable<T> is inferior compared with IComparer / IComparer<T>, better with some pseudo code? After reading your description, I still feel confused. :-)


    regards,
    George

    Thursday, June 12, 2008 1:00 PM
  • 1: correct
    2: other way around; I'm saying that it is preferable to implement IComparable / IComparable<T> if there is a sensible order for ordering items, only using IComparer / IComparer<T> as a fallback to provide specific ordering.

    For why - let me illustrate with generics and a few layers of abstraction. Down at SomeOtherGenericMethod, it has no way of knowing about any specific comparer to use. We could arguably pass one in as an argument, but then we need to pass that argument through every layer in the call stack, which could be several layers and not very convenient. But by using the default comparer (i.e. IComparer / IComparer<T>) it just works:

    using System;  
    using System.Linq;  
    using System.Collections.Generic;  
    class Foo : IComparable<Foo>, IComparable  
    {  
        public int Key { getset; }  
        public int CompareTo(Foo foo)  
        {  
            return foo == null ? -1 : Key.CompareTo(foo.Key);  
        }  
        int IComparable.CompareTo(object obj)  
        {  
            return CompareTo(obj as Foo);  
        }  
        public override string ToString()  
        {  
            return "Foo " + Key;  
        }  
    }  
    static class Program  
    {  
        static void Main()  
        {  
            List<Foo> foos = new List<Foo>   
            {  
                new Foo { Key = 3}, new Foo {Key = 9},  
                new Foo { Key = 1}, new Foo {Key = 4}  
            };  
            SomeGenericMethod(foos);  
        }  
        static void SomeGenericMethod<T>(IEnumerable<T> data)  
        {  
            // .. do some work  
            // .. do some more work  
            SomeOtherGenericMethod<T>(data);  
        }  
        static void SomeOtherGenericMethod<T>(IEnumerable<T> data)  
        {  
            Console.WriteLine("Min: {0}", data.Min());  
            Console.WriteLine("Max: {0}", data.Max());          
            List<T> items = data.ToList();  
            items.Sort();  
            foreach (T item in items)  
            {  
                Console.WriteLine(item);  
            }          
        }  
     

    Marc
    • Marked as answer by George2 Monday, June 16, 2008 3:02 AM
    Thursday, June 12, 2008 1:38 PM
  • Thanks Marc,


    Question answered. I understand your design pattern now. It is more extensible.


    regards,
    George
    Monday, June 16, 2008 3:04 AM
  • I'm struggling to find a way to do something that seems like it should be obvious with a SortedList/SortedDictionary.

    Given a value that is NOT in the list, I'd like to find the value above (next) or below (prev) the value, using the built-in binary search that it performs. Currently I only see how to lookup exact values.

    It is there a way to do the equivilent of "FindNext(value)" and "FindPrev(value)" that efficiently uses the binary search code inside those implementations?

     
    SortedDictionary<string,sting> dict; 
     
    dict["abc"] = 1; 
    dict["def"] = 2; 
    dict["ghi"] = 3; 
     
    string value = dict["abc"];   //  this is easy. 
     
    KeyValuePair<string,string> record = dict.FindNext("b");  // this is what I want... 
     

    Sunday, June 29, 2008 9:24 AM
  • Neither exposes that functionality "out of the box". This was just raised as one of the things that sorted data can be used for; in particular, it relies on a "uniform" distribution of keys, which isn't a pre-condition of sortedlist/sorteddictionary.

    It wouldn't be hard to add, however; but unless you now you *need* the complexity of writing a binary search,  you could start with foreach. Note also that there is an Array.BinarySearch that you might be able to exploit (by using Keys.CopyTo to fill an array of the keys). But if you are calling this often it would be worth re-implementing using the data directly rather than lots of copying.
    Marc
    Sunday, June 29, 2008 8:21 PM
  • There seems to be some confusion in this thread about the speed of a Dictionary vs. SortedDictionary vs. SortedList.  The documentation in MSDN explains exactly how the insert and retrieval routines in each class perform, so no real research is required. 

    The Dictionary class should almost always outperform any of the other classes because it implements a hash table, so both insertion and retrieval are close to O(1) algorithms.  The other algorithms are O(LogN) for retrieval, and vary for insertion.

    One BIG thing not mentioned in this thread that is something to consider is the added memory cost of a Dictionary, which can be substantial.  There aint no such thing as a free lunch, and the cost of the fast performance that a dictionary offers is memory consumption.  Hash tables require allocating lots of memory to act as so-called hash-buckets.  Many of the buckets will just be empty.  Anyone interested in the details of why a Dictionary is costly in terms of memory consumption can search the web for "Hash Table implementation" and read-up on how to write a hash table.  There are plenty of implementations out there.

    Also, the speed of insertion and retrieval is "about O(1)" but its generally going to be more than O(1) due to "hash clashes" or collisions, which happen when two different keys have the same hash value.  Depending upon the key, a hash clash can happen, and when it does, the dictionary will handle it gracefully, but performance will be worst than O(1).

    Wednesday, July 24, 2013 1:06 AM