locked
Dictionary: Insert if doesn't exist without double lookup RRS feed

  • Question

  • I use the following code to make sure that previous value can't be ovewritten i.e., if there are duplicates then the first value must be used.

    if (!dict.TryGetValue(myKey, out prevVal))
    {
        dict[myKey] = myVal;
    }

    The problem is that it requires two dictionary lookup-s instead of one.

    Is there any way to do it with a single lookup using .NET collection classes?

    Friday, March 20, 2009 11:50 PM

Answers

  • I agree - I also like TryAdd. I usually define its equivalent as an extension method for my projects, along with a Lookup that returns default instead of throwing if not found. Duplicates in .NET keyed collections are a real pain; you usually end up with a dictionary of lists or something similar, and really have to pay attention to the implied O() for each operation.

    The semi-standard hash_map was designed as a drop-in replacement for map. The speed benefit of hash_map.insert would be negligible, so I believe it is just supported for compatibility.

    Your last sentence is a little confusing to me. If you're doing any processing at all, then linearly growing time is pretty much the best you can hope for...

    If you really want full control over your data structures (the built-in C# ones are rather confining), then get PowerCollections. It's a little dated but still the best around.

           -Steve

    • Marked as answer by Zhi-Xin Ye Thursday, March 26, 2009 10:16 AM
    Monday, March 23, 2009 3:15 PM

All replies

  • If you are sure the key is not in the Dictionary (as you are in your case since you guarded with TryGetValue), you should use the Dictionary.Add method instead of the [] syntax.

    Nonetheless, there will still be somewhat of a double lookup in the sense that Add will detect if the key already exists and throw an exception if it does.  For a Dictionary, this should not have much impact on your performance because the Dictionary has to figure out which hash slot to use and the algorithm is the same regardless of whether or not the key already exists.  It is just a matter of throwing an exception if the slot is not vacant; Dictionary should be able to do this extremely efficiently.

    In other words: Don't worry about it! :)

    Friday, March 20, 2009 11:55 PM
  • My concern is about double lookup,  not about the operator[] . In C++ world STL map container has insert() method that allows to avoid double lookup:
    pair<iterator,bool> insert ( const value_type& x ); 
    The pair::second element in the pair is set to true if a new element was inserted or false if an element with the same value existed.

    I can avoid double lookup if I use Add() and then catch and ignore ArgumentException for duplicates. But it's plain ugly because it complicates code (try catch etc) and inefficinet (for small dictionaries the cost of second lookup is probably less than the cost exception handling).

    What I'm looking for is bool TryAdd(TKey,TValue)  method. 

    Saturday, March 21, 2009 1:15 AM
  • In this case, where the object of type TValue is known ahead of time (i.e., it is not allocated/computed after checking if the key exists), I often just do the assignment:

    dict[myKey] = myVal;

    The first case is that myKey already exists.  The value is replaced with myVal, which is cheap, usually isn't a big deal, and the replacing object is identical or equivalent to what was already in the Dictionary.  The second case is that myKey does not exist, so it is added.

    This logic is not correct for all cases, but I have found that it works for many cases.

    Saturday, March 21, 2009 2:17 AM
  • The simple assignment (dict[myKey] = myVal;) will not do: the values are not equivalent and I need to keep the first one, not the last one.
    Saturday, March 21, 2009 7:38 AM
  • Erm, there is no double lookup.  You need a single lookup to verify if the key was already added.  You can make it look like a framework method by using the extension method syntax:

      public static class Extensions {
        public static void TryAdd<K, V>(this Dictionary<K, V> dict, K key, V value) {
          if (!dict.ContainsKey(key)) dict.Add(key, value);
        }
      }


    Hans Passant.
    Saturday, March 21, 2009 2:06 PM
  • If I put all the code in a single extension method, then only syntactically it will look like one call. Still the missing TryAdd() method will be 2X slower than it could be because the data structure is searched twice 1. by ContainsKey() 2. by Add().

    Really surprised by such an oversight  in the design of  fundamental .NET data structure
    Sunday, March 22, 2009 2:37 PM
  • Lookup is O(1).  Looking up twice is still O(1).  Complicating the API for something that is not commonly used and is still fast was probably deemed unnecessary.  Post suggestions to connect.microsoft.com.  Or write your own.
    Hans Passant.
    Sunday, March 22, 2009 4:14 PM
  • Lookup is amortized O(1), worst-case O(n). This is the main difference between C#'s Dictionary and C++'s map. map's lookup is O(log n), worst-case and average.

    This is why there is no Dictionary.TryAdd: it doesn't save as much as insert does for map.

    The performance of the C# Dictionary, like many other C# data structures, is entirely dependent on the hashing function used. With a good hash, it will outperform map in general, but with a bad hash (or just the wrong data set) it will perform much worse than map.

    Check out PowerCollections; they had found the standard (CLR) hash function to have poor distribution, so they did their own when they wrote their tree-based data structures.

            -Steve
    Monday, March 23, 2009 12:47 PM
  • MSVC C++ hash_map is also hash table based and has insert() that returns <iterator, bool>:
    http://msdn.microsoft.com/en-us/library/th79x793(VS.71).aspx

    I don't understand the argument that O(1) average cost of hashtabe lookup is the reason for not including TryAdd(), which I think is easier to understand and use than Add().

    Add() throws exceptions. The use exceptions to control normal code flow is not recommended, so Add() is not suitable if duplicates are expected.

    I have large datasets that I process using Dictionary; it takes time => 2X diference is noticable even if the total time still grows linearly with the number of input objects (provided the good hash function is used).

    Monday, March 23, 2009 2:14 PM
  • I agree - I also like TryAdd. I usually define its equivalent as an extension method for my projects, along with a Lookup that returns default instead of throwing if not found. Duplicates in .NET keyed collections are a real pain; you usually end up with a dictionary of lists or something similar, and really have to pay attention to the implied O() for each operation.

    The semi-standard hash_map was designed as a drop-in replacement for map. The speed benefit of hash_map.insert would be negligible, so I believe it is just supported for compatibility.

    Your last sentence is a little confusing to me. If you're doing any processing at all, then linearly growing time is pretty much the best you can hope for...

    If you really want full control over your data structures (the built-in C# ones are rather confining), then get PowerCollections. It's a little dated but still the best around.

           -Steve

    • Marked as answer by Zhi-Xin Ye Thursday, March 26, 2009 10:16 AM
    Monday, March 23, 2009 3:15 PM
  • I agree, this is not good to do double look-ups. I hope, Microsoft will change its mind eventually.

    For now, try using HashSet<KeyValuePair> instead of Dictionary<> with a custom equality comparer that compares KeyValuePair.Key values. If the item is already there, HashSet.Add() will not replace original item but will return false instead.
    You can go further and create your own 
    HashSet-based dictionary class implementing TryAdd() as described above.

    Wednesday, June 8, 2011 4:52 PM