Implementing a cache with ConcurrentDictionary.TryGetOrAddValue()?

    Question

  • I have written an experimental cache object that is (supposed to be) thread safe. The cache has one public method that receives a key and a Func<TValue>. If an item corresponding to the key is found, it is retrieved from the cache; otherwise the delegate is called to produce the item, and it is stored in the cache for future use. I have found this pattern to be useful in many scenarios, so I would like to create a generic, scalable and robust implementation using the CDS.

    My current implementation uses a generic (non-concurrent) dictionary and a ReaderWriterLockSlim to synchronize. I used LazyInit<T> to make sure the lock for the cache can be released before the delegate that produces the actual value is executed. (LazyInit indeed is very cool)

    public class CacheDictionary<TKey, TValue>  
        where TValue : class // needs to be a ref type due to current limitation of lazyInit<>  
    {  
        ReaderWriterLockSlim _cacheLock = new ReaderWriterLockSlim();   
        Dictionary<TKey, LazyInit<TValue>> _cacheItemDictionary = new Dictionary<TKey, LazyInit<TValue>>();   
     
        public TValue Fetch(TKey key, Func<TValue> producer)  
        {  
            LazyInit<TValue> cacheItem;  
            bool found;   
     
            // first we try to get the item with only a read lock,  
            // this way once the item is present we have concurrent access  
            _cacheLock.EnterReadLock();  
            try 
            {  
                found = _cacheItemDictionary.TryGetValue(key, out cacheItem);  
            }  
            finally 
            {  
                // now release the ReadLock BEFORE acquiring the WriteLock to avoid deadlocks  
                _cacheLock.ExitReadLock();  
            }   
     
            if (!found)  
            {  
                // it was not found so now we get a Write Lock  
                _cacheLock.EnterWriteLock();  
                try 
                {  
                // check if the item was added between releasing the read lock and acquiring the write lock  
                    if (!_cacheItemDictionary.TryGetValue(key, out cacheItem))  
                    {  
                        cacheItem = new LazyInit<TValue>(()=> producer(), LazyInitMode.EnsureSingleExecution);                          
                        _cacheItemDictionary.Add(key, cacheItem);  
                    }  
                }  
                finally 
                {  
                    _cacheLock.ExitWriteLock();  
                }  
            }   
     
            // The producer delegate is invoked as needed by the Value property,   
            // after the locks for the entire cache have been released  
            return cacheItem.Value;  
        }  
    }   
     
     

    I would really like to replace all my locking code (to be honest, I'm not sure it's 100% correct and maximally efficient) by using, for instance, the new ConcurrentDictionary<>. But I'm not sure if that's possible, or how to do it.

    To replace the double-checked locking I would like to perform an atomic operation on the dictionary that retrieves an item if it exists, and stores a new item if it does not. Something like a combination of TryAdd and TryGetValue. Better yet would be if I could also pass in a producer Func<TValue> and have it executed inside the lock as needed.

    Is there a way to get ConcurrentDictionary to execute these actions as one atomic operation? Or could you maybe point me in a different direction to solve this? Of course I wouldn't mind you just adopting the whole pattern for the next CTP :-)
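    For clarity, here is a sketch of the semantics I am after. The extension method name GetOrAddValue and the separate syncRoot parameter are just illustration; a single coarse lock like this is exactly what I am trying to avoid, it only shows the atomic get-or-add behavior I would like the concurrent dictionary to provide:

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class DictionaryExtensions
    {
        // Hypothetical sketch: atomically get an existing value for the key,
        // or run the producer and store its result, all under one lock.
        public static TValue GetOrAddValue<TKey, TValue>(
            this Dictionary<TKey, TValue> dictionary,
            object syncRoot,
            TKey key,
            Func<TValue> producer)
        {
            lock (syncRoot)
            {
                TValue value;
                if (!dictionary.TryGetValue(key, out value))
                {
                    value = producer(); // executed at most once per key
                    dictionary.Add(key, value);
                }
                return value;
            }
        }
    }
    ```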

    Saturday, December 20, 2008 12:50 AM

Answers

  • Hi Frank,

    Thanks for the feedback on ConcurrentDictionary<TKey,TValue>.  I'll have the team evaluate your suggestion of a TryGetOrAdd API.  In the meantime, might I suggest using a SpinWait and then reverting to reinitialization after too many waits?  You could try something along the lines of ...

    ...
    var spinWait = new SpinWait();
    for (int i = 0; i < MAX_TRY_COUNT; ++i)
    {
          if (cacheItemDictionary.TryAdd(key, cacheItem))
          {
               // our new item made it into the dictionary, stop retrying
               break;
          }

          LazyInit<TValue> cacheItem2;
          if (!cacheItemDictionary.TryGetValue(key, out cacheItem2))
          {
               // the item was removed between TryAdd and TryGetValue; back off and retry
               spinWait.SpinOnce();
               continue;
          }
          return cacheItem2.Value;
    }
    return cacheItem.Value;

    Hope that helps,
    Josh
    • Marked as answer by Frank Bakker Saturday, January 3, 2009 5:44 PM
    Friday, January 2, 2009 10:55 PM

All replies

  • Having a few days off to think about this I came up with the following solution myself.

        public class CacheDictionary<TKey, TValue>  
            where TValue : class // needs to be a ref type due to current limitation of lazyInit<>  
        {  
            ConcurrentDictionary<TKey, LazyInit<TValue>> _cacheItemDictionary = new ConcurrentDictionary<TKey, LazyInit<TValue>>();  
     
            public TValue Fetch(TKey key, Func<TValue> producer)  
            {  
                LazyInit<TValue> cacheItem;  
                if (!_cacheItemDictionary.TryGetValue(key, out cacheItem))  
                {  
                    cacheItem = new LazyInit<TValue>(() => producer(), LazyInitMode.EnsureSingleExecution);  
     
                    if (!_cacheItemDictionary.TryAdd(key, cacheItem))  
                    {  
                        // while we never remove items, if TryAdd fails it should be present  
                        cacheItem = _cacheItemDictionary[key];  
                    }  
                }  
                return cacheItem.Value;  
            }  
        } 

    I am not completely sure I understand the semantics of the Try* methods correctly. But if I do, TryAdd will return false only if the item was already found in the dictionary. Therefore I can assume it is present, since no other threads will be removing items in this scenario. In a scenario that supports cache expiration I would need to do some extra work here; I'm not quite sure yet how to do that.

    The downside of this implementation is that in some cases I will be creating a LazyInit object that I will never use. I try to avoid this by first checking if the item already exists. Just like in my original implementation, the dictionary is checked three times for the presence of the item (in the worst case). In large dictionaries this could be an issue.
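    To double-check the TryAdd semantics I rely on here, a minimal sketch with plain string values (no LazyInit involved): when two additions race, exactly one TryAdd wins, and the loser's freshly created value is simply discarded in favor of the stored one.

    ```csharp
    using System;
    using System.Collections.Concurrent;

    class TryAddSemanticsDemo
    {
        static void Main()
        {
            var dict = new ConcurrentDictionary<string, string>();

            bool first = dict.TryAdd("key", "winner");   // true: key was absent
            bool second = dict.TryAdd("key", "loser");   // false: key already present

            // Since nothing removes items in this scenario, a failed TryAdd
            // means the key is there, so this indexer read cannot throw.
            string stored = dict["key"];

            Console.WriteLine(first);   // True
            Console.WriteLine(second);  // False
            Console.WriteLine(stored);  // winner
        }
    }
    ```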

    Can you spot any flaws in my logic here, or maybe hint at further optimisations?
    Monday, December 29, 2008 2:56 PM
  • Hi Frank,

    This looks correct to me and it does look as if you've got the behavior of the Try* methods down pat.  

    This is a really interesting use of LazyInit! 

    Have you run any perf/correctness tests on it?

    Josh
    Monday, December 29, 2008 8:21 PM
  • Hey Josh, thanks for the reply,

    I had built some unit tests for the original implementation that I successfully re-run on the new implementation. These tests do check for things like the producer being called only once per unique key, and the retrieval of an item not blocking on the producer of another key.

    I have not done any real performance testing on the cache itself. The primary goal of this cache is to avoid expensive recreation of items (typically results from web service calls) without serializing the actual requests for different items on a single lock. Typically the web service request will be far more expensive than the overhead of retrieving or storing items in the cache, but it would of course be nice to keep that overhead as small as possible.

    Testing for race conditions of course is still not easy. By letting the framework do most of the synchronization for me I do sleep a little better at night, but the remaining code is not completely trivial and still requires manual inspection of all possible scenarios to be sure it is correct. Maybe tools like CHESS would come in handy here, but I have not looked into this yet.

    This just leaves me with the issue of what to do if the cache does allow removal of items from the dictionary. What if TryAdd fails, and I then do a TryGetValue which also fails due to a removal by another thread? I could do another TryAdd and start over again in a hypothetical infinite loop. In this situation a combined TryGetOrAddValue would come in handy, and could maybe allow for a more efficient implementation in the dictionary by only traversing the hashtable once and requiring only a single wait for a lock.

    In case anyone is interested, I wrote a little piece on my blog about this pattern. Hope you don't mind me linking it here

    http://blogs.infosupport.com/blogs/frankb/archive/2008/12/31/Implementing-a-Thread-Safe-cache-using-the-Parallel-Extensions.aspx

    Tuesday, December 30, 2008 8:46 AM
  • I think that'll generally do the trick; the chance that it will require more than a few iterations is small. On the other hand, I might prefer to go back to the original implementation with the standard Dictionary and the ReaderWriterLockSlim, or maybe a combination of these two patterns.

    I have read / heard about SpinWait, can you explain how it helps here?
    Saturday, January 3, 2009 5:56 PM
  • Hi Frank,

    SpinWait just gives your for-loop a little breathing room before it hits the next iteration, which can help reduce contention.  Also, if you end up using a lot of spins, SpinWait has some counting logic that will eventually fall back to a true Yield instead of busy waiting. 
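    As a minimal illustration of that usage pattern (the flag, the worker thread and the 50 ms delay are just for demonstration):

    ```csharp
    using System;
    using System.Threading;

    class SpinWaitDemo
    {
        static bool ready;

        static void Main()
        {
            // Simulate another thread publishing a result shortly.
            var worker = new Thread(() => { Thread.Sleep(50); Volatile.Write(ref ready, true); });
            worker.Start();

            var spinWait = new SpinWait();
            while (!Volatile.Read(ref ready))
            {
                // Busy-waits for the first few iterations, then starts
                // yielding the time slice so the waiter doesn't burn a core.
                spinWait.SpinOnce();
            }
            worker.Join();
            Console.WriteLine("done");
        }
    }
    ```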

    Also, on a separate note, a coworker pointed out an issue with your code snippet above that I had overlooked.  In the two public CTPs that we have released, LazyInit<T> is a struct.  Unfortunately, this means that every time a LazyInit<T> is passed around (even when passed in or out of a method), it gets copied.  So, if you store an uninitialized LazyInit<T> in your dictionary, which I believe you're doing, it will be reinitialized every time that LazyInit<T> instance is retrieved. 

    We made LazyInit<T> a struct for performance and reduced memory footprint, but we quickly realized that it can be unusable in situations such as this.  Therefore, for future releases, we've split LazyInit into two types: LazyInitField<T> and LazyInit<T>.  The former is a struct, much like the LazyInit<T> you are using now, and the latter is a class, which, while larger in size, doesn't suffer from the issue I've described. 
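    To see why the struct version bites here, consider this stripped-down stand-in for the CTP type (a hypothetical LazyStruct, not the real LazyInit<T>): the "initialized" flag lives in each copy, not in the instance the dictionary holds, so every retrieval re-runs the producer.

    ```csharp
    using System;
    using System.Collections.Generic;

    struct LazyStruct
    {
        private Func<string> _producer;
        private string _value;
        private bool _initialized;

        public LazyStruct(Func<string> producer)
        {
            _producer = producer;
            _value = null;
            _initialized = false;
        }

        public string Value
        {
            get
            {
                if (!_initialized)
                {
                    _value = _producer();  // runs again on every fresh copy
                    _initialized = true;   // only mutates this local copy
                }
                return _value;
            }
        }
    }

    class StructCopyDemo
    {
        static void Main()
        {
            int calls = 0;
            var dict = new Dictionary<string, LazyStruct>();
            dict["a"] = new LazyStruct(() => { calls++; return "expensive"; });

            // Each indexer read copies the struct out of the dictionary, so
            // the producer runs once per retrieval instead of once per key.
            string v1 = dict["a"].Value;
            string v2 = dict["a"].Value;

            Console.WriteLine(calls);  // prints 2, not 1
        }
    }
    ```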

    Hope that helps a bit.

    Josh 
    Monday, January 5, 2009 6:56 PM
  • Thanks again for the information; I had not realized the implications of LazyInit being a struct. Maybe I should check my unit tests again to see why this problem was not found. If all goes as planned, my code will 'automatically' be fixed when you guys replace the struct with a class of the same name. In my original implementation with the ReaderWriterLockSlim (which works on .NET 3.5 without PFX) I use my own implementation of LazyInit, which actually is a class, so this was not a problem.

    I watched a Channel 9 video last week on transactional memory, in which the issue of doing two or more operations on a ConcurrentDictionary (or any other concurrent collection) as one atomic action was also discussed. If I understand it correctly, my problem with combining the TryGetValue and the TryAdd falls into that category. I can understand that fixing this more general problem will be a lot more complex.

    One possible solution would be to publicly expose some way to obtain an exclusive lock on the dictionary, do my peeking and poking, and then release the lock. I imagine this has already been under consideration, but it would have some problems of its own.
    Friday, January 9, 2009 9:36 PM