locked
why is there no hashset / list in System.Collections.Concurrency? RRS feed

  • Question

  • Why is there no hashset / list type class in System.Collections.Concurrency? Is there a thread-safe version elsewhere?
    Thursday, May 12, 2011 4:36 PM

Answers

  • Yes.  A HashSet, technically, doesn't require a value - every object is effectively a key.  I agree that a parallel hash set would be useful, but you can easily just handle this via a concurrent dictionary with a byte as the value, as it'll have the same semantics as a HashSet.  That being said, see below...

     

    For parallel operations, however, things get a little different.  The problem is in the usage of these collections.  When working with parallel "lists" or "hash sets", synchronization is nearly always required, which, btw, is why I suspect they weren't included.

     

    First off - take List<T>.  List<T> is effectively an array which you can add items to.  "Removing a particular item" from the list, in a parallel situation would require locking of the collection in any case, as multiple threads trying to manipulate the list using list semantics would run into problems if one removes while another simultaneously modifies.  The same goes for adding items - List<T> is used with adding items with an expectation of ordering, but in parallel code, the ordering is no longer guaranteed...  Granted, a single linked list can be done fairly easily, but the situations where you'd want that can also typically be handled via ConcurrentStack or ConcurrentQueue (as they do implement ICollection<T>, and do have Remove methods...).  If you just need the array semantics, arrays tend to work just fine in parallel situations.

    HashSet<T> is even trickier - a HashSet is typically used to determine whether something has happend or not - and again, you run into issues without additional synchronization in place if that's your goal.  Two threads, where one is adding to the set and another is checking Contains() simultaneously will get unknown behavior...  ConcurrentDictionary<T,U> handles this by allowing both threads to create items, and only adding the one that gets there first - but this is a bit of a different scenario than a HashSet<T> implementation.

     

     


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".
    • Marked as answer by Ben E D Ellis Thursday, May 12, 2011 11:23 PM
    Thursday, May 12, 2011 10:54 PM

All replies

  • You typically would use ConcurrentBag<T> (list) or ConcurrentDictionary (hashing) for this.

     

     


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".
    Thursday, May 12, 2011 5:51 PM
  • Bag != List and Dictionary != Hashset

    They are different data structures and behave differently.

    I.E.

    • Bag is Unordered and you can't remove a particular item from the list (just a random item).
    • A Hashset doesn't require a key.
    Thursday, May 12, 2011 10:29 PM
  • Yes.  A HashSet, technically, doesn't require a value - every object is effectively a key.  I agree that a parallel hash set would be useful, but you can easily just handle this via a concurrent dictionary with a byte as the value, as it'll have the same semantics as a HashSet.  That being said, see below...

     

    For parallel operations, however, things get a little different.  The problem is in the usage of these collections.  When working with parallel "lists" or "hash sets", synchronization is nearly always required, which, btw, is why I suspect they weren't included.

     

    First off - take List<T>.  List<T> is effectively an array which you can add items to.  "Removing a particular item" from the list, in a parallel situation would require locking of the collection in any case, as multiple threads trying to manipulate the list using list semantics would run into problems if one removes while another simultaneously modifies.  The same goes for adding items - List<T> is used with adding items with an expectation of ordering, but in parallel code, the ordering is no longer guaranteed...  Granted, a single linked list can be done fairly easily, but the situations where you'd want that can also typically be handled via ConcurrentStack or ConcurrentQueue (as they do implement ICollection<T>, and do have Remove methods...).  If you just need the array semantics, arrays tend to work just fine in parallel situations.

    HashSet<T> is even trickier - a HashSet is typically used to determine whether something has happend or not - and again, you run into issues without additional synchronization in place if that's your goal.  Two threads, where one is adding to the set and another is checking Contains() simultaneously will get unknown behavior...  ConcurrentDictionary<T,U> handles this by allowing both threads to create items, and only adding the one that gets there first - but this is a bit of a different scenario than a HashSet<T> implementation.

     

     


    Reed Copsey, Jr. - http://reedcopsey.com
    If a post answers your question, please click "Mark As Answer" on that post and "Mark as Helpful".
    • Marked as answer by Ben E D Ellis Thursday, May 12, 2011 11:23 PM
    Thursday, May 12, 2011 10:54 PM
  • Thanks for the explanation. I was using Hashset to determine if two threads held locks on various in-memory variables, so I was using it in the way you described.
    Thursday, May 12, 2011 11:25 PM