none
List vs HashSet RRS feed

  • Question

  • Hi,

    I currently have a code working with a list that acts as a set (before every add I check contains), the rqeuirements from this set is pretty much all in all, it gets inserted into frequently, removed from frequently and iterated on frequently. This set is the value in a hashtable which contains many similar lists, some are extremely small (1-4 items in the set) to very big (100000+). I am currently facing a problem with removing items from the set, on larger sets due to the amount of items the contains takes a long time and gives a very bad performance result.

    I am considering changing to HashSet, I know of the quite large memory overhead i'll be getting which is unfortunate and while I am limited due to older machines, it is a overhead i'm willilng to accept. I looked into lowering the memory footprint of HashSet and didn't find any way to do so. I also made some performance checks and saw i'll be getting a overhead in insert (about 130% of the current input) and in the iteration a small overhead (less than 10%), the delete performance however increases by a huge percent (delete from about a 100000 set which took around 3500ms now takes 13ms). Now after all this my actual question, is there any other parameter I should be expecting to see changed? have I missed something in the memory overhead, is there a way to lower it? Do you recommend any other data type for my use?

    Thanks!

    Gil

    Tuesday, January 27, 2015 5:57 AM

Answers

  • I would use a Dictionary instead of a Hash.  The key in the dictionary uses a built in hash function.  You can add a dictionary to your list very easily with Linq.  The dictionary value is a pointer to the list.  So the addional overhead for the Dictionary is the number of items in the List * (size of key + size of pointer to value).


    jdweng

    Tuesday, January 27, 2015 11:45 AM
  • I suggest making similar investigation with SortedSet<T> class (with and without IComparer parameter) and compare it with HashSet<T>.

    Tuesday, January 27, 2015 2:22 PM

All replies

  • I would use a Dictionary instead of a Hash.  The key in the dictionary uses a built in hash function.  You can add a dictionary to your list very easily with Linq.  The dictionary value is a pointer to the list.  So the addional overhead for the Dictionary is the number of items in the List * (size of key + size of pointer to value).


    jdweng

    Tuesday, January 27, 2015 11:45 AM
  • I suggest making similar investigation with SortedSet<T> class (with and without IComparer parameter) and compare it with HashSet<T>.

    Tuesday, January 27, 2015 2:22 PM