Better Data Structure than Hashtable?

  • Question

  • Hi,

    I have a web service that stores millions of records in a nested Hashtable. This Hashtable is constantly being updated from background threads (multiple records every millisecond from MSMQ), and records are read from it during client requests.

    Structure of the Hashtable: Key(Code), Value(ChildHashtable). ChildHashtable: Key(Id), Value(Object). A ChildHashtable can hold thousands of records for a single code, and there are thousands of codes, so at any point there can be about 10 million records in the Hashtable in total.

    Whenever there is a request from a client, this Hashtable has to be enumerated. A client request is for a specific code, such that:

    Run a calculation over all the records for the specified code. The complex part is that this code can have a parent code, and the parent code can have further child codes. So we need the records of the entire tree, i.e. the records for Code + ParentCode + the parent's children codes.

    This web service is currently written in .NET 1.1 and we are moving it to 2.0. Please let me know if there are any better data structures in 2.0. The web service is heavily used, with reads and writes happening multiple times in a millisecond.

    I have already considered using Dictionary, but want to explore if there could be something faster.

    Also, please suggest a locking mechanism in which reads and writes are mutually exclusive, multiple concurrent reads are allowed, and reads and writes are prioritised on an FCFS basis. I am planning on ReaderWriterLock, but its performance is doubtful and it prioritises reads over writes. Monitor is mutually exclusive, so some requests that are supposed to take a few milliseconds may wait much longer.

    Thanks in advance.

    Friday, May 7, 2010 11:33 AM

All replies

  • Greetings,

     

    You want to use the Hashtable collection in the best way, for either an older program written in the C# language or for maintaining a program. The Hashtable provides a fast and simple interface, in many ways simpler than Dictionary. Here we see how you can use the Hashtable collection in the C# programming language, providing a fast lookup collection for hashing keys to values in a constant-time data structure.

    How The Hashtable Works

    When an element is added to the Hashtable, the element is placed into a bucket based on the hash code of the key. Subsequent lookups of the key use the hash code of the key to search in only one particular bucket, thus substantially reducing the number of key comparisons required to find an element. The load factor of a Hashtable determines the maximum ratio of elements to buckets. Smaller load factors cause faster average lookup times at the cost of increased memory consumption. The default load factor of 1.0 generally provides the best balance between speed and size.

    A different load factor can also be specified when the Hashtable is created. As elements are added to a Hashtable, the actual load factor of the Hashtable increases. When the actual load factor reaches the specified load factor, the number of buckets in the Hashtable is automatically increased to the smallest prime number that is larger than twice the current number of Hashtable buckets.
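    As a rough illustration of the load factor discussion above, here is a minimal sketch that pre-sizes a Hashtable and passes an explicit load factor to the constructor. The class name and the expectedCount figure are made up for the example; the right values depend on your data.

```csharp
using System;
using System.Collections;

public class LoadFactorSketch
{
    public static void Main()
    {
        // Hypothetical figure: pre-sizing reduces repeated bucket growth.
        int expectedCount = 100000;

        // Default load factor (1.0).
        Hashtable balanced = new Hashtable(expectedCount);

        // Smaller load factor: faster average lookups, more memory.
        // The constructor accepts values from 0.1 through 1.0.
        Hashtable sparse = new Hashtable(expectedCount, 0.5f);

        balanced["ABC"] = 1;
        sparse["ABC"] = 1;

        Console.WriteLine(balanced.Count + sparse.Count);
    }
}
```

    Whether a smaller load factor is worth the extra memory is something to measure, not assume, especially for a table that already runs to hundreds of megabytes.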

    Each key object in the Hashtable must provide its own hash function, which can be accessed by calling GetHash. However, any object implementing IHashCodeProvider can be passed to a Hashtable constructor, and that hash function is used for all objects in the table.

    The capacity of a Hashtable is the number of elements the Hashtable can hold. As elements are added to a Hashtable, the capacity is automatically increased as required through reallocation.

    The foreach statement of the C# language requires the type of each element in the collection.

    foreach(DictionaryEntry de in myHashtable)
    {
        // ...
    }


    Since each element of the Hashtable is a key/value pair, the element type is not the type of the key or the type of the value. Instead, the element type is DictionaryEntry. The foreach statement is a wrapper around the enumerator, which only allows reading from, not writing to, the collection. Because serializing and deserializing an enumerator for a Hashtable can cause the elements to become reordered, it is not possible to continue enumeration without calling the Reset method. Because keys can be inherited and their behavior changed, their absolute uniqueness cannot be guaranteed by comparisons using the Equals method.


    Thread Safety

    Hashtable is thread safe for use by multiple reader threads and a single writing thread. It is thread safe for multi-thread use when only one of the threads performs write (update) operations, which allows for lock-free reads provided that the writers are serialized to the Hashtable. To support multiple writers, all operations on the Hashtable must be done through the wrapper returned by the Synchronized method, provided that there are no threads reading the Hashtable object.

    Enumerating through a collection is intrinsically not a thread safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads.
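    The first option above, locking the collection for the whole enumeration, could look roughly like the sketch below. It assumes that every writer also takes the lock on table.SyncRoot before modifying the table; the class and method names are made up for the example.

```csharp
using System;
using System.Collections;

public class EnumerationLockSketch
{
    // Counts entries while holding the table's SyncRoot, so that
    // cooperating writers cannot modify the table mid-enumeration.
    public static int CountEntries(Hashtable table)
    {
        int count = 0;
        lock (table.SyncRoot)
        {
            foreach (DictionaryEntry de in table)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Hashtable table = new Hashtable();

        // Writers take the same lock before updating.
        lock (table.SyncRoot)
        {
            table["A"] = 1;
            table["B"] = 2;
        }

        Console.WriteLine(CountEntries(table));
    }
}
```

    The trade-off is that writers are blocked for as long as the enumeration runs, which matters when a single enumeration covers thousands of records.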

    Lock

    Thread safety and synchronisation are an important part of any project. However, using the lock keyword in C# incorrectly can get you into trouble by causing inadvertent deadlocks in your code.
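    For the many-readers, one-writer pattern raised in the question, the lock keyword is not the only option. Below is a minimal sketch using System.Threading.ReaderWriterLock, which is available in .NET 1.1 and 2.0. GuardedStore and its members are hypothetical names, and the sketch makes no claim about fairness between readers and writers or about performance under load.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class GuardedStore
{
    private readonly ReaderWriterLock rwLock = new ReaderWriterLock();
    private readonly Dictionary<int, string> items = new Dictionary<int, string>();

    // Writers get exclusive access.
    public void Write(int id, string value)
    {
        rwLock.AcquireWriterLock(Timeout.Infinite);
        try
        {
            items[id] = value;
        }
        finally
        {
            // Always release, even if the update throws.
            rwLock.ReleaseWriterLock();
        }
    }

    // Several readers may hold the reader lock at the same time.
    public int Count()
    {
        rwLock.AcquireReaderLock(Timeout.Infinite);
        try
        {
            return items.Count;
        }
        finally
        {
            rwLock.ReleaseReaderLock();
        }
    }

    public static void Main()
    {
        GuardedStore store = new GuardedStore();
        store.Write(1, "first");
        store.Write(2, "second");
        Console.WriteLine(store.Count());
    }
}
```

    Releasing in a finally block is what keeps an exception inside the guarded section from leaving the lock held.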

    Links on locking are given below:

    Thread Synchronization (C# Programming Guide)
    http://msdn.microsoft.com/en-us/library/ms173179%28VS.80%29.aspx


    lock Statement

    http://msdn.microsoft.com/en-us/library/c5kehkcz%28VS.71%29.aspx

    Take Care

    PL

     


    Helping People To Solve Technical Problems
    Friday, May 7, 2010 4:44 PM
  • Thanks for the detailed insight on Hashtable. Could you share your thoughts about Dictionary as well? I have read in a lot of places that Dictionary has largely replaced Hashtable because of its advantages, primarily avoiding type casting. A few queries:

    1.) Hashtable does provide a simple interface, but for me searching is just as important as enumerating and inserting. Is there any way both enumerating and inserting could be made performant? Currently this operation has the highest cost.

    2.) Are there any disadvantages of using Dictionary over Hashtable? Does Dictionary have any constraints on size? My Hashtable grows up to 500 MB and is expected to grow larger in future.
    Tuesday, May 11, 2010 6:29 AM
  • Any limits on Dictionary aren't going to be substantially different from Hashtable: Dictionary is also implemented as a hash table.

    Moving from 1.1 to later versions of .NET and Dictionary allows you to use generics and remove the overhead of casting (and of boxing/unboxing where keys or values are value types). This will be significant.
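    To make the generics point concrete, here is a minimal sketch of the nested structure from the question rebuilt on Dictionary. The Record class, its Amount field, and the method names are assumptions for illustration; the sketch is not thread safe and leaves locking out entirely.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; the real object is not shown in the thread.
public class Record
{
    public int Id;
    public decimal Amount;

    public Record(int id, decimal amount)
    {
        Id = id;
        Amount = amount;
    }
}

public class NestedDictionarySketch
{
    // Code -> (Id -> Record), with no casts on the way in or out.
    public static Dictionary<string, Dictionary<int, Record>> Store =
        new Dictionary<string, Dictionary<int, Record>>();

    public static void Upsert(string code, Record record)
    {
        Dictionary<int, Record> child;
        if (!Store.TryGetValue(code, out child))
        {
            child = new Dictionary<int, Record>();
            Store.Add(code, child);
        }
        child[record.Id] = record;
    }

    // Sums Amount over every record of the given codes,
    // e.g. a code plus its parent and sibling codes.
    public static decimal SumForCodes(IEnumerable<string> codes)
    {
        decimal total = 0;
        foreach (string code in codes)
        {
            Dictionary<int, Record> child;
            if (Store.TryGetValue(code, out child))
            {
                foreach (Record r in child.Values)
                {
                    total += r.Amount;
                }
            }
        }
        return total;
    }

    public static void Main()
    {
        Upsert("A", new Record(1, 10m));
        Upsert("A", new Record(2, 5m));
        Upsert("B", new Record(1, 7m));

        // Codes with no records are skipped. Prints 22.
        Console.WriteLine(SumForCodes(new string[] { "A", "B", "missing" }));
    }
}
```

    TryGetValue does the lookup and the fetch in one step, which avoids hashing the key twice as a ContainsKey check followed by an indexer read would.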

    When you're talking about enumerating, the pattern of access matters. You might find SortedDictionary offers some benefits. A Dictionary is stored in no particular order, whilst a SortedDictionary is stored in order of the key.
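    A small sketch of the ordering difference, with made-up keys and values. Note that SortedDictionary is a binary search tree, so lookups and inserts are O(log n) instead of the near constant time of a hash table; whether the ordered enumeration pays for that depends on the access pattern.

```csharp
using System;
using System.Collections.Generic;

public class SortedEnumerationSketch
{
    // Returns the keys in the order the collection enumerates them.
    public static List<int> KeysInOrder(SortedDictionary<int, string> source)
    {
        List<int> keys = new List<int>();
        foreach (KeyValuePair<int, string> pair in source)
        {
            keys.Add(pair.Key);
        }
        return keys;
    }

    public static void Main()
    {
        SortedDictionary<int, string> byId = new SortedDictionary<int, string>();

        // Inserted out of order on purpose.
        byId[30] = "c";
        byId[10] = "a";
        byId[20] = "b";

        // Enumeration follows key order: 10, 20, 30.
        foreach (int key in KeysInOrder(byId))
        {
            Console.WriteLine(key + " " + byId[key]);
        }
    }
}
```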

    .........

    Andy O'Neill - helping people with technical problems without relying on cutting and pasting from MSDN.

    Tuesday, May 11, 2010 10:10 AM