Trimming table rows

  • Question

  •  

    I have a table that is being filled with records from a stream coming from an Internet server, and it is displayed in a GUI (a plot) at the same time. The table fills very rapidly and I do not need all the rows in it. After a few minutes I can weed out four rows out of five and what is left is enough for me. I am having trouble doing it. The GUI is a demo (NPlot) which I have adjusted to my needs; I do not want to mess with its source code too much.

     

    What happens is that it breaks down apparently at the time when the graph is being refreshed. It is refreshed at almost every new row coming into the table. The message I get is that "the row was removed from the table and does not have any data."

     

    I want to lock the table while I am reducing the number of rows. I do not know how to do it.

     

    This is how I do things now.

     

    When the number of rows exceeds a certain threshold, every 5th row is copied into an auxiliary table. Then the first (primary, working) table is cleared, and I copy all rows from the auxiliary table back to the primary one.

     

    When at the next round my rows.Count again exceeds the same threshold plus the number of rows that was left after the first trimming, I copy again all TRIMMED rows first to the auxiliary (workbench) table and then every 5th row from the rest. Then I clear the primary table and copy the whole of auxiliary to the primary one.

     

    If in the meantime a heavy download comes in, those records are put into the primary table, because I counted on the process I described above being fast enough to complete the turnaround before the primary table is needed again. Apparently that is not the case. The whole thing works for an hour, sometimes less; it depends.

     

    I need a permanent solution.

     

    If you want me to post any code, I will.

     

    I do not know how to lock just a portion of my code.

     

    Thanks.

     

     

    Monday, September 17, 2007 3:26 PM

All replies

  • Hi mate,

     

    What do you mean by locking a portion of your code? Do you mean having only a single thread run the code, like a critical section that cannot be interrupted?

    Monday, September 17, 2007 6:40 PM
  •  Derek Smyth wrote:

    Hi mate,

     

    What do you mean by locking a portion of your code? Do you mean having only a single thread run the code, like a critical section that cannot be interrupted?

     

    It is already threaded. I mean it is in a separate thread. What else can I do to accomplish what I think I clearly described? If it is not clear, please ask more questions. But be specific, not in generalities.

     

    I think perhaps even a few threads are involved. First the table is set up. It is part of a DataSet which is static. There are a number of similar tables in there but I am currently focusing on this one only.

     

    Then there is a thread that handles the connection to the server, the socket and the stream. It is clearly a separate entity.

     

    Then there is a TabPage that is created on demand after the form is brought up and loaded. Perhaps 10 min after that or more, it is irrelevant. It has this GUI on it: NPlot candlePlot, etc. It is in a separate thread, naturally.

     

    Between all those threads things get messed up because all actions are asynchronous and the overlaps are not secured. I do not have the experience to comprehend it in depth and solve it. I see the painful result of my ignorance in a graph that freezes and an app that throws an exception after some spectacular performance for not long enough. So I am asking for help.

     

    Thanks.

    Monday, September 17, 2007 6:54 PM
  • I think what you're looking for is the SyncLock (VB) or lock (C#) statement, which lets only one thread at a time run a block of code guarded by a given object. I'd say you might want to lock down the table so that only one thread can update it at a time; DataSet objects aren't designed for multi-threaded writes.

     

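    You can also look at the static Thread.BeginCriticalRegion and Thread.EndCriticalRegion methods, but note that they only tell the host that a thread abort or unhandled exception inside the region could jeopardize other tasks; they do not stop other threads from running, so they are not a substitute for a lock.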

    Tuesday, September 18, 2007 8:19 AM
  •  Derek Smyth wrote:

    I think what you're looking for is the SyncLock (VB) or lock (C#) statement, which lets only one thread at a time run a block of code guarded by a given object. I'd say you might want to lock down the table so that only one thread can update it at a time; DataSet objects aren't designed for multi-threaded writes.

     

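    You can also look at the static Thread.BeginCriticalRegion and Thread.EndCriticalRegion methods, but note that they only tell the host that a thread abort or unhandled exception inside the region could jeopardize other tasks; they do not stop other threads from running, so they are not a substitute for a lock.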

     

    Thank you Derek,

     

    It is a springboard. I will look into all this. The only thing I will ask is for a brief sample of how a table can be locked during any operation. What is the standard way to do it? All the samples I've seen are concerned with lock (this), which would kill the whole idea.

     

    For instance, if I Add a row to a Table, would it work:

     

    lock (DataSet.Tables[i])
    {
        DataSet.Tables[i].Rows.Add ( newRow );
    }

     

    Does it make sense?

     

    Thanks.

    Tuesday, September 18, 2007 1:54 PM
  • Yes, just lock on the object you want to synchronize on, for example the DataTable inside the DataSet.

     

    The key thing is that you now need to lock all write operations on this DataTable.  So every place where you add, delete or modify rows must first lock the DataTable.
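    As a minimal sketch of that advice applied to the trimming problem in the question (the class name TableTrimmer, the gate object and the method names are hypothetical, not from any poster's code): both the add and the trim take the same lock, and the trim removes rows in place instead of clearing and refilling the table. It assumes the code that refreshes the plot can be wrapped in the same lock; if the reader does not take the lock, the writer's lock alone does not protect it.

```csharp
using System;
using System.Data;

public static class TableTrimmer
{
    // Hypothetical gate: every thread that touches the table must lock this same object.
    private static readonly object tableGate = new object();

    // Adds one row while holding the lock.
    public static void AddRow(DataTable table, params object[] values)
    {
        lock (tableGate)
        {
            table.Rows.Add(values);
        }
    }

    // Keeps every keepEvery-th row from startIndex onward and removes the others.
    // Rows before startIndex (already trimmed in an earlier round) are left alone.
    public static void Trim(DataTable table, int startIndex, int keepEvery)
    {
        lock (tableGate)
        {
            // Walk backwards so that removing a row never shifts an index still to be visited.
            for (int i = table.Rows.Count - 1; i >= startIndex; i--)
            {
                if ((i - startIndex) % keepEvery != 0)
                {
                    table.Rows.RemoveAt(i);
                }
            }
        }
    }
}
```

    Trimming in place means the table is never empty while the plot is reading it, and the second round only needs the index where the previous trim stopped.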

     

     

    Tuesday, September 18, 2007 5:09 PM
  • Hi Alex, yes, as Matt has posted, your code is the correct way to lock an item. It will lock the table so that, for the duration of the lock, only one thread can read or write it, provided every other thread that touches the table takes the same lock. The lock(this) approach is for locking shared variables in shared classes.

     

    There is another option, more complex but slightly more flexible: the ReaderWriterLock class. It allows multiple concurrent reads but only a single write on an object such as your table. I am not 100% sure whether DataTables allow multiple concurrent reads, although you'd think they would. Either way this approach would need to be tried out, whereas I'm fairly confident that the lock approach will work because it is more restrictive.

     

    With the ReaderWriterLock class you increment a counter on read access, and while the read counter != 0 no write lock can be granted. If a write lock is requested while the read counter > 0, any later read requests are queued; eventually the read counter reaches 0 and the write lock is granted.

     

    That's the ReaderWriterLock class. Try it out, as it might provide a bit more flexibility than the full lock, although I cannot say for sure whether it will work, as I just haven't used it the way you are looking to. I should have posted about this in the original answer but forgot it was there until now, sorry about that.

     

    Tuesday, September 18, 2007 5:41 PM
  • In general it sounds like Alex is confused about how to start with threading.  If this is the case, let me recommend that you follow a basic design that works pretty well until you understand threading a bit better.

     

    The basic design is a worker thread that reads work from a queue and then sends an event back to UI thread.

     

    So worker thread spins in a loop and reads items of work from a queue.  The queue can be something as simple as an ArrayList of work items.  Usually what I do is create a little class to manage a work item and the UI thread will fill out the class and then post it to the queue. 

     

    Sticking items in the queue and removing items must be thread safe, so use a lock around the queue (ArrayList for example).

     

    The worker thread spins and checks for items in the queue.  If you want to be more efficient, you make the worker thread wait on an event (I usually use a ManualResetEvent).  Then the code that queues a work item sets the ManualResetEvent and this wakes up the worker thread.  If this sounds too complicated you can make the worker thread do a small Sleep in the loop, but the event is much better for performance.

     

    So UI posts a work item, sets the event to wake up the worker thread.  Worker thread does the work then posts back to UI when complete using delegation to UI thread.  On the UI thread you have an event handler that picks up the completed work and updates the display, etc...  You can post to UI thread using this type of code:

     

      // Delegate for ProcessWorkerMessage.
      private delegate void delegateProcessWorkerMessage( int MessageType, Object message );
      // Must be assigned before first use, for example in the form constructor:
      //   delegateProcessWorkerMessageImpl = new delegateProcessWorkerMessage( ProcessWorkerMessageImpl );
      delegateProcessWorkerMessage delegateProcessWorkerMessageImpl = null;
      // Runs on the UI thread.
      private void ProcessWorkerMessageImpl( int MessageType, Object message )
      {
       // Process messages here...
       switch ( MessageType )
       {
        case FormMessage.MSG_APPLY_FILTER:
         ApplyCustomFilter( (string) message );
         break;
        case FormMessage.MSG_RESET_FILTER:
         ApplyCustomFilter( "" );
         break;
        case FormMessage.MSG_SET_STATUS:
         SetStatus( (string) message );
         break;
        case FormMessage.MSG_SET_STATUS_ERROR:
         SetStatusError( (string) message );
         break;
        ...
       }
      }
      // Called from the worker thread; hands the message over to the UI thread.
      public void ProcessWorkerMessage( int MessageType, Object message )
      {
       Object [] objParams = { MessageType, message };
       this.BeginInvoke( delegateProcessWorkerMessageImpl, objParams );
      }

     

    This is one way to do it, just a suggestion.  It is very reliable and works well. 
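    The queue side of this design could be sketched roughly as follows. This is an illustration under assumptions, not the code I shipped: the class name WorkQueue, the string work item and the handler callback are all hypothetical, and a generic Queue stands in for the ArrayList. In a WinForms application the handler would end by calling something like ProcessWorkerMessage above to get back onto the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class WorkQueue
{
    private readonly Queue<string> items = new Queue<string>();            // guarded by lock (items)
    private readonly ManualResetEvent wake = new ManualResetEvent(false);  // set while there may be work
    private readonly Action<string> handler;
    private readonly Thread worker;
    private bool stopping;                                                  // guarded by lock (items)

    public WorkQueue(Action<string> handler)
    {
        this.handler = handler;
        worker = new Thread(Run);
        worker.IsBackground = true;
        worker.Start();
    }

    // Called by any thread: queue the item, then wake the worker.
    public void Post(string item)
    {
        lock (items)
        {
            items.Enqueue(item);
            wake.Set();
        }
    }

    // Lets the worker drain what is already queued, then waits for it to exit.
    public void Stop()
    {
        lock (items)
        {
            stopping = true;
            wake.Set();
        }
        worker.Join();
    }

    private void Run()
    {
        while (true)
        {
            wake.WaitOne();
            string item;
            lock (items)
            {
                if (items.Count == 0)
                {
                    if (stopping) return;
                    // Reset under the same lock as Set, so a wake-up cannot be lost.
                    wake.Reset();
                    continue;
                }
                item = items.Dequeue();
            }
            handler(item); // do the work outside the lock
        }
    }
}
```

    The event is set and reset only while the queue lock is held, which is what keeps the worker from sleeping through a freshly posted item.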

     

     

     

    Tuesday, September 18, 2007 5:55 PM
  •  Matt Neerincx wrote:

    In general it sounds like Alex is confused about how to start with threading

     

    No question about it.

     

    I greatly appreciate both Derek's and Matt's suggestions because I am sure one of them will work. Most likely I will try both.

     

    In the meantime I tried a solution that appealed to me by its simplicity. I found a crucial block of code and enclosed it in a try/catch block. It allows that block to lose some of the records perhaps but I thought it was a good compromise for a while. I celebrated the success since it allowed the app to work for the longest time so far (close to 5 hours). Finally it broke down.

     

    I will work on your code tomorrow. Many thanks.

     

    Tuesday, September 18, 2007 6:15 PM
  •  Derek Smyth wrote:

    Hi Alex, yes, as Matt has posted, your code is the correct way to lock an item. It will lock the table so that, for the duration of the lock, only one thread can read or write it, provided every other thread that touches the table takes the same lock. The lock(this) approach is for locking shared variables in shared classes.

     

    There is another option, more complex but slightly more flexible: the ReaderWriterLock class. It allows multiple concurrent reads but only a single write on an object such as your table. I am not 100% sure whether DataTables allow multiple concurrent reads

     

     

    Derek hi,

     

    I have begun looking into the ReaderWriterLock class. It sounds very attractive. I am almost certain it will work on tables. In the example they give, the resource is just an int.

     

    OK, my setup is as follows: I have a static DataSet with multiple tables attached to it. Some of them are created temporarily and then discarded, others are for the duration of the app run. This particular table is permanent. Also if I debug it successfully I will be also locking this way some other tables but in total a small fraction of all tables in the DataSet.

     

    It looks like I have to create the table I want to protect in this class? In other words, is the routine that creates it supposed to be located in this class?

     

    I do not want every other table in DataSet to be under any protection because of this MSDN note:

     

    Note   Holding reader locks or writer locks for long periods will starve other threads. For best performance, consider restructuring your application to minimize the duration of writes.

     

    If you understand my dilemma and have any ideas what I should do, please comment. I want to try to comprehend all the pitfalls before doing something headlong. It may be a lot of coding in my situation. The overall code is very sprawling.

     

    Incidentally, my simple solution with try/catch blocks seemed to work yesterday. What happened was that the app did not crash after 5 hours as I thought. The GUI simply could not refresh pictures anymore after a certain number of records was over. This morning I took care of it and set out for another run to try to go all the way through.

     

    Nonetheless I will try your ReaderWriterLock class because I feel it is safer. Also there is so much row reshuffling going on everywhere that it is better to be safe than sorry. It is not the GUI thing alone.

     

    Thanks.

     

    Wednesday, September 19, 2007 1:54 PM
  • Hi AlexBB,

     

    I refreshed my memory on the ReaderWriterLock class and I remember now: it doesn't lock anything at all, it is the lock. Which means it can be used to protect anything. What do I mean by "it is the lock"? Think of it as two counters, a read counter and a write counter. When a read lock is acquired the read counter is incremented and the thread is allowed to read. If a write lock is requested while the read counter > 0, the thread is queued. As each read lock is released the read counter is decremented, and when it reaches 0 the queued thread that wants to write is run and the write counter is incremented to 1.

     

    Here are some comments I wrote when I first learned about the ReaderWriterLock, it might give you an idea of what happens.

     

    'it's like this....
    ' thread 1 says to ReaderWriterLock, using AcquireReaderLock, hi I want to read
    ' ReaderWriterLock checks to see if a thread is writing, and there is,
    ' so ReaderWriterLock replies, hold on someone is writing at the moment
    ' the thread that's currently writing calls ReleaseWriterLock or DowngradeFromWriterLock,
    ' basically telling the ReaderWriterLock that it's finished writing
    ' ReaderWriterLock now knows there are no write locks so it says to thread 1,
    ' on you go and read the data
    ' thread 2 turns up and says to ReaderWriterLock, using AcquireReaderLock, hi I want to read
    ' ReaderWriterLock checks to see if a thread has a write lock, and there isn't, but thread 1 is still reading
    ' this is ok so ReaderWriterLock replies to thread 2, aye on you go pal knock yourself out
    ' thread 1 then requests write access by calling UpgradeToWriterLock, ReaderWriterLock says to thread 1,
    ' sorry bud someone else is reading so get to the end of the queue.
    ' thread 3 turns up and says to ReaderWriterLock, using AcquireReaderLock, hi I want to read
    ' ReaderWriterLock checks to see if a thread has a write lock, and there isn't, but thread 2 is still reading
    ' and thread 1 is waiting to write, so the ReaderWriterLock says to thread 3, aye ok but you need to wait until
    ' thread 1 has written its data, and it's waiting for thread 2 to finish reading.
    ' thread 2 that's currently reading calls ReleaseReaderLock,
    ' and basically tells the ReaderWriterLock that it's finished reading
    ' ReaderWriterLock then says to thread 1, right on you go pal and write your data
    ' thread 1 that's currently writing then calls ReleaseWriterLock, and tells the ReaderWriterLock right I'm aff
    ' ReaderWriterLock then says to thread 3 who's waiting to read, right on you go pal read away

     

    So with the ReaderWriterLock being the lock itself you can use it to protect anything: a single table, a single row, anything, as long as multiple concurrent reads are possible, which should be the case for most objects but is still worth thinking about. So you will have no problem using it for just one table. In fact I've created an example, but I'll get to that.

     

    You will need to think about the performance issue. For example, if you want to write 100 new rows to the table, it wouldn't be worthwhile holding a write lock while creating the new rows, only while the new rows are being added to the table. The general rule of thumb is to acquire the lock late and release it early. Another point, and I don't know how you've implemented your simple solution: raising exceptions is very expensive in terms of performance, so you might gain some performance by not relying on exceptions, though how much depends on how many write operations you do.
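    A rough sketch of "acquire late, release early" for a batch of rows, assuming a one-column table; BatchWriter, AddBatch and CountRows are hypothetical names. Because NewRow is itself a call on the shared table, the sketch prepares plain object arrays outside the lock and touches the table only while the writer lock is held.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading;

public static class BatchWriter
{
    // One lock shared by every thread that uses the table (hypothetical name).
    private static readonly ReaderWriterLock tableLock = new ReaderWriterLock();

    public static void AddBatch(DataTable table, IEnumerable<int> values)
    {
        // Step 1: prepare the row values without holding any lock.
        List<object[]> pending = new List<object[]>();
        foreach (int v in values)
        {
            pending.Add(new object[] { v });
        }

        // Step 2: hold the writer lock only while the table is actually modified.
        tableLock.AcquireWriterLock(Timeout.Infinite);
        try
        {
            foreach (object[] rowValues in pending)
            {
                table.Rows.Add(rowValues);
            }
        }
        finally
        {
            tableLock.ReleaseWriterLock(); // released even if Add throws
        }
    }

    // Readers take the reader lock, so several of them can run at once.
    public static int CountRows(DataTable table)
    {
        tableLock.AcquireReaderLock(Timeout.Infinite);
        try
        {
            return table.Rows.Count;
        }
        finally
        {
            tableLock.ReleaseReaderLock();
        }
    }
}
```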

     

    Whether this solution is right for your project, I don't know, but I think you should take a copy of the project and give it a shot. I believe it's a much better solution than the exception handling you're currently doing, and it's good to know that if this doesn't work you still have a project there that does work. I'd give it a go, convince the money that it's a good step forward, and try it.

     

    Ok, here is the example. This console application is loosely based on your program, where there is a static dataset and only one table is protected (in fact there is only one table). Point to note: all threads whose access to the table is controlled must share the same ReaderWriterLock. The example acquires the lock before it creates a new row for the table, which might not be the best approach, but it is ok in this example as only one row is being created.

     

    There are 10 threads in this example and each one runs the same method. The method loops 25 times, and on each iteration there is a 75% chance the thread will read a row and a 25% chance the thread will write a new row. The program's output is displayed in the console and written to a file, Output.txt, so you can look at what is going on; notice that a write only occurs when all reads have finished. This should give you an idea of whether the concept will work in your project.

     

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Threading;

    namespace ReaderWriterLocksWithDataSetTables
    {
        class Program
        {
            // The reader/writer lock is shared by all threads that access the table.
            // You don't attach the lock to the table: it is disconnected from the item it protects.
            // Think of it as a flag that indicates when it's ok to read or write the table.
            static ReaderWriterLock tableLock = new ReaderWriterLock();

            // SampleData is the typed DataSet defined elsewhere in the sample project.
            static SampleData dataset = new SampleData();

            static void Main(string[] args)
            {
                if (File.Exists("Output.txt"))
                {
                    File.Delete("Output.txt");
                }
                Trace.Listeners.Add(new ConsoleTraceListener());
                Trace.Listeners.Add(new TextWriterTraceListener("Output.txt", "TextOutput"));
                dataset.ReadXml("SampleData.xml");

                Thread[] threads = new Thread[10];
                for (int i = 0; i < 10; i++)
                {
                    threads[i] = new Thread(SimulateLoad);
                    threads[i].Name = i.ToString();
                    threads[i].Start();
                }
                Trace.Flush();
                Console.ReadLine();
                Trace.Close();
            }

            public static void SimulateLoad()
            {
                // One generator per thread; a new Random() on every call can repeat the same values.
                Random rng = new Random(Guid.NewGuid().GetHashCode());
                for (int i = 0; i < 25; i++)
                {
                    // 75% of the time a thread will require a read
                    if (rng.Next(100) < 75)
                    {
                        // obtain a read lock (throws ApplicationException if it times out)
                        tableLock.AcquireReaderLock(1000);
                        try
                        {
                            Trace.WriteLine(string.Format("READ LOCK: thread {0} obtained read lock", Thread.CurrentThread.Name));
                            // read a record and do some work with the information
                            int record = rng.Next(dataset.SampleTable.Rows.Count);
                            int id = dataset.SampleTable[record].ID;
                            int value = dataset.SampleTable[record].Value;
                            Trace.WriteLine(string.Format("READ: value {0} has been read from record with ID {1} from thread {2}", value, id, Thread.CurrentThread.Name));
                            Thread.Sleep(10); // simulate some work
                        }
                        finally
                        {
                            tableLock.ReleaseReaderLock();
                        }
                        Trace.WriteLine(string.Format("READ LOCK: {0} released read lock", Thread.CurrentThread.Name));
                    }

                    // think for a bit
                    Thread.Sleep(1000);

                    // 25% of the time a thread will require a write
                    if (rng.Next(100) < 25)
                    {
                        // obtain a write lock (throws ApplicationException if it times out)
                        tableLock.AcquireWriterLock(1000);
                        try
                        {
                            Trace.WriteLine(string.Format("WRITE LOCK: {0} obtained write lock", Thread.CurrentThread.Name));
                            // write a record
                            SampleData.SampleTableRow row = dataset.SampleTable.NewSampleTableRow();
                            row.Value = (short)rng.Next(short.MaxValue);
                            dataset.SampleTable.AddSampleTableRow(row);
                            Trace.WriteLine(string.Format("WRITE: value {0} has been written to record with ID {1} from thread {2}", row.Value, row.ID, Thread.CurrentThread.Name));
                            Thread.Sleep(10); // simulate some other work
                        }
                        finally
                        {
                            tableLock.ReleaseWriterLock();
                        }
                        Trace.WriteLine(string.Format("WRITE LOCK: {0} released write lock", Thread.CurrentThread.Name));
                    }
                }
            }
        }
    }

     

     

     

     

    Wednesday, September 19, 2007 5:06 PM
  • Alex

     

    You can download the sample here

    ReaderWriterLock.zip

     

    Wednesday, September 19, 2007 7:53 PM
  •  Derek Smyth wrote:

    Alex

     

    You can download the sample here

    ReaderWriterLock.zip

     

     

    I just want to express my appreciation. I will have to read it all and it will take time. Thank you.

    Thursday, September 20, 2007 2:58 PM
  • Derek hi,

     

    I am not sure if you picked up the fact that my situation goes beyond just read and write. I also clear the table, and new rows are added all the time. Does this class cover that? I do not have a static table with a fixed number of rows.

    Thursday, September 20, 2007 6:23 PM
  • Hey Alex,

     

    Reader/Writer Locks and the ResourceLock Library

    http://msdn.microsoft.com/msdnmag/issues/06/06/ConcurrentAffairs/

     

    http://msdn.microsoft.com/msdnmag/issues/06/03/ConcurrentAffairs/

    http://msdn.microsoft.com/msdnmag/issues/05/10/ConcurrentAffairs/

     

     

    I did some more probing and found the above, and from what I read perhaps the reader/writer lock isn't the best solution after all. The top article recommends that the ReaderWriterLock should only be used where there are a lot of reads and few writes, which is not your situation at all. Thanks for doubting my suggestion, because I never knew that; I thought that lots of short little writes would have been ok, but apparently not.

     

    Jeffrey Richter, the chap that wrote that article, is from http://www.wintellect.com/ and he's created a library of threading locks, PowerThreading, that revolves around better performance. You can download it and look into it. I have no idea what the locks are or how they work, but I think these are your solution.

    Thursday, September 20, 2007 9:15 PM
  • I worked with a customer several years ago that was using DataSet/DataTable as an in-memory cache for some ASP.NET code.  What the customer wanted was high-performance reading of data, with some infrequent writes.

     

    I came up with an innovative scheme that I called "copy on write" (well I didn't actually invent the idea, it was invented by IBM database gurus back in the 50s, but I leveraged it with DataSet).

     

    The idea is fairly simple.  All readers just grab the DataSet and read without any lock.  The writers have to grab a lock, make a full copy of the DataSet, modify the copy, assign it back to the global location where the readers get it, then release the lock.  Since you are assigning a new object reference to the global DataSet variable, this is a thread-safe operation in the CLR (assignment of an object reference).

     

    Note it is critically important that the writer make a complete copy of the DataSet for this to work.  The reason is the old DataSet is still in use by readers on various threads until the writer pops the new one in place.  This unblocks the readers completely, they never have to get a lock.
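    A minimal sketch of this copy-on-write scheme, with hypothetical names (DataCache, Snapshot, Update) and not the customer's actual code. It uses DataSet.Copy, which copies both structure and data; DataSet.Clone copies the structure only, so it is not enough here. The field is marked volatile as a precaution so that readers pick up the newly published reference.

```csharp
using System;
using System.Data;

public static class DataCache
{
    private static readonly object writeGate = new object();       // serializes writers only
    private static volatile DataSet current = new DataSet();       // published snapshot

    // Readers take the current snapshot without any lock and must treat it as read-only.
    public static DataSet Snapshot
    {
        get { return current; }
    }

    // Writers copy, modify the private copy, then publish it with one reference assignment.
    public static void Update(Action<DataSet> modify)
    {
        lock (writeGate)
        {
            DataSet copy = current.Copy();   // full copy: schema and rows
            modify(copy);                    // readers still see the old DataSet here
            current = copy;                  // swap in the new one
        }
    }
}
```

    A reader that took Snapshot before the swap keeps working on the old DataSet, which is why the writer must never modify the published instance in place.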

     

    You can optimize this a little bit by taking a more optimistic approach (but to be honest it is usually not worth it).  The writer grabs the lock, makes a copy, releases the lock, then makes modifications (outside the lock), then writes over the DataSet.  The only problem with this approach is that if two writers both grab a copy of the DataSet at the same time, the last one to write wins (and the first writes are lost).  You can make this work more reliably with a bit more complexity: the writer does an interlocked increment of a counter when copying the DataSet, then grabs the lock and checks the counter when assigning back. If the counter has not changed, it is safe to write; if the counter has changed, you have to dump everything, sleep a random amount, and start over.
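    The optimistic version with a counter might look something like the sketch below. It is a variant under assumptions rather than exactly the scheme described: the counter here is bumped when a copy is published (under the lock) instead of with an interlocked increment at copy time, and the names OptimisticCache, Snapshot and Update are hypothetical. The modify callback may run more than once, so it has to be safe to repeat.

```csharp
using System;
using System.Data;
using System.Threading;

public static class OptimisticCache
{
    private static readonly object gate = new object();
    private static volatile DataSet current = new DataSet();  // published snapshot
    private static int version;                               // guarded by gate; bumped on every publish

    // Readers take the snapshot without a lock and must treat it as read-only.
    public static DataSet Snapshot
    {
        get { return current; }
    }

    public static void Update(Action<DataSet> modify)
    {
        Random backoff = new Random();
        while (true)
        {
            DataSet copy;
            int seen;
            lock (gate)
            {
                copy = current.Copy();   // short lock: copy and remember the version
                seen = version;
            }

            modify(copy);                // the slow part runs outside the lock

            lock (gate)
            {
                if (version == seen)     // nobody published since we copied
                {
                    current = copy;
                    version++;
                    return;
                }
            }

            // Another writer won: throw the copy away, wait a random amount, start over.
            Thread.Sleep(backoff.Next(1, 20));
        }
    }
}
```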

     

    Another thing to note (this usually kills people when they hear it), creating a DataView on a DataTable is a write operation on the DataSet.  This is due to some perf optimization in DataSet where it caches indexes created for DataViews (for reuse).  This is what got my customer in a bunch in the first place.
    Friday, September 21, 2007 4:46 AM
  • That's actually a rather interesting solution. I never knew assigning an object was thread safe, but thinking about it a bit I can imagine it would be. I like that approach and, if it's ok, I'd like to post it in my blog; credit where it is due, your name will be mentioned.

     

    Thanks for sharing that Matt, I've learned something today.

     

    Friday, September 21, 2007 6:13 PM
  • Blog away.  Note that with DataSet (my mind is a bit fuzzy) there was a Clone and that did not work properly for some reason.  I recall we had to use some other method of making a copy.  This might be fixed in 2.0.  So if you do write code, keep this in mind and verify it.

    Friday, September 21, 2007 9:39 PM