WinCE 6.0 Performance Issue: User Interface: Thread Starvation: Mouse Event/Touch Event causes CPU intensive system threads to run

  • Question

  • Hi, I am developing a real-time patient monitoring device using an i.MX27 400 MHz processor. The user interface is designed with Expression Blend 2, and the OS is Windows CE 6.0 R3. The system runs a time-critical SPI thread that should run every 1 ms and acquire ADC data from 8 channels. The application has a corresponding algorithm thread for each channel, and these should run every 1 ms, 2 ms, 2 ms, 8 ms, etc. I'm facing critical performance issues at run time. In the UI we have a system time text block that shows the system time in the format hh:mm:ss. When I continuously move the mouse across the Silverlight UI layouts in a circular motion, the system time block stops updating regularly; it updates only intermittently, say every 2 or 4 seconds. I used Kernel Tracker to find the reason. For every mouse event, the following threads execute in sequence (priorities in parentheses):

    1. UsbInterruptThreadStub (101)

    2. InterruptThreadProc (249)

    3. MouseThreadProc (249)

    4. UserInputThread (249)

    5. Data Acquisition thread, SPI-ADC (250)

    6. Application thread to update time (251)

    For every mouse event, the UserInputThread ran continuously for as long as 2.79 ms, starving all other threads in the system. This means that for every 1000 mouse events I get around 3 seconds of delay before my application threads run. As a result, the time-critical data acquisition thread and the application algorithm threads also miss a chunk of data of about 3.0 ms per mouse movement. This is not acceptable for the product's application domain. I made the following changes to try to prevent this thread starvation:

    1. Raised the application thread priorities to as high as 100 or 95 so that they could pre-empt the UserInputThread.

    2. In spite of the high priority of the application threads, the UserInputThread is unaffected. I wonder how priorities really work in WinCE.

    3. I restarted the board and, without running my application, moved the mouse over the WinCE desktop image and captured the details in Kernel Tracker. The result was the same: the UserInputThread, a thread in GWES.lib, always ran for more than 2.2 ms per mouse event. Hence the problem may not be with the Silverlight GUI.

    4. I disabled the SPI driver, the alpha-blending driver and the USB mouse driver. Still the same observation.

    5. I ran the device emulator and moved the mouse over it. The same problem with UserInputThread was observed in the device emulator.

    6. The same UserInputThread runs for a touch panel event.

    7. Since a touch panel event or mouse event causes the time-critical SPI thread and the algorithm threads to starve in spite of assigning high priorities to all my threads, I cannot decide whether to use WinCE or not.

    8. Can anybody help me get rid of these performance issues with the user interface component of GWES.lib in WinCE 6.0? I guess its thread runs with something like 'CeSetThreadQuantum (0)' and hence does not relinquish the CPU until completion.

    9. A somewhat similar problem has been reported at the link below. That poster has still not received an answer.

    http://us.generation-nt.com/answer/high-priority-application-threads-qt-mfc-gui-page-refresh-help-88133202.html

     

    Regards

    Friday, July 30, 2010 8:17 AM

Answers

  • Yes, it's preemptible.  From that, I'd guess that you are not ready-to-run.  Generally, calling Sleep() in anything that's supposed to be real-time is a bad idea.  I think that, if you look at the Sleep() docs you'll see that the thread is guaranteed to sleep for at least the time specified (0 = the rest of its thread quantum).  This does not make Sleep() a real-time call, however, as it's quite possible for the thread to be in the sleep state much longer than the parameter time. 

    We might be able to direct you with some more-detailed pseudo code for the real-time thread, but trying to do ms-resolution timing using system calls really just won't work (the system timer is usually 1ms, so that's the resolution of everything).  Using a separate timing pulse interrupt to trigger the start/end conversion processing is a better way to go.  When the interrupt fires, you have the processor in the ISR.  *Very* quickly do what you need to do, either start the conversion or get the result, drop it somewhere that can awaken a blocked high-priority thread (like a point-to-point message queue, see CreateMsgQueue()), and return from the interrupt handler.  The high-priority thread would *never* be in Sleep.  It would do a WaitForSingleObject (or multiple objects), and, when awakened, would quickly grab the data in the queue, ReadMsgQueue(), and process it for the UI.  I think that this would give you as much time as possible for the UI to do its job, while maintaining regular acquisition timing.
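    The hand-off described above (interrupt side drops a sample somewhere, a blocked high-priority thread picks it up) can be sketched in portable C11 as a single-producer, single-consumer ring buffer. This is only an illustrative stand-in, not Windows CE code: on the device, the message queue named in the answer (CreateMsgQueue() / ReadMsgQueue()) would play this role and would also provide the blocking wait, which is not modeled here. The names SampleRing, ring_push, ring_pop and RING_SIZE are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u  /* must be a power of two; value chosen for illustration */

typedef struct {
    uint16_t slots[RING_SIZE];
    atomic_uint head;  /* next slot to write; advanced only by the producer */
    atomic_uint tail;  /* next slot to read; advanced only by the consumer */
} SampleRing;

/* Producer side (interrupt context in the real system). Never blocks. */
bool ring_push(SampleRing *r, uint16_t sample)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                 /* full: the caller should count an overrun */
    r->slots[head % RING_SIZE] = sample;
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (the high-priority thread, once its wait has been satisfied). */
bool ring_pop(SampleRing *r, uint16_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                 /* empty */
    *out = r->slots[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

    The producer does nothing but store one value and advance an index, which matches the "very quickly do what you need to do" advice; all processing stays in the consumer thread. A false return from ring_push means the consumer fell too far behind, which is exactly the condition worth logging when chasing missed samples.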

    Still, if the UI is chewing processor cycles to do fancy graphical operations, fade in, fade out, transparency, etc. when you move the mouse over some section of the UI, you may not have enough cycles in this processor to do what you want to do with the tools that you've chosen.  In that case, add performance, or switch to a more bare-bones UI run-time.  As I said, starting from basic Win32 will tell you if there are enough cycles to do anything while you're doing the real-time stuff in the background.  If that works, then you can upgrade.

    Paul T.

    • Marked as answer by Neil_AN Tuesday, August 3, 2010 7:49 AM
    Monday, August 2, 2010 3:12 PM

All replies

  • I don't know if we can fix it, but I can tell you a little about how priorities work.

    1. The highest priority thread that is *ready to run* always runs.  There is no attempt to prevent starvation *at all*.  This is necessary to have a real-time system.

    2. If two or more threads are at the same priority and are all ready to run, they run in round-robin fashion, each getting one thread quantum per cycle.  The thread quantum can be set on an individual thread basis by calling CeSetThreadQuantum().  The default thread quantum is set by the OS developer, and the default is 100ms.  That is, if a thread does work continuously, when it is the active thread it will run for 100ms before another thread at the same priority gets its turn.

    3. Thread priorities are inherited during system calls, generally.  That is, if you have a thread running at a high priority and you make a system call, while that system call is running in your thread context, it has your priority.

    4. Some priority shifting can occur when the owner of a resource for which a high-priority thread is waiting is normally at a lower priority.  Its priority can be elevated to speed release of the resource.
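
    Rules 1 and 2 above can be modeled in a few lines of portable C. This is a toy model for illustration only, not the Windows CE scheduler; SimThread and pick_next are hypothetical names, and priority inheritance (rules 3 and 4) is not modeled. It uses the Windows CE convention that a numerically lower value is a higher priority (0 is highest, 255 is lowest).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a thread for illustration; not a Windows CE structure. */
typedef struct {
    const char *name;
    int priority;   /* 0 = highest, 255 = lowest, as in Windows CE */
    bool ready;     /* false while the thread is blocked or sleeping */
} SimThread;

/*
 * Return the index of the thread that should run next, or -1 if none is ready.
 * Rule 1: the best (numerically lowest) priority among ready threads wins.
 * Rule 2: among ready threads at that priority, take the first one found
 *         after `last`, the index that ran previously (round-robin).
 * Pass last = -1 when no thread has run yet.
 */
int pick_next(const SimThread *t, size_t n, int last)
{
    int best = 256;
    for (size_t i = 0; i < n; i++)
        if (t[i].ready && t[i].priority < best)
            best = t[i].priority;
    if (best == 256)
        return -1;

    /* Scan starting just after `last`, wrapping around, ending at `last`. */
    for (size_t k = 0; k < n; k++) {
        size_t i = ((size_t)(last + 1) + k) % n;
        if (t[i].ready && t[i].priority == best)
            return (int)i;
    }
    return -1;
}
```

    With the threads from the question (UserInputThread at 249, the SPI-ADC thread at 250, the time-update thread at 251, all ready), the model picks UserInputThread. Changing the SPI-ADC entry to priority 100 makes it win, but only while its ready flag is true: a high-priority thread that is sleeping or blocked is simply skipped.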

    You're saying that your thread is "starving", but this other thread is only running for 2.79 ms?  That does not qualify as starving, although it may not meet your timing requirements!  Using thread timing to perform tasks really isn't a suitable way to do things, particularly where accurate timing of external events is required.  If you need very accurate timing on your A/D conversions, you should use an external start-convert signal to trigger each conversion, have an interrupt generated on end-of-convert (or have the hardware store up several conversion results in a FIFO), and then get the data you need from the A/D at a slower rate via your current system and process it.  You have a limited number of cycles and I feel like you're wasting them trying to build an accurate clock on top of the scheduler.  If you just can't use an external start-convert signal, a timing pulse firing an interrupt at the desired rate and starting the conversion in the ISR for that interrupt is a second-best method.  No scheduling involved.
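
    If conversion results are buffered in a FIFO as suggested above, a rough sizing rule follows from simple arithmetic: within any window of a given length, at most floor(window / period) + 1 periodic samples can arrive. The helper below is a sketch of that bound in C; the name min_fifo_depth is hypothetical, and the result is a conservative per-channel estimate rather than a figure taken from any datasheet.

```c
#include <stdint.h>

/*
 * Conservative number of FIFO entries per channel needed to ride out a
 * worst-case service delay without losing a sample.
 * In a window of latency_us, at most floor(latency_us / period_us) + 1
 * samples can arrive. Both arguments are in microseconds; period_us
 * must be non-zero.
 */
uint32_t min_fifo_depth(uint32_t latency_us, uint32_t period_us)
{
    return latency_us / period_us + 1u;
}
```

    For a 1 ms sample period and the 2.79 ms delay quoted in the question, min_fifo_depth(2790, 1000) gives 3 entries. The bound grows linearly with the worst-case delay, so it is only as good as the worst case actually measured in Kernel Tracker.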

    Finally, what's the real problem here?  Your data acquisition thread isn't running right?  Or the UI isn't updating when you want?  I think that the UI updating is probably going as fast as it can and as fast as you want it to, given the real-time nature of the rest of your system (remember that updating a text box involves sending a message to the text box telling it of the new text, which causes it to update internally and tell Windows that it needs to be redrawn, resulting, eventually, in a WM_PAINT message, etc.)  That's the least-important of the operations being performed, it seems to me.  Running the data processing thread (or, if necessary, threads), to only process data, not to start conversions or wait for them to complete, etc., frees up cycles for other things like updating the UI. 

    Finally, I'm not sure that I'm confident that you have enough cycles to run a fancy UI, particularly something as fancy as Silverlight UIs usually are, with no display acceleration, while, at the same time, trying to do a bunch of data acquisition and processing.  A simple C-based application using the Win32 API to display something simple would be a better test bed for the data acquisition.  If that all works and is reliable, then you can plop your fancy UI on top and see if there's still enough cycles to make everything run fast enough.  If not, you can trade off the fancy UI or go with a faster processor or a graphics accelerator (maybe), to enable your requirements.

    Paul T.

     

    Friday, July 30, 2010 3:58 PM
  • Hi Paul, Thanks for your detailed reply.

    1. Speaking of actual problem:

    During performance analysis of the system, the performance monitor showed CPU utilization of around 23%. This was with all drivers running (including data acquisition), all algorithms running (heart rate, uterine activity, etc.) and the Silverlight GUI, which displays real-time heart rate data processed by the algorithms. Test case: when the mouse pointer was moved over the UI, CPU utilization jumped to 80-85% for as long as the mouse was in use. This slowed the system down considerably. At the application layer, in the UI, I could see two obvious things: (1) the time block did not update at a consistent rate, and (2) worst of all, the heart rate did not update in the GUI at an acceptable rate either. If I changed the heart rate through a hardware simulator from 50 BPM to 120 BPM, the GUI did not update immediately as it would without mouse movement. This is a critical failure.

    Hence I was worried whether this mouse movement had any adverse effect on the time-critical driver, namely, SPI driver, and application algorithms. Note: if touch panel was touched, there were surges to 80-85% momentarily (hence negligible). But I was curious to find if touch panel events do cause data to be missed.

    On analyzing with Kernel Tracker, I found that UserInputThread executes for every mouse movement or touch event and consumes a good amount of time (2.7 ms).  Even if a thread quantum can be as large as 100 ms, I should be able to pre-empt a CPU-intensive thread by giving my threads higher priorities. To make sure that my higher-priority thread is ready to execute, I also replaced Sleep (100) with Sleep (0), so that my thread relinquishes the CPU at the end of its execution and is ready for the next one. Yesterday I saw that, for one touch event, UserInputThread took an awful 12 ms! Now I cannot determine the maximum time this thread executes. Whatever it is, I should be able to pre-empt it.

    I'm keen to know why this thread has been designed in WinCE to take so much time and not get pre-empted, or whether I'm doing something wrong. But the default device emulator in WinCE 6.0 shows this problem too!

    Regards

     

    Saturday, July 31, 2010 7:20 AM
  • I would like to add that our ADC chip does not have an interrupt line to the SPI. It is assumed to have raw data available every 1 ms. The SPI is triggered by a timer every 1 ms; it reads raw values from the ADC and passes them to the application. To pass data from the SPI to the processor, I'm using the SPI's interrupt. The application has algorithms that decide whether the values from the SPI are valid. If they are valid, it calculates the heart rate; otherwise it discards them. The product we're designing is an upgrade of previous versions that have been on the market for over 20 years. Those versions had exactly the same hardware setup for the SPI-ADC (without an interrupt line) and used ThreadX.

    Regards,

    Neil

    Saturday, July 31, 2010 8:06 AM
  • Personally I would not use such a complex and sophisticated technology as Silverlight in a time- and safety-critical app. I very much doubt Silverlight is designed with such things in mind. As Paul said, start with something *much* simpler.
    Monday, August 2, 2010 6:44 AM
  • I agree, we use Altia which is extremely powerful at being able to extract the graphics performance we need.

    Regards,

    Ian

    Monday, August 2, 2010 11:09 AM
  • Hi Paul,

    I'm now able to pre-empt the UserInputThread. At the application level I was not really sure whether the threads were in the 'ready-to-run' state. Hence I changed the priority of the SPI device driver to 100 (initially 250), since I was sure that the SPI driver does not sleep anywhere and uses only interrupts. The UserInputThread was being triggered by a higher-priority thread, UsbInterruptThreadStub (priority 101). I ran the same test, moving the mouse pointer quickly over the WinCE windows, and captured the details in Kernel Tracker. I saw that UserInputThread was pre-empted by the SPI driver whenever the system interrupts for the GPT and SPI were triggered. Hence I'm sure that the SPI is no longer losing data. If the SPI driver captures the real-time data without missing any, the critical problem is solved.

    The remaining question is why the application threads are not able to pre-empt in spite of priorities as high as 50. I guess the application threads are not ready to run. To be sure, I disabled all threads in the application, including the GUI, and kept only the algorithm threads enabled, because I wrote these threads myself and had not used any sleeps in them. And it worked.

    I am grateful to all of you for helping me out. It was all because of 'not ready-to-run'. I'll let you know whether the Silverlight GUI and all the other application threads run successfully after optimization.

    Regards,

    Neil

     

    Tuesday, August 3, 2010 7:49 AM