Minimum possible jitter in WEC7

  • Question

  • What is the minimum possible latency jitter in WEC7?

    In one of the PDFs I found online, it says that jitter is in the range of 100 us (microseconds) for WinCE 6.0.

    How about WEC7? Has anybody measured it?
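
    For concreteness, here is the kind of measurement this question is about: a minimal user-mode sketch (illustrative, not from any of the replies below) that estimates scheduler/timer jitter on CE by timestamping a periodic thread running at the highest real-time priority. True IST latency needs an ISR in the OAL; the 1 ms period and iteration count are arbitrary choices.

    ```c
    #include <windows.h>
    #include <stdio.h>

    /* Hypothetical jitter probe: run a 1 ms periodic loop at the highest
       real-time priority and record the worst deviation from the nominal
       period. Period and iteration count are arbitrary. */
    int main(void)
    {
        LARGE_INTEGER freq, prev, now;
        double period_us, jitter_us, worst_us = 0.0;
        int i;

        QueryPerformanceFrequency(&freq);

        /* CE real-time priorities: 0 is the highest of the 256 levels. */
        CeSetThreadPriority(GetCurrentThread(), 0);

        QueryPerformanceCounter(&prev);
        for (i = 0; i < 10000; i++)
        {
            Sleep(1);                        /* nominal 1 ms period */
            QueryPerformanceCounter(&now);
            period_us = (double)(now.QuadPart - prev.QuadPart) * 1e6
                        / (double)freq.QuadPart;
            jitter_us = period_us - 1000.0;  /* deviation from nominal */
            if (jitter_us > worst_us)
                worst_us = jitter_us;
            prev = now;
        }
        printf("worst-case jitter over 10000 periods: %.1f us\n", worst_us);
        return 0;
    }
    ```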

    Monday, December 28, 2015 12:04 PM

All replies

  • 7 microseconds on ARM Cortex-A7 at 1 GHz.
    Tuesday, December 29, 2015 6:32 AM
  • Thanks Stolyarov!
    Tuesday, December 29, 2015 6:54 AM
  • Hi GNKeshava,

    if you are asking about control-loop jitter, it depends on CPU/peripheral load on WEC7.
    For x86 (e.g. Atom 600-1500 MHz) it's 40-60 us at CPU load below 80-90%. A single-core ARM Cortex-A8 shows up to 60-100 us.
    But under heavy load it can exceed a few milliseconds (tested on TI AM3784, 800 MHz)!
    Multicore CPUs show similar results, and sometimes worse (tested on iMX6Q, Atom E3845).

    Our partner, who produces industrial controllers, lost over a year investigating this issue;
    in the end they had to cut some functionality from their new product to meet its real-time requirements.
    They found that the WEC7 kernel introduced internal locks for SMP support, which degrades OS performance.

    Meanwhile, CE6 has no such "feature" and performs much better.
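
    As an illustrative aside (not necessarily what the partner did): WEC7 lets you pin a thread to a single core, which is one way to experiment with how much the SMP locking contributes to jitter. A minimal sketch, assuming WEC7's CeSetThreadAffinity and CeGetTotalProcessors:

    ```c
    #include <windows.h>
    #include <stdio.h>

    /* Sketch: pin the measurement thread to one core so it is not
       migrated between cores while measuring jitter. Core choice is
       arbitrary. */
    int main(void)
    {
        DWORD nCores = CeGetTotalProcessors();
        printf("cores: %lu\n", nCores);

        /* WEC7 affinity is a 1-based processor number:
           0 = run on any core, 1 = core 1, and so on. */
        if (!CeSetThreadAffinity(GetCurrentThread(), 1))
            printf("CeSetThreadAffinity failed: %lu\n", GetLastError());

        /* ... run the jitter measurement loop here ... */
        return 0;
    }
    ```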

    Best regards, Igor

    Tuesday, December 29, 2015 7:12 AM
  • Thank you very much for the input, Igor!
    Tuesday, December 29, 2015 7:21 AM
  • When you say internal locks for multicore, are you referring to locks in code that we have no access to, or to locks in the OAL or PRIVATE folder? If the locks were in code they did not have access to, how could they determine that those were causing the jitter? And if they did have access to the code, did they try modifying it?

    Monday, January 18, 2016 6:11 PM
  • At GuruCE we have done extensive testing as well. The bad real-time performance on multi-core ARM is partly due to the sledgehammer way the WEC7/2013 kernel does cache maintenance. The Cortex-A9 allows very fine-grained cache maintenance, but the CE kernel just blindly cleans the entire cache, even when that is not necessary. The problem is that on the Cortex-A9, full cache operations by set/way are not hardware cache-coherent and thus need to be repeated on each and every core (so more cores means longer IST latency and more jitter). Another issue causing grief is that CE has no way to just invalidate the cache (which is super fast); there is only "clean" and "clean and invalidate" (both much slower, because the clean causes writeback).

    For the iMX6 it gets worse: full cache operations on the L2C-310 L2 cache controller used in the iMX6 are slow, and on top of that there are errata on the iMX6 that require you to disable the L2 double line fill feature to prevent lock-ups and data aborts, making performance even worse. Freescale fixed these errata in their iMX6 "Plus" variants, so those show better performance (but still nowhere near the 'hard real-time' definition of IST times below 100 us on multi-core with L2 enabled).

    L2 maintenance on multi-core enabled CE kernels is the main culprit when it comes to IST jitter because the L2 is external and shared over all the cores, and thus needs to be protected with a spinlock to prevent simultaneous access by different cores. Having more than a single core will lead to lock convoy situations when doing L2 maintenance (due to the spinlocks). ISR latency is not affected by L2 cache maintenance operations.
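
    To make the ranged-versus-full difference measurable from user mode, here is a minimal sketch (illustrative, not GuruCE's test code) that times the public CacheRangeFlush API on a small buffer versus the whole cache. Buffer size and iteration count are arbitrary; passing NULL/0 requests the operation on the entire cache.

    ```c
    #include <windows.h>
    #include <stdio.h>

    #define BUF_SIZE   (4 * 1024)   /* small buffer: a few hundred lines */
    #define ITERATIONS 1000

    static BYTE g_buf[BUF_SIZE];

    static double us_per_op(LARGE_INTEGER t0, LARGE_INTEGER t1,
                            LARGE_INTEGER f)
    {
        return (double)(t1.QuadPart - t0.QuadPart) * 1e6 /
               ((double)f.QuadPart * ITERATIONS);
    }

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;
        int i;

        QueryPerformanceFrequency(&freq);

        /* Ranged clean: only the lines covering g_buf are written back. */
        QueryPerformanceCounter(&t0);
        for (i = 0; i < ITERATIONS; i++)
            CacheRangeFlush(g_buf, BUF_SIZE, CACHE_SYNC_WRITEBACK);
        QueryPerformanceCounter(&t1);
        printf("ranged clean: %.2f us/op\n", us_per_op(t0, t1, freq));

        /* Full clean: on Cortex-A9 this degenerates into set/way loops
           that must run on every core, under a spinlock for the shared
           L2, which is exactly the convoy described above. */
        QueryPerformanceCounter(&t0);
        for (i = 0; i < ITERATIONS; i++)
            CacheRangeFlush(NULL, 0, CACHE_SYNC_WRITEBACK);
        QueryPerformanceCounter(&t1);
        printf("full clean:   %.2f us/op\n", us_per_op(t0, t1, freq));
        return 0;
    }
    ```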

    We have optimized the kernel to the greatest extent possible; to make WEC with SMP real-time capable again with L2 enabled, Microsoft will have to make a design change to some of their code. Not all of the required code is available (there is enough to debug and find the exact issue, but not enough to successfully recompile that part of the kernel). We have tried to move Microsoft to make this design change, we even provided the solution, but they have shown no interest at all in fixing this issue.

    Here's a rough table of what to expect on an iMX6 Quad running at 796 MHz (using the highly optimized GuruCE iMX6 BSP):

    [table not preserved in this copy of the thread]

    All times in us of course.


    Good luck,

    Michel Verhagen, eMVP
    Check out my blog: http://guruce.com/blog

    GuruCE
    Microsoft Embedded Partner
    http://guruce.com
    Consultancy, training and development services.


    Monday, January 18, 2016 8:41 PM
    Moderator
  • Thank you for the comprehensive explanation. I suppose the issue is similar on the x86 platform. I would like to do some testing as well and post the results. I see you were able to enable and disable the level 2 cache on your ARM boards. Do you know where I should start looking in order to enable/disable the level 2 cache on an x86 platform? I'm actually not even sure whether I'm using the level 2 cache now...
    Monday, January 18, 2016 8:57 PM
  • No, the issue is not similar on x86 as far as I know. Full cache operations on x86 are just as fast as the granular ones because the x86 cores are optimized for full cache ops (but I'm no x86 expert).

    I'm not sure if it is even possible to disable L2 on x86...


    Good luck,

    Michel Verhagen, eMVP
    Check out my blog: http://guruce.com/blog

    GuruCE
    Microsoft Embedded Partner
    http://guruce.com
    Consultancy, training and development services.

    Monday, January 18, 2016 9:51 PM
    Moderator
  • Ok, good to know. I'll do some functional testing with multicore and post some results anyway. Maybe someone who has done the same on x86 multicore knows the source of the internal locks (if the performance degrades).

    Thanks!

    Monday, January 18, 2016 10:04 PM