Interrupt API affinity hardcoded to only work on core 1 (Embedded Compact 7)

  • Question

  • Hello,

    I have been investigating an issue that causes random delays in InterruptDone calls.

    Sometimes it takes up to 15 milliseconds (or even more) for the call to go through.

    It was very difficult to pin down what could cause such a delay.

    The problem appears to be caused by the interrupt API, which can only be called on core 1.

    For instance, the kernel code explicitly sets the thread affinity to core 1 when calling OEMInterruptDone.

    Here is my case (confirmed by kernel tracker data):

    I have two threads with the same priority.

    Thread 1 does a lot of heavy, CPU-bound work.

    Thread 2 is a driver thread that uses the interrupt API; it processes data provided by thread 1.

    In rare cases, if thread 1 is scheduled on core 1 when thread 2 wants to call InterruptDone (for instance), thread 2 is delayed until thread 1 either goes to sleep or its quantum expires.

    This causes random delays in thread 2, even though the system has 4 cores and 2 of them are idle while the two threads contend for core 1.

    I understand that this issue can be solved by adjusting priorities, but the behavior still seems strange to me.
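    The scheduling behavior described above can be captured in a small toy model. This is a minimal sketch under stated assumptions, not the actual Embedded Compact 7 scheduler: it assumes only two runnable threads, round-robin time slicing between threads of equal priority, and a waiting thread that can only make progress on core 1. Idle cores do not appear at all, because affinity makes them irrelevant to the waiter. The names `core1_occupant`, `core1_wait_ms` and `WAIT_UNBOUNDED` are hypothetical.

```c
#include <assert.h>

/* Toy model only (assumption): NOT the Embedded Compact 7 scheduler.
   Priorities follow the Windows CE convention: a lower number means a
   higher priority (0 is the highest). */
typedef struct {
    int priority;         /* priority of the thread running on core 1 */
    int quantum_left_ms;  /* time left in its current quantum */
    int ms_until_block;   /* time until it sleeps or blocks; -1 = never */
} core1_occupant;

#define WAIT_UNBOUNDED (-1)

/* How long (ms) a thread that can only run on core 1 waits before it
   gets the core, or WAIT_UNBOUNDED if the wait never ends. */
int core1_wait_ms(const core1_occupant *occ, int waiter_priority)
{
    assert(occ->quantum_left_ms >= 0);

    if (waiter_priority < occ->priority)
        return 0;  /* higher priority: preempts the occupant at once */

    if (waiter_priority == occ->priority) {
        /* Equal priority: round-robin, so the waiter gets core 1 when the
           occupant blocks or its quantum expires, whichever comes first. */
        if (occ->ms_until_block >= 0 &&
            occ->ms_until_block < occ->quantum_left_ms)
            return occ->ms_until_block;
        return occ->quantum_left_ms;
    }

    /* Lower priority: quantum expiry does not help; the waiter runs only
       once the occupant blocks. */
    return occ->ms_until_block >= 0 ? occ->ms_until_block : WAIT_UNBOUNDED;
}
```

    In this model, with equal priorities the waiter's delay is bounded only by how much of the busy thread's quantum is left, which varies from call to call and so looks random. Making the driver thread even one step higher in priority than the worker reduces the wait to zero, which is why priority adjustment works around the problem.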

    Is there a particular reason why the interrupt API is hardcoded to work only on core 1?

    Shouldn't this be left to the BSP vendor, given the differences between CPUs?

    OK, let's assume the kernel is designed this way and this API really can work only on core 1.

    In that case, why doesn't the kernel force the thread occupying core 1 onto a different (idle) core, freeing core 1 for threads that need privileged APIs such as the interrupt API?

    Thanks,

    Alexey

    Thursday, March 2, 2017 3:42 PM

Answers

  • Hi Alexey,

    You are asking about what amounts to a design decision. Keeping interrupt handling on one core actually avoids potential interrupt bottlenecks. This is not a Microsoft-specific choice, and other sites discuss it, for example http://www.alexonlinux.com/why-interrupt-affinity-with-multiple-cores-is-not-such-a-good-thing.

    In my experience, an interrupt handler should do as little work as possible and hand off the rest. I have seen poorly written interrupt handlers impose horrible delays that are easily mitigated by handing the work off to code outside the handler itself.
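    The "do little, hand off" advice can be sketched as a producer/consumer queue. This is a user-space analogy written with POSIX threads, not Embedded Compact code: the handler side only records the event and wakes a worker, and the worker does the expensive part. A real interrupt handler cannot block on a mutex, so treat the locking here as an assumption of the analogy; `work_queue`, `handler_enqueue`, `worker` and `wq_stop` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP 64u

typedef struct {
    unsigned data[QUEUE_CAP];
    unsigned head, tail;      /* free-running read and write indices */
    unsigned dropped;         /* events lost because the queue was full */
    bool stop;
    unsigned long processed;  /* result of the deferred work (worker only) */
    pthread_mutex_t lock;
    pthread_cond_t ready;
} work_queue;

void wq_init(work_queue *q)
{
    q->head = q->tail = q->dropped = 0;
    q->stop = false;
    q->processed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
}

/* "Handler" side: record the event, wake the worker, return at once. */
bool handler_enqueue(work_queue *q, unsigned value)
{
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->tail - q->head < QUEUE_CAP) {
        q->data[q->tail % QUEUE_CAP] = value;
        q->tail++;
        ok = true;
        pthread_cond_signal(&q->ready);
    } else {
        q->dropped++;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Worker side: performs the actual processing outside the lock. */
void *worker(void *arg)
{
    work_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == q->tail && !q->stop)
            pthread_cond_wait(&q->ready, &q->lock);
        if (q->head == q->tail)
            break;  /* stop requested and the queue is drained */
        assert(q->tail - q->head <= QUEUE_CAP);
        unsigned v = q->data[q->head % QUEUE_CAP];
        q->head++;
        pthread_mutex_unlock(&q->lock);
        q->processed += v;  /* stand-in for the expensive work */
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Ask the worker to finish the remaining items and exit. */
void wq_stop(work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_broadcast(&q->ready);
    pthread_mutex_unlock(&q->lock);
}
```

    Usage: call `wq_init`, start `worker` with `pthread_create`, call `handler_enqueue` from the event source, and finally call `wq_stop` and `pthread_join`. The handler's cost stays constant no matter how heavy the processing is, which is the point of handing the work off.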

    UPDATE: I reached out to a member of the original CE OS team who has since moved on to another role, and their answer is: "Because InterruptDone need to access hardware (interrupt controller) on the same core where interrupt occurred, which is always core0." (Here "core0" presumably refers to the same first core that the question calls core 1.)
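    To illustrate the quoted reason, here is a toy model of an interrupt controller whose per-core interface is banked, so an end-of-interrupt only takes effect on the core that acknowledged the interrupt. This is an assumption for illustration: the thread does not say which interrupt controller the BSP uses (banked per-CPU interfaces of this kind exist, for example, in ARM GIC designs), and every name here (`toy_intc`, `intc_acknowledge`, `intc_end_of_interrupt`) is hypothetical.

```c
#include <stdbool.h>

#define NUM_CORES 4
#define NO_IRQ (-1)

/* Hypothetical toy model of a per-core ("banked") interrupt controller
   interface: each core only sees the interrupt it acknowledged itself. */
typedef struct {
    int active_irq[NUM_CORES];  /* IRQ acknowledged by each core, or NO_IRQ */
} toy_intc;

void intc_init(toy_intc *c)
{
    for (int i = 0; i < NUM_CORES; i++)
        c->active_irq[i] = NO_IRQ;
}

/* In this model every interrupt is delivered to and acknowledged by
   core 0, matching the quoted answer. */
void intc_acknowledge(toy_intc *c, int irq)
{
    c->active_irq[0] = irq;
}

/* End-of-interrupt issued from `core`. Only that core's own register bank
   is visible, so an EOI from any other core has no effect. */
bool intc_end_of_interrupt(toy_intc *c, int core, int irq)
{
    if (core < 0 || core >= NUM_CORES || c->active_irq[core] != irq)
        return false;  /* this core never acknowledged irq */
    c->active_irq[core] = NO_IRQ;
    return true;
}
```

    In this model, an end-of-interrupt issued from core 2 leaves the interrupt active on core 0, so the caller has to be running on the delivering core, which matches the kernel pinning the thread's affinity around OEMInterruptDone.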

    Sincerely,

    IoTGirl

    Thursday, March 2, 2017 5:10 PM
    Moderator