SysIntr stops being triggered after a period of time

  • Question

  • Hi,

    I am developing a driver for the AM335x processor in Windows Embedded Compact 2013 which uses an ISR triggered by a SysIntr.

    I'm using the following code to initialize the interrupt:

    pDevice->device = AM_DEVICE_DCAN1;

    irq[0] = (DWORD) -1;
    irq[1] = OAL_INTR_STATIC;
    irq[2] = GetIrqByDevice(pDevice->device, NULL);

    pDevice->hIntrEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

    KernelIoControl(IOCTL_HAL_REQUEST_SYSINTR, irq, sizeof(irq), &pDevice->dwSysintr, sizeof(pDevice->dwSysintr), NULL);
    InterruptInitialize(pDevice->dwSysintr, pDevice->hIntrEvent, NULL, 0);

    The ISR (an interrupt service thread) waits on the event with the following call:

    dwRet = WaitForSingleObject(pDevice->hIntrEvent, 10000);

    The interrupt works fine for a few hours, then all of a sudden it stops being fired.

    I know that the ISR is still running because the dwRet == WAIT_TIMEOUT condition is met.

    The device I am using is DCAN1 and I know that the hardware is still functioning because the system is still regularly transmitting data through the CAN bus.

    I have tried to write some code to re-initialize the SysIntr which executes when the WAIT_TIMEOUT condition is met:

    if (dwRet == WAIT_TIMEOUT)
    {
        ResetEvent(pDevice->hIntrEvent);

        InterruptDone(pDevice->dwSysintr);
        InterruptDisable(pDevice->dwSysintr);

        if (KernelIoControl(IOCTL_HAL_RELEASE_SYSINTR, &pDevice->dwSysintr, sizeof(pDevice->dwSysintr), NULL, 0, NULL))
        {
            Sleep(1000);

            DWORD irq[3];
            irq[0] = (DWORD)-1;
            irq[1] = OAL_INTR_STATIC;
            irq[2] = GetIrqByDevice(pDevice->device, NULL);
            if (irq[2] == (DWORD)-1)
            {
                RETAILMSG(ZONE_ERROR, (L"ERROR: CAN_ISR: unable to get the interrupts %u, Device %u\r\n", irq[2], pDevice->device));
            }

            // Request a SYSINTR for DCAN
            if (KernelIoControl(IOCTL_HAL_REQUEST_SYSINTR, irq, sizeof(irq), &pDevice->dwSysintr, sizeof(pDevice->dwSysintr), NULL) == FALSE)
            {
                RETAILMSG(ZONE_ERROR, (L"ERROR: CAN_ISR: unable to request a SYSINTR\r\n"));
            }

            if (pDevice->hIntrEvent)
            {
                CloseHandle(pDevice->hIntrEvent);
            }

            // Initialize Interrupts
            if (InterruptInitialize(pDevice->dwSysintr, pDevice->hIntrEvent, NULL, 0) == FALSE)
            {
                RETAILMSG(ZONE_ERROR, (L"ERROR: CAN_ISR: unable to initialize the interrupt\r\n"));
            }
        }
        else
        {
            RETAILMSG(ZONE_ERROR, (L"ERROR: CAN_ISR: unable to release SYSINTR\r\n"));
        }
    }

    This code fails upon calling the InterruptInitialize() function.

    I'm doing some stress testing, so there is a lot of data throughput (I send a frame every few ms), but that shouldn't cause the interrupt to stop firing.

    Has anyone had a similar issue?

    Is anyone here familiar with using the sysintr for DCAN1 on the AM335x processor?

    Can anyone see what I'm doing wrong when I try to reinitialize the interrupts?

    Thanks.




    • Edited by tomleijen Thursday, November 27, 2014 4:57 AM
    Thursday, November 27, 2014 4:52 AM

All replies

  • By adding the line

    pDevice->hIntrEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

    before the call to InterruptInitialize(), I can get InterruptInitialize() to succeed, and it recovers the DCAN1 interrupts. However, once the interrupt has stalled the first time, the timeouts recur more frequently.

    To me this suggests that it is not necessarily a hardware issue. The hardware interrupt is probably still working, and the problem is more likely in the kernel, where the SysIntr is linked to the event handle, or in how that link is set up in the first place.

    Are there any guidelines or special rules around using system interrupts and event handles that I am not following here?

    Thursday, November 27, 2014 11:16 PM
  • Yes, I've seen the same on iMX53 with WEC2013 (the problem does not occur on WEC7).

    I found that calling InterruptDone once more fixes the issue. This seems to be a problem in the WEC2013 kernel where the interrupt does not get unmasked, but so far I haven't been able to reproduce it in our iMX6 BSP, and I haven't investigated deeply enough to say for sure that this is a kernel issue.

    Funnily enough, I also first saw the issue when doing CAN stress testing on iMX53/WEC2013, but I have since been able to reproduce it with a simple GPT interrupt. The same tests run fine on WEC7 (exactly the same BSP and test code, except for some assembly code differences in the BSP between WEC7/8).
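
    The following is an illustration only, not the code from the iMX53 BSP. It is a host-side toy model in portable C of the symptom described above: the IRQ is masked when it fires and is unmasked by an InterruptDone()-style call, and a fault flag makes one unmask get lost. ToyIrq, toy_fire, toy_interrupt_done, toy_ist_step, toy_scenario and drop_next_unmask are hypothetical names, and level-triggered delivery is an assumption of the model.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Toy model (assumption): one level-triggered IRQ line. */
    typedef struct {
        bool masked;            /* IRQ is masked at the interrupt controller */
        bool event_set;         /* auto-reset event the IST waits on */
        int  device_pending;    /* frames waiting in the device */
        bool drop_next_unmask;  /* simulated fault: the next unmask is lost */
    } ToyIrq;

    /* The device raises its interrupt; it is delivered only while unmasked. */
    void toy_fire(ToyIrq *irq)
    {
        assert(irq != NULL);
        irq->device_pending++;
        if (!irq->masked) {
            irq->masked = true;     /* stays masked until the IST reports completion */
            irq->event_set = true;  /* the kernel signals the IST event */
        }
    }

    /* Stand-in for InterruptDone(): unmask, then redeliver if still asserted. */
    void toy_interrupt_done(ToyIrq *irq)
    {
        assert(irq != NULL);
        if (irq->drop_next_unmask) {    /* injected fault: this unmask never happens */
            irq->drop_next_unmask = false;
            return;
        }
        irq->masked = false;
        if (irq->device_pending > 0) {  /* line still asserted, so it fires again */
            irq->masked = true;
            irq->event_set = true;
        }
    }

    /* One pass of the IST loop; returns true if the event was signalled. */
    bool toy_ist_step(ToyIrq *irq, bool retry_on_timeout)
    {
        assert(irq != NULL);
        if (irq->event_set) {           /* the wait returned WAIT_OBJECT_0 */
            irq->event_set = false;     /* auto-reset event */
            irq->device_pending = 0;    /* service everything the device holds */
            toy_interrupt_done(irq);
            return true;
        }
        if (retry_on_timeout)           /* the wait returned WAIT_TIMEOUT */
            toy_interrupt_done(irq);    /* workaround: report completion once more */
        return false;
    }

    /* Runs the failure scenario; returns the number of frames left unserviced. */
    int toy_scenario(bool retry_on_timeout)
    {
        ToyIrq irq = { false, false, 0, false };

        toy_fire(&irq);                         /* frame 1 is delivered */
        irq.drop_next_unmask = true;            /* inject the fault */
        toy_ist_step(&irq, retry_on_timeout);   /* serviced, but the line stays masked */
        toy_fire(&irq);                         /* frame 2 is not delivered */
        toy_ist_step(&irq, retry_on_timeout);   /* timeout */
        toy_ist_step(&irq, retry_on_timeout);   /* services frame 2 only if recovered */
        return irq.device_pending;
    }
    ```

    In this model toy_scenario(false) leaves one frame unserviced and toy_scenario(true) leaves none, which mirrors the observation that one extra InterruptDone() call restarts delivery. It says nothing about why the unmask is lost in the first place.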


    Good luck,

    Michel Verhagen, eMVP
    Check out my blog: http://guruce.com/blog

    GuruCE
    Microsoft Embedded Partner
    http://guruce.com
    Consultancy, training and development services.





    Monday, December 1, 2014 2:30 AM
    Moderator
  • Hi Michel,

    Interesting that you found this while testing CAN as well.

    Re-initializing works, but this solution sounds like it would make it even better.

    Thanks for the advice, I will let you know if I find any more information.

    Monday, December 1, 2014 5:17 AM
  • Don't take it as a solution!

    If this really is an issue in the kernel, then it is a major issue. You need to get to the bottom of it, because it could have earth-shattering consequences if you do not (depending on your application ;-))


    Good luck,

    Michel Verhagen, eMVP

    Monday, December 1, 2014 7:37 AM
    Moderator
  • I sure won't. But it's something that is difficult to reproduce and could take a long time to figure out, so it is good to have a patch that lets testing continue while we work out the correct solution.

    That is why I didn't press 'Mark as answer' on your previous post.

    Monday, December 1, 2014 10:17 PM
  • This sounds very much like a problem that we had several years ago with a BSP that we maintained for a very different processor.

    I would review the BSP code and carefully analyze every line with the following in mind:

    1. The OS is multithreaded, so assume that every line has the potential for a read/modify/write error.
    2. The hardware (source of the interrupt) could fire an interrupt after the driver handles all interrupts and before InterruptDone() unmasks the interrupt.
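
    Point 1 can be illustrated with a small host-side model. This is a sketch under assumptions, not code from any BSP: g_irq_mask stands in for a shared interrupt mask register (bit set means the IRQ is masked), and the interleaving of two contexts is written out by hand so that the outcome is deterministic. The names rmw_lost_unmask and single_step_updates are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Toy shared mask register (assumption): bit n set means IRQ n is masked. */
    static uint32_t g_irq_mask;

    /* Hazardous pattern: context A masks IRQ a with a non-atomic
     * read/modify/write, and context B unmasks IRQ b between A's read
     * and A's write. Returns the final register value. */
    uint32_t rmw_lost_unmask(unsigned irq_a, unsigned irq_b)
    {
        assert(irq_a < 32 && irq_b < 32 && irq_a != irq_b);
        g_irq_mask = 1u << irq_b;               /* IRQ b is masked while its IST runs */

        uint32_t snapshot = g_irq_mask;         /* A: read */
        g_irq_mask &= ~(1u << irq_b);           /* B: InterruptDone()-style unmask of IRQ b */
        g_irq_mask = snapshot | (1u << irq_a);  /* A: write back the stale value */

        return g_irq_mask;                      /* IRQ b is masked again; B's unmask is lost */
    }

    /* Safer pattern: each context's update completes as one step, so
     * neither can overwrite the other's change. */
    uint32_t single_step_updates(unsigned irq_a, unsigned irq_b)
    {
        assert(irq_a < 32 && irq_b < 32 && irq_a != irq_b);
        g_irq_mask = 1u << irq_b;               /* IRQ b is masked while its IST runs */

        g_irq_mask |= 1u << irq_a;              /* A: set-only update */
        g_irq_mask &= ~(1u << irq_b);           /* B: clear-only update */

        return g_irq_mask;                      /* only IRQ a remains masked */
    }
    ```

    With IRQs 5 and 7, rmw_lost_unmask(5, 7) leaves bit 7 set, so IRQ 7 would never fire again, while single_step_updates(5, 7) leaves only bit 5 set. On real hardware the single-step behaviour has to be provided by something, for example a lock or interrupts-off section around the update, or separate set and clear registers where the interrupt controller offers them.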

    Bruce Eitman (eMVP)
    Senior Engineer
    Bruce.Eitman AT Eurotech DOT com
    My BLOG http://geekswithblogs.net/bruceeitman
    Eurotech Inc.
    www.Eurotech.com

    Tuesday, December 2, 2014 2:11 PM
    Moderator
    • Edited by Bill at Brady Tuesday, March 31, 2015 5:31 PM added link
    • Proposed as answer by Bill at Brady Tuesday, March 31, 2015 5:34 PM
    • Marked as answer by tomleijen Monday, April 13, 2015 2:38 AM
    Tuesday, March 31, 2015 5:31 PM
  • Hi Bill,

    Is there any chance that we can get hold of those LIB files so that we can do some testing of our own?

    Thanks.

    Thursday, April 16, 2015 3:45 AM
  • Thursday, April 16, 2015 1:53 PM