Windows Embedded Compact 2013 localhost connection drops

  • Question

  • We are developing a WEC2013 platform that runs two applications, one managed and one native, that communicate with each other (and with an external process) via TCP.

    It seems that after a certain amount of time the connection to localhost drops, and the applications are unable to reconnect until the controller is power-cycled.

    Unfortunately the problem is intermittent. I have only heard about it from testers and seen it in logs, so I don't have much detail yet, but it would be great to hear whether anyone has experienced this problem and knows of a solution.

    Wednesday, December 10, 2014 4:27 AM

Answers

  • After applying the latest QFE (the one released to fix the problem with interrupts not being serviced), this issue was resolved.
    • Marked as answer by tomleijen Sunday, May 3, 2015 8:22 PM
    Sunday, May 3, 2015 8:22 PM

All replies

  • How did you see that the applications were unable to reconnect? That would require some way to monitor the behavior of the processes, such as a log, which you apparently don't have.
    Monday, December 15, 2014 9:14 AM
  • I have got a log somewhere, but it's just the log produced by the native C++ application, which repeats 'could not connect to 127.0.0.1' over and over. So far the problem has only been seen in a release build of the OS, so I don't have any Platform Builder debug output or more detailed logs.
    Monday, December 15, 2014 6:55 PM
  • I have some more details:

    After calling:

    _commsClient = new TcpClient(AddressFamily.InterNetwork);
    _commsClient.NoDelay = true;
    _commsClient.Connect("localhost", 124);

    In the C# code the program throws these exceptions:

    A first chance exception of type 'System.Net.Sockets.SocketException' occurred in System.dll

    A first chance exception of type 'System.Net.Sockets.SocketException' occurred in System.dll

    TCP Error Code: 10060

    The same happens when we replace “localhost” with “127.0.0.1”.

    Connecting to that port on the controller using telnet from another PC works fine, so the C++ host application doesn't seem to be the problem.

    When I open a Telnet connection to the controller to use its internal command prompt:

    PING localhost works

    PING 127.0.0.1 fails error code 11010

    Tuesday, December 16, 2014 2:19 AM
  • I have been running the Platform Builder debugger overnight and have hit a few unknown DEBUGCHK breakpoints in private code, after which the localhost interface seemed to go down.

    59727568 PID:400002 TID:2580002 Unknown: DEBUGCHK failed in file d:\bt\1786\private\winceos\net\netio\inc\netiobuffer.h at line 1083 

    After pressing 'Continue', I immediately hit

    59727591 PID:400002 TID:2580002 Unknown: DEBUGCHK failed in file d:\bt\2262\private\winceos\net\ndis\sys\ndisbuf.c at line 1339 

    After that, all debug output from my native and managed applications stopped and the display stopped updating, which indicates quite a serious crash on the application side. The OS, however, still seems to be functioning normally.

    Does anyone have an idea what these two DEBUGCHK messages mean or how I could go about tracking it down?

    Thursday, February 26, 2015 7:17 PM
  • As I'm sure you've investigated, 10060 is WSAETIMEDOUT, which indicates that a connection attempt (most likely) did not receive a response. Here are the things I'd consider as possible causes:

    * You are opening multiple connections (unintentionally, I presume) and not properly closing them. Eventually you overflow some limit of the network stack and everything crashes. In your native code, make sure any socket connection no longer needed is closed. In your .NET code, call Close() on everything.

    * Something in the .NET or native code is altering the operating network configuration. I don't know what exactly, but you might be unbinding a network adapter.

    * You are using threads to send and receive data without properly limiting how many threads operate on a given socket connection, so too much data is sent or too many recv() calls are outstanding, which crashes the network stack.

    Paul T.

    Wednesday, March 4, 2015 5:54 PM
  • I have been doing some more testing and debugging with the Platform Builder debugger attached. This time, to stress the loopback interface, I ran a managed application that transmits and receives large chunks of data through localhost.

    I recorded the Unknown: DEBUGCHK messages that came up, along with their call stacks:

    //After running several hours:

    23840821 PID:400002 TID:54b0002 DeleteObject of 0x1010c27a failed because object was still in use.
    23841150 PID:5490002 TID:49d000a Written 440000 bytes

    23841397 PID:400002 TID:54b0002 DeleteObject of 0x101ac27e failed because object was still in use.
    23841404 PID:400002 TID:54b0002 Grow Gdi handle table from 49792 to 49800 for the object at 0x994e6870
    23841586 PID:5490002 TID:49e000a Read 380000 bytes

    23841659 PID:400002 TID:54b0002 DeleteObject of 0x1000c285 failed because object was still in use.
    23842086 PID:5490002 TID:49d000a Written 450000 bytes

    23842226 PID:400002 TID:54b0002 DeleteObject of 0x1002c284 failed because object was still in use.
    23842432 PID:5490002 TID:49e000a Read 390000 bytes

    23842487 PID:400002 TID:54b0002 DeleteObject of 0x1004c280 failed because object was still in use.
    23842910 PID:5490002 TID:49d000a Written 460000 bytes

    23843155 PID:400002 TID:54b0002 DeleteObject of 0x1006c283 failed because object was still in use.
    23843380 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\2262\private\winceos\net\ndis\sys\ndisbuf.c at line 1339 

    //Call stack:
    NDIS!NdisAdvanceNetBufferDataStart(_NET_BUFFER * 0xa39c6c70, unsigned long 0x00000028, unsigned char 0x00, void (_MDL *) 0x00000000)  line 1339 + 38 bytes
    TCPIP!NetioAdvanceNetBuffer(_NET_BUFFER * 0xa39c6c70, unsigned long 0x00000028)  line 1000
    TCPIP!IppReceiveHeadersHelper(_IP_REQUEST_CONTROL_DATA * 0xa2780528, _IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd60, _IP_GENERIC_LIST * 0xa2a8fd18, _IP_GENERIC_LIST * 0xa2a8fd38, _IP_GENERIC_LIST * 0xa2a8fcf8)  line 736
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1305
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes


    //After hitting continue, we immediately got:
    23843401 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\1786\private\winceos\net\netio\network\sys\subr.c at line 1949 

    //Call stack:
    TCPIP!IppCompleteAndFreePacketList(_IP_REQUEST_CONTROL_DATA * 0x00000000, unsigned char 0x00)  line 1949 + 28 bytes
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1533
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes

    //After hitting continue, we immediately got:
    23843404 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\1301\private\winceos\net\netio\sys\netiobuffer.c at line 698 

    //Call stack:
    NETIO!NetioDereferenceNetBufferListChain(_NET_BUFFER_LIST * 0xa16bedb0, unsigned char 0x00)  line 698 + 28 bytes
    TCPIP!IppCompleteAndFreePacketList(_IP_REQUEST_CONTROL_DATA * 0x00000000, unsigned char 0x00)  line 1960
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1533
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes

    //After hitting continue, the same DEBUGCHK was hit again:
    23843409 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\1301\private\winceos\net\netio\sys\netiobuffer.c at line 698 

    //Call stack:
    NETIO!NetioDereferenceNetBufferListChain(_NET_BUFFER_LIST * 0xa38f81a0, unsigned char 0x00)  line 698 + 28 bytes
    TCPIP!IppCompleteAndFreePacketList(_IP_REQUEST_CONTROL_DATA * 0x00000000, unsigned char 0x00)  line 1960
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1533
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes

    //This kept repeating over and over about 40 times...

    //Then we got an error:
    23843488 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\2262\private\winceos\net\ndis\sys\ndisbuf.c at line 1339 

    //Call stack:
    //Didn't get the call stack for this one

    //After hitting continue we immediately got:
    23843490 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\1786\private\winceos\net\netio\network\sys\subr.c at line 1949 

    //Call stack:
    TCPIP!IppCompleteAndFreePacketList(_IP_REQUEST_CONTROL_DATA * 0x00000000, unsigned char 0x00)  line 1949 + 28 bytes
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1533
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes

    //After hitting continue we got:
    23843492 PID:400002 TID:2dc0006 Unknown: DEBUGCHK failed in file d:\bt\1301\private\winceos\net\netio\sys\netiobuffer.c at line 698 

    //Call stack:
    NETIO!NetioDereferenceNetBufferListChain(_NET_BUFFER_LIST * 0xa39c6bf0, unsigned char 0x00)  line 698 + 28 bytes
    TCPIP!IppCompleteAndFreePacketList(_IP_REQUEST_CONTROL_DATA * 0x00000000, unsigned char 0x00)  line 1960
    TCPIP!IppReceiveHeaderBatch(_IP_PROTOCOL * 0xef59af98, _IP_GENERIC_LIST * 0xa2a8fd88)  line 1533
    TCPIP!IppLoopbackTransmit(_CEDEVICE_OBJECT * 0x00000000, void * 0xef59af98)  line 664
    TCPIP!IoExecuteWorkItem(CTEEvent * 0xa23e0c20, void * 0xa23e0c20)  line 131
    K.CXPORT!CTEpWorkerThread(void * 0x00000002)  line 746
    K.COREDLL!ThreadBaseFunc(unsigned long (void *) 0xef8b4d21, void * 0x00000002)  line 1269 + 6 bytes

    //This kept repeating over and over about 40 times...

    //We then got a first chance exception in TCPIP.dll
    23843680 PID:400002 TID:2dc0006 Exception 'Data Abort' (0x4): Thread-Id=02dc0006(pth=9544b880), Proc-Id=00400002(pprc=8429ffa0) 'NK.EXE', VM-active=05490002(pprc=95479b50) 'GuiNetTest.exe'
    23843680 PID:400002 TID:2dc0006 PC=ef4bf373(tcpip.dll+0x0002f373) RA=ef4bfe71(tcpip.dll+0x0002fe71) SP=a2a8fc60, BVA=00000004

    //After pressing yes to 'pass exception to program being debugged' twice the program seemed to continue:

    23843682 PID:400002 TID:2dc0006 RtlDispatchException: returning failure. Flags=0
    23843692 PID:400002 TID:2dc0006 
    Unhandled exception c0000005:
    23843692 PID:400002 TID:2dc0006 Secondary thread in proc 00400002 faulted, Exception code = c0000005, Exception Address = ef4bf373!
    23843693 PID:400002 TID:2dc0006 Terminating thread 9544b880
    23843738 PID:400002 TID:54b0002 DeleteObject of 0x1008c287 failed because object was still in use.
    23844459 PID:5490002 TID:49e000a Read 410000 bytes

    23844511 PID:400002 TID:54b0002 DeleteObject of 0x100cc281 failed because object was still in use.
    23845217 PID:5490002 TID:49e000a Read 420000 bytes

    23845354 PID:400002 TID:54b0002 DeleteObject of 0x1010c282 failed because object was still in use.
    23845949 PID:5490002 TID:49e000a Read 430000 bytes

    23845995 PID:400002 TID:54b0002 DeleteObject of 0x101ac286 failed because object was still in use.
    23846003 PID:400002 TID:54b0002 Grow Gdi handle table from 49800 to 49808 for the object at 0x994e6870
    23846786 PID:5490002 TID:49e000a Read 440000 bytes

    23846848 PID:400002 TID:54b0002 DeleteObject of 0x1000c28d failed because object was still in use.
    23847495 PID:5490002 TID:49e000a Read 450000 bytes

    23847694 PID:400002 TID:54b0002 DeleteObject of 0x1002c28c failed because object was still in use.
    23848184 PID:5490002 TID:49e000a Read 460000 bytes

    23848336 PID:400002 TID:54b0002 DeleteObject of 0x1004c288 failed because object was still in use.
    23848558 PID:5490002 TID:49e000a Error: data lost

    23848777 PID:400002 TID:54b0002 DeleteObject of 0x1006c28b failed because object was still in use.
    23849241 PID:5490002 TID:49e000a Read 10000 bytes

    23849255 PID:5490002 TID:49e000a Error: data lost

    //Then the C# application crashed:

    23864866 PID:400002 TID:49d000a AFD:NtStatusToSocketError unhandled error code 0x274C (10060)
    23864883 PID:5490002 TID:49d000a GuiNetTest.exe
    23864883 PID:5490002 TID:49d000a Error
    23864883 PID:5490002 TID:49d000a An unexpected error has occurred in GuiNetTest.exe.
    Select Quit and then restart this program, or select Details for more information.

    The error message for this exception isn't currently available on this device.
    23864884 PID:5490002 TID:49d000a GuiNetTest.exe
    SocketException
    The error message for this exception isn't currently available on this device.

       at System.Net.Sockets.Socket.ConnectNoCheck(EndPoint remoteEP)
       at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
       at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP)
       at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
       at GuiNetTest.SocketTester.Client()
       at GuiNetTest.SocketTester.start()
       at System.Threading.ThreadHelper.ThreadStartHelper(ThreadHelper t)
       at System.Threading.ThreadHelper.ThreadStartHelper()
    23864896 PID:400002 TID:49d000a DlgMgr:NextControl[uFlags=14] returning 0x00000000
    23864905 PID:400002 TID:49d000a DlgMgr:NextControl[uFlags=12] returning 0x00000000
    23864905 PID:400002 TID:49d000a DlgMgr: FindDlgItem id 1 returning NULL.
    23865467 PID:5490002 TID:5700006 Socket: Leaked socket being cleaned up by finalizer.
    23865509 PID:5490002 TID:5700006 Socket: Leaked socket being cleaned up by finalizer.
    23865671 PID:400002 TID:108015a *PM: PmDisconnectConn: Socket 0xA23DB3C0 Disconnect call to TLN returned C000020D


    • Edited by tomleijen Wednesday, March 4, 2015 8:17 PM
    Wednesday, March 4, 2015 7:09 PM
  • Hi Paul,

    Thanks for your suggestions. Here is the source code of the test application I used for my latest tests:

    static class Program
    	{
    
    		public static Form1 form1;
    		/// <summary>
    		/// The main entry point for the application.
    		/// </summary>
    		[MTAThread]
    		static void Main()
    		{
    			SocketTester t1 = new SocketTester();
    			SocketTester t2 = new SocketTester();
    
    
    			Thread serv = new Thread(new ThreadStart(t1.start));
    			Thread clie = new Thread(new ThreadStart(t2.start));
    			
    
    			form1 = new Form1();
    			serv.Start();
    			clie.Start();
    			Application.Run(form1);
    
    		}
    
    	}
    
    	public class SocketTester
    	{
    		public void start()
    		{
    			var listener = new TcpListener(IPAddress.Any, 39672);
    			try
    			{
    				listener.Start();
    			}
    			catch (SocketException e)
    			{
    				if (e.ErrorCode == 10048)
    				{
    					Client();
    					return;
    				}
    				throw;
    			}
    			Server(listener);
    		}
    
    		static void Server(TcpListener listener)
    		{
    			while (true)
    			{
    				var client = listener.AcceptTcpClient();
    				int expected = 0;
    				int numBytes = 0;
    				while (true)
    				{
    					int actual = client.GetStream().ReadByte();
    					if (actual != expected)
    					{
    						Program.form1.AddMessage("Error: data lost");
    						Console.WriteLine("Error: data lost");
    						break;
    					}
    					expected = (expected + 1) % 256;
    					numBytes++;
    					if (numBytes % 10000 == 0)
    					{
    						Program.form1.AddMessage("Read " + numBytes + " bytes");
    						Console.WriteLine("Read {0} bytes", numBytes);
    					}
    				}
    			}
    			listener.Stop();
    		}
    
    		static void Client()
    		{
    			int[] bufferSizes = { 100, 1000, 4000 };
    			byte[] buffer = new byte[bufferSizes.Max()];
    			while (true)
    			{
    				using (var client = new TcpClient())
    				{
    					client.Connect("127.0.0.1", 39672);
    					int data = 0;
    					int numBytes = 0;
    					int bufferSizeIndex = 0;
    					DateTime end = DateTime.Now.AddMinutes(1);
    					while (DateTime.Now < end)
    					{
    						for (int i = 0; i < bufferSizes[bufferSizeIndex]; i++)
    						{
    							buffer[i] = (byte)data;
    							data = (data + 1) % 256;
    							numBytes++;
    							if (numBytes % 10000 == 0)
    							{
    								Program.form1.AddMessage("Written " + numBytes + " bytes");
    								Console.WriteLine("Written {0} bytes", numBytes);
    							}
    						}
    						client.GetStream().Write(buffer, 0, bufferSizes[bufferSizeIndex]);
    						bufferSizeIndex = (bufferSizeIndex + 1) % bufferSizes.Length;
    					}
    				}
    			}
    		}
    	}

    I don't see how it would open multiple connections or multiple threads.

    I also can't see how the .NET code would be altering the operating network configuration, unless it's some bug in the Microsoft private code.

    Wednesday, March 4, 2015 7:45 PM
  • Hi Paul,

    I thought I should note that the 10060 error only came up when I attempted to (manually) restart the application after it had crashed in the first place.

    I just tried to restart my test application and it came up with the same 10060 error again:

    24434720 PID:1ea009e TID:5f40112 OSAXST1: >>> Loading Module 'rsaenh.dll' (0x9541BA30) at address 0x43370000-0x43393000 in Process 'GuiNetTest.exe' (0x95458280)
    24435693 PID:400002 TID:5f40112 DeleteObject of 0x1012000c failed because object was still in use.
    24435920 PID:1ea009e TID:582000a palFile_OpenModule: CeOpenModuleByPolicyEx() failed with hr=0x80004005
    24435924 PID:1ea009e TID:582000a palFile_OpenModule: CeOpenModuleByPolicyEx() failed with hr=0x80004005
    24435928 PID:1ea009e TID:582000a palFile_OpenModule: CeOpenModuleByPolicyEx() failed with hr=0x80004005
    24435932 PID:1ea009e TID:582000a palFile_OpenModule: CeOpenModuleByPolicyEx() failed with hr=0x80004005
    24436208 PID:1ea009e TID:582000a OSAXST1: >>> Loading Module 'nspm.dll' (0x9545AF00) at address 0x40DB0000-0x40DB8000 in Process 'GuiNetTest.exe' (0x95458280)

    24457314 PID:400002 TID:582000a AFD:NtStatusToSocketError unhandled error code 0x274C (10060)

    Wednesday, March 4, 2015 8:14 PM
  • I'd suggest dropping TcpClient and TcpListener and moving closer to bare WinSock by using Socket directly. I've not used the TcpClient objects, but I've never had problems going lower-level. That's not much help on the actual question, but it might get you moving forward again.

    Paul T.

    Monday, March 9, 2015 5:57 PM