Problem using Windows as a Router

  • Question

  • As usual I am not sure if this is the correct forum for this question; also, it got pretty long in order to cover what I have done so far.  Please bear with me.

    I am building a system to transfer IP packets over an experimental transport system.  I have developed an NDIS driver that takes packets from the bottom of the protocol stack and delivers them to user-mode code; the packets are then routed through our mechanism, and at the receiving end the NDIS driver presents them to Windows as received packets.  All of this has been working well on development hardware for a couple of months now (thanks to some excellent help from this forum).

    Now the deployment hardware is available and I have encountered a problem.  In order to provide communications services to several computers at each end of our link, we run the Windows 7 machine hosting our NDIS driver as a router.  On the development systems (commercial, off-the-shelf PCs), everything works fine.  On the deployment hardware, Windows drops about 5% of the packets.  I say Windows does it because I can see the system receive them in Wireshark, but they are never presented to my driver.

    The data rates are pretty low, about 500 kbps.  The packet sizes are about 1400 bytes.  The test configuration is client computer -> Ethernet link -> router -> experimental link -> router -> Ethernet link -> client computer.  The routers are doing nothing but routing, apart from some diagnostic displays.  The router systems are built by Kontron; they have Atom processors running at 1.6 GHz, with 1 GB of memory.  The systems run Windows 7 Embedded.
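    For scale, those rates work out to only a few dozen packets per second.  A quick back-of-the-envelope check (assuming the ~500 kbps and 1400-byte figures above):

    ```python
    # Rough load on the router at the rates described above.
    LINK_RATE_BPS = 500_000   # ~500 kbps
    PACKET_BYTES = 1400       # ~1400-byte packets

    packets_per_sec = LINK_RATE_BPS / (PACKET_BYTES * 8)  # ~44.6 packets/s
    gap_ms = 1000.0 / packets_per_sec                     # ~22.4 ms between packets

    print(f"{packets_per_sec:.1f} packets/s, {gap_ms:.1f} ms apart")
    ```

    So a sustained 5% loss at under 50 packets per second is striking; this is nowhere near a load that should stress the hardware.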

    After a number of experiments I know the following:

    1) Running the same test on a similarly configured system running full Windows 7 Enterprise on a Core Duo processor at 3.16 GHz with 4 GB of memory does not reproduce the problem.  (This is my development system.)

    2) Replacing the Windows Embedded in the deployment system with Windows 7 Enterprise does not fix the problem.

    3) If I remove the experimental link and replace it with another Ethernet link, the problem does not occur.

    4) If I pace the packets at about 1 per 20 ms the problem does not occur.

    5) I set HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\NumForwardPackets to 512 and HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ForwardBufferMemory to 296192.  This did not help at all on the Windows Embedded configuration.  On the Windows Enterprise configuration it cut the packet loss rate to about 2%.
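    Note that those two registry values interact.  If (as the TCP/IP registry documentation describes) ForwardBufferMemory is carved into 256-byte units and each queued packet consumes whole units, then for 1400-byte packets the buffer memory, not NumForwardPackets, may be the binding limit.  A quick check under that assumption:

    ```python
    import math

    # Values tried in item 5 above; the 256-byte unit size is an
    # assumption based on the TCP/IP registry documentation.
    FORWARD_BUFFER_MEMORY = 296192
    NUM_FORWARD_PACKETS = 512
    PACKET_BYTES = 1400

    units = FORWARD_BUFFER_MEMORY // 256              # 1157 buffer units
    units_per_packet = math.ceil(PACKET_BYTES / 256)  # 6 units per 1400-byte packet
    packets_buffer_can_hold = units // units_per_packet

    # Effective forwarding-queue depth is the smaller of the two limits.
    effective_queue = min(NUM_FORWARD_PACKETS, packets_buffer_can_hold)
    print(effective_queue)
    ```

    Under these assumptions the buffer memory caps the queue at roughly 192 large packets, well below the 512 packet headers allocated.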

    Given item 3 above, I assumed that my driver was not fast enough, so I removed all of the logging, ran the optimizer on it, and gave it a large number of buffers (1000) to use in getting packets out of the OS.  This produced no improvement.

    Item 3 in my list seems to show that the basic hardware is fast enough to do the job, but the only differences I can see between the system that works and the one that does not are the CPU speed and the amount of memory.

    I am interested in any suggestions anyone might have.
    Tuesday, July 23, 2013 7:47 PM

All replies

  • So to make sure I understand correctly: you believe that the software+hardware in your router ought to be able to scale up to this traffic load, but it is not.  The first step here is to find the bottleneck.  I can't guess where it is, but I can tell you a few of the things I'd check into to see about identifying the bottleneck.

    Are any CPUs pegged near 100%?  Is any system showing very little free memory?  Is anybody hitting the disk for any reason (I hope not; networking should not hit the disk)?  What is the send/receive throughput measured at each end of each of the network links?  (If one sender is seeing significantly higher throughput than his corresponding receiver, there may be something going on at layer 2).  Are any NICs reporting error packets (drop, underflow, overrun, collisions)?

    Are packets queuing up somewhere?  Check your NDIS driver's send and receive queue depth to see if the depth gets unreasonably large.  (At 500 kbps, you shouldn't need more than a dozen packets queued anywhere.)

    After finding the bottleneck, you can think about ways to improve scalability.  The two most common ones that I can think of are disabling unneeded OS services and improving the user-kernel datapath.

    Is the firewall running on the routers?  If so, consider turning it off; you'll get a perf boost.  (Turning off a firewall is an obvious security red-flag, but if the routers are used in a controlled experimental environment, you might not need the firewall.)

    I gather that your NDIS driver ships packets to usermode for some processing.  Is this user-kernel interface asynchronous?  Does it batch more than one packet at a time?  A synchronous, 1-at-a-time interface is easy to work with, but it won't scale as well as an asynchronous, batching interface.  If you're not meeting performance goals, you may need to pay the complexity cost of reworking the user-kernel datapath.
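    The difference can be sketched with a toy model (names and numbers invented, not any real NDIS interface): count the user/kernel transitions needed to drain the same backlog with and without batching.

    ```python
    from collections import deque

    def completions_needed(backlog_sizes, batch_limit):
        """I/O completions needed to drain each backlog of packets.

        backlog_sizes: packets waiting in the driver each time user mode asks.
        batch_limit:   max packets returned per completion (1 = no batching).
        """
        total = 0
        for backlog in backlog_sizes:
            q = deque(range(backlog))
            while q:
                # One completion hands up to batch_limit packets to user mode.
                for _ in range(min(batch_limit, len(q))):
                    q.popleft()
                total += 1
        return total

    # Five bursts of 8 packets each arriving while user mode is busy:
    bursts = [8, 8, 8, 8, 8]
    print(completions_needed(bursts, 1))    # unbatched: 40 transitions
    print(completions_needed(bursts, 16))   # batched: 5 transitions
    ```

    Each transition has fixed overhead, so batching amortizes it across the burst; that is where the scalability difference comes from.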

    Tuesday, July 23, 2013 8:47 PM
  • Yes, it certainly seems like the system SHOULD be able to handle the data rate.

    I have done much of this checking.  The CPUs are running at about 30%.  Of the 1 GB of memory, 300 MB is "in use" and the rest is "standby".  There is a little disk I/O going on that has to do with instrumenting the system, but it's pretty small.

    The only place I expect packets to queue is in user mode.  The experimental link is half duplex with long turnaround times, so packets must queue in the sending system waiting for the link.  However, the Windows IP stack should be unaware of this; the NDIS driver completes the operation as soon as the packet is handed to the user-mode code.

    I am currently running a test program that attempts to send 400 kbps in 1400-byte packets (about 36 packets per second); the receiver just checks for missing packets and counts everything up.  It consistently reports receiving about 400 kbps with a 5-7% packet loss.  This is only about 2/3 of the capacity of the link, so I don't expect any problems or dropped packets at these rates.  I currently have my link's congestion control turned off, and I have some code to do end-to-end assurance of the data, so I don't drop any packets after I get them.  (I probably can't really run it this way, but it helps for this testing.)
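    The receiver's check is essentially sequence-number gap counting; a minimal sketch of that idea (an illustrative reconstruction, not the actual test tool):

    ```python
    def loss_rate(received_seqs):
        """Fraction of packets missing, given in-order sequence numbers seen."""
        if not received_seqs:
            return 0.0
        expected = received_seqs[-1] - received_seqs[0] + 1
        return (expected - len(received_seqs)) / expected

    # 100 packets sent, 5 of them dropped in transit:
    seen = [s for s in range(100) if s not in {10, 20, 30, 40, 50}]
    print(f"{loss_rate(seen):.0%}")   # 5%
    ```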

    I have used perfmon to look at the performance counters.  It has a selection for "Outbound queue length", but I don't see any way to monitor input queue length (I think that is what to look for).

    At any rate the output queue length of both adapters (the real Ethernet adapter and my new adapter) always say zero in perfmon.  Is there a better way to measure this information?

    I found no error counters with nonzero values on either interface.  There is a counter for "IPv4 -> Datagrams Received Discarded" that contains a non-zero value, but it does not seem to be large enough to account for all of my missing packets.

    The firewall is off, these systems are isolated.

    The NDIS driver is asynchronous but not batching.  It delivers one packet in each buffer, but the user-mode program can submit any number of buffers to it.  I started out running it with 10 buffers, but I have upped the number to as much as 1000 with no real effect on the problem.  Again, the low data rates did not seem to justify more complexity than this.
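    A rough calculation (assuming roughly 45 packets/s from the ~500 kbps figure, and one packet per posted buffer) suggests why raising the buffer count had no effect: even 10 buffers give the driver a couple hundred milliseconds of headroom before it would starve.

    ```python
    PACKETS_PER_SEC = 45  # assumed from ~500 kbps / 1400-byte packets

    def headroom_ms(buffer_count):
        """How long arriving packets can be absorbed with no buffer returned."""
        return 1000.0 * buffer_count / PACKETS_PER_SEC

    print(f"{headroom_ms(10):.0f} ms")     # ~222 ms with 10 buffers
    print(f"{headroom_ms(1000):.0f} ms")   # ~22 seconds with 1000 buffers
    ```

    If 10 buffers already cover any plausible scheduling hiccup, going to 1000 would not be expected to help, which matches what was observed.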

    I am very interested in measuring the NDIS drivers input queue length and would be interested in any advice on how to do this.  Also, if there is a better performance monitor/reporting program than perfmon I am anxious to try it out.

    Thanks for your advice and help.


    Thursday, July 25, 2013 3:17 PM