Lost Connection Advice

  • Question

  • I have an application that connects to a TCP/IP to RS-422 protocol converter. The converter is attached to some electronic equipment. The application polls the equipment at the other end of the converter and receives its responses. The application is used with this converter in many locations without a problem (over both internet and intranet). However, in an installation in Mexico we are having problems. Sometimes the connection will work for several hours and then is inexplicably lost. I have found out that their WAN is supported by satellite communication, which I suspect may be part of the problem. The application is running on a Windows XP SP3 computer.

    I am being sent to diagnose the connection problem. I am wondering if the netmon tool would be helpful in diagnosing the problem? If so, what would I look for or what type of setup do I need to capture the information needed to diagnose the problem? I haven't used this tool before at all so I am a newbie.

    Thank you.
    Thursday, October 8, 2009 2:53 PM

All replies

  • Taking a network trace will help you understand if there's a networking issue.  If the WAN is involved, then you should be able to tell if there is a timeout due to lost or slow packets at the TCP layer.  The trick will be understanding where in your environment you'll be able to get a trace.

    One obvious point is at the client.  From here you might be able to tell if packet loss is a problem.  You could check to see if you have retransmits (using the filter "property.tcpretransmit==1") and then follow the traffic for these to determine whether a session has timed out.  Once you find a retransmit, you could look at each conversation (right click the frame, Find Conversation->TCP).  You'll have to remove the filter, but once you do you can see what happens at the end of the connection.  A common pattern is a doubling of the Time Delta (this is a column you can add) between retransmits, after which the connection is finally reset.  This would be an indication of a network problem, perhaps due to the satellite link or something else.

    It might also be helpful to get a trace at the point where the TCP/IP to RS-422 conversion takes place.  This might be more difficult because of how it's connected to the Ethernet network.  However, you might be able to span a port on a switch to see this traffic.  Again, you could look at this traffic for a similar pattern.

    Getting the traces is the easy part, but understanding what is happening is less straightforward.  Unfortunately it's hard to give very general advice, but the above is a good start.

    Paul

    Thursday, October 8, 2009 5:19 PM
  • Paul,

    Thanks for your advice. I have been trying the monitor program here locally to get familiar with it. I am monitoring a local test connection to a protocol converter that is also on my LAN, which, of course, works without a problem. I did find the retransmit filter and applied it, but in my simple test environment nothing was retransmitted.

    Is there a filter to locate the actual reset of the connection?

    Also, I should mention that some of the converters (there are 50 total connections going to about 50 cities) are working perfectly. If I compared traffic between one that is working and one that stops, would that be helpful?

    Thanks again. I realize this is very vague and I wish I had a network specialist on staff I could send.
    Thursday, October 8, 2009 6:38 PM
  • To see all resets you can create a filter of "tcp.flags.reset==1".  Remember that you can also create color filters for any of the filters mentioned above if you prefer.  The only problem with looking at resets is that many connections are reset when they are closed, so a reset on its own may not be a good indication of a problem.  But if you inspect every reset, you will probably find one that is related.

    Comparing to a working trace is a good idea.  If there's something else going on, this could give you some insight.

    Perhaps another way to look at this is to use the conversation tree and inspect every TCP conversation that shows up.  You could look at the end of a conversation to see if something occurred there that looks fishy.  If you add the "Time Delta" column, you should be able to see a pattern like (2,4,8,16) in the time deltas, which indicates that a connection has been lost, normally due to network problems.

    Friday, October 9, 2009 6:02 PM
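
The (2,4,8,16) pattern described in the replies can also be checked mechanically once the frame times for a conversation have been copied out of the capture. The following is only a minimal sketch under stated assumptions: the timestamps are typed in or exported by hand as seconds, the function names (`time_deltas`, `longest_backoff_run`, `looks_like_lost_connection`) are hypothetical, and the tolerance and minimum run length are arbitrary starting values, not anything defined by Network Monitor.

```python
from typing import List, Sequence


def time_deltas(timestamps: Sequence[float]) -> List[float]:
    """Return the gap in seconds between each pair of consecutive frames."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def longest_backoff_run(deltas: Sequence[float],
                        ratio: float = 2.0,
                        tolerance: float = 0.25) -> int:
    """Length of the longest run of deltas in which each delta is roughly
    `ratio` times the previous one (for example 2, 4, 8, 16).

    `tolerance` is a fraction of `ratio`; both defaults are assumptions.
    """
    best = run = 1 if deltas else 0
    for prev, cur in zip(deltas, deltas[1:]):
        if prev > 0 and abs(cur / prev - ratio) <= tolerance * ratio:
            run += 1      # the doubling pattern continues
        else:
            run = 1       # pattern broken, start counting again
        best = max(best, run)
    return best


def looks_like_lost_connection(timestamps: Sequence[float],
                               ended_with_reset: bool,
                               min_run: int = 3) -> bool:
    """Flag a conversation whose frame gaps keep doubling before a reset.

    `ended_with_reset` is something you read off the trace yourself
    (the last frame of the conversation has the TCP reset flag set).
    """
    deltas = time_deltas(timestamps)
    return ended_with_reset and longest_backoff_run(deltas) >= min_run


if __name__ == "__main__":
    # Frame times (seconds) at the end of a hypothetical failing conversation.
    failing = [0, 2, 6, 14, 30]
    # Frame times for a hypothetical healthy one-second polling loop.
    healthy = [0, 1, 2, 3, 4]
    print(looks_like_lost_connection(failing, ended_with_reset=True))
    print(looks_like_lost_connection(healthy, ended_with_reset=True))
```

Running the same check over a conversation from a site that works and one from a site that drops is one way to do the working-versus-failing comparison suggested above. A reset with no doubling run before it is more likely an ordinary close than a lost link.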