40400: Endpoint not found

  • Question

  • Hello,

    I am running Service Bus for Windows Server (Service Bus 1.1). I am experiencing an issue where a Subscription Client or Queue Client receives the following error when trying to retrieve messages: "40400: Endpoint not found".

    Our setup consists of a farm with 5 nodes. It looks like the app fabric service occasionally crashes on one of the nodes (outside the scope of this question), and then any clients that were using entities managed by the crashed node start experiencing the issue. If I restart the client app (which creates new Subscription/Queue Clients), the queues and subscriptions work again.

    The issue can be reproduced by stopping the Service Bus Broker service on one of the nodes. Even after the service is restarted, the clients themselves need to be restarted to resolve the issue.

    My understanding from reading the MSDN docs is that if a broker node fails, the containers will be re-distributed amongst the remaining nodes, and clients should be able to continue sending requests to the gateway without being impacted.

    Is my understanding of HA correct here? Does the client have any additional requirements or responsibilities for achieving HA? Or is it something completely unrelated to HA? If anyone could point me in the right direction, that would be appreciated.

    Monday, August 7, 2017 4:00 AM

All replies

  • How are you creating your client? Since you have 5 nodes, are you creating the client with 5 endpoints? When it is given multiple endpoints, the client moves to the next endpoint when the current one fails.

    When one node crashes, the failures should be intermittent and the client should recover after a short duration, probably less than a minute.

    Monday, August 7, 2017 9:11 PM
  • Thank you for your response.

    Clients are created by calling:

    MessagingFactory.CreateFromConnectionString()

    where the connection string contains the nodes, for example:

    Endpoint=sb://node1/MyNamespace,sb://node2/MyNamespace,sb://node3/MyNamespace;RuntimePort=9354;ManagementPort=9355;SharedAccessKeyName=KeyName;SharedAccessKey=KeyValue
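
    As a quick sanity check on how many nodes a connection string of this shape actually lists, the sketch below splits it into settings and counts the comma-separated Endpoint entries. It is plain string handling in Python, not the .NET client's own parser; the helper names `parse_connection_string` and `list_endpoints` are hypothetical. For the example string above it finds three endpoints, while the farm has five nodes.

```python
def parse_connection_string(conn_str):
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    settings = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue
        # Split on the first '=' only, so key values may contain '='.
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def list_endpoints(conn_str):
    """Return the comma-separated Endpoint entries as a list."""
    endpoint_value = parse_connection_string(conn_str).get("Endpoint", "")
    return [e.strip() for e in endpoint_value.split(",") if e.strip()]


# Example connection string copied from the post above.
conn_str = (
    "Endpoint=sb://node1/MyNamespace,sb://node2/MyNamespace,sb://node3/MyNamespace;"
    "RuntimePort=9354;ManagementPort=9355;"
    "SharedAccessKeyName=KeyName;SharedAccessKey=KeyValue"
)

endpoints = list_endpoints(conn_str)
print(len(endpoints), endpoints)
```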

    • Edited by CJ-Bee Monday, August 7, 2017 11:12 PM
    Monday, August 7, 2017 11:12 PM
  • Can you manually test and see how long it takes to recover from a node crash? Since your client is configured to talk to multiple nodes, when node A is down it should switch to node B. In your test app, just log exceptions in a receive loop. Eventually the client should recover and the Receive call should resume reading messages.
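
    The test described above could be structured roughly like this: call receive in a loop, log every exception, and record how long each run of failures lasts. This is a language-neutral sketch in Python, not Service Bus client code. `measure_recovery` and `FlakyReceiver` are hypothetical names, and `FlakyReceiver` only simulates a node outage; in a real test the `receive` argument would wrap the actual client's receive call.

```python
import time


def measure_recovery(receive, attempts, delay_seconds=0.0, clock=time.monotonic):
    """Call receive() repeatedly, log failures, and time each outage.

    An outage starts at the first failed call and ends at the next
    successful one. Returns (outage_durations, still_failing).
    """
    outages = []
    outage_start = None
    for _ in range(attempts):
        try:
            receive()
        except Exception as exc:
            if outage_start is None:
                outage_start = clock()  # first failure of this outage
            print(f"receive failed: {exc}")
        else:
            if outage_start is not None:
                outages.append(clock() - outage_start)
                print(f"recovered after {outages[-1]:.1f}s")
                outage_start = None
        time.sleep(delay_seconds)
    return outages, outage_start is not None


class FlakyReceiver:
    """Simulated receiver that fails for calls in [fail_from, fail_until)."""

    def __init__(self, fail_from, fail_until):
        self.calls = 0
        self.fail_from = fail_from
        self.fail_until = fail_until

    def __call__(self):
        self.calls += 1
        if self.fail_from <= self.calls < self.fail_until:
            raise RuntimeError("40400: Endpoint not found")
        return "message"


# Simulated run: calls 2 through 4 fail, then the receiver recovers.
outages, still_failing = measure_recovery(FlakyReceiver(2, 5), attempts=6)
print(len(outages), still_failing)
```

    If `still_failing` is true at the end of a long run against the real client, the client never recovered on its own, which is the behaviour described in the question.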
    Monday, August 7, 2017 11:19 PM