IoT hub gateway cloud to device messages sometimes not being delivered, but device > cloud is fine.

  • Question

  • I use this code:


             while (!cts.Token.IsCancellationRequested)
             {
                 // .... log waiting for messages to debug

                 receivedMessage = await deviceClient.ReceiveAsync(TimeSpan.FromSeconds(120));

                 if (receivedMessage == null)
                 {
                     // ..... log null received to debug
                     continue;
                 }

                 // ..... deal with message

                 deviceClient.CompleteAsync(receivedMessage);
             }

    All of that works. I log each null result from ReceiveAsync. These occur every 120 seconds (the timeout), and then the loop starts listening again.

    Device to IoT hub telemetry has 100% availability throughout the whole process.

    99% of the time, cloud -> device messages trigger the code above fine. No problems. I can hammer them away and they flow through fine. 1 min, 5 min, 30 min intervals, all fine.

    Periodically, cloud to device messages simply stop being received by the code above. The device twin reports that there are 1 or more messages pending and that the IoT device is "connected".

    When it is in this state, I can see the timeout above firing, and then the log shows ReceiveAsync starting again. At this point, device -> IoT hub messages are still coming in fine (on the same deviceClient connection).

    The only thing that will kick it into life is for my IoT device to drop the socket, which effectively starts a new deviceClient on a new thread for both device > IoT hub and IoT hub > device messages. Then they start flowing again. It is as if the deviceClient will no longer work in the cloud > device direction. However, device > cloud is fine.

    I'm using version 1.21.1 and AMQP.

    Does anyone have any ideas why this happens?

    Thanks


    MikeM

    Saturday, October 12, 2019 4:06 PM

Answers

  • Ok - I think I am a muppet.

    I never had the issue on my test machine, as that only has one test device connecting in. The service fabric has a lot more.

    I just checked my code, and the deviceClient held inside that context class was for some reason declared as static. I am assuming that screwed the thread safety aspect and that, in fact, all the contexts shared the same deviceClient. That means the cloud receive would have been randomly firing in the wrong context class, hence the lock issues I am getting.

    Hang fire for now - I will continue testing and report back if indeed this schoolboy error was the cause.


    MikeM

    Friday, October 18, 2019 12:07 PM
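The effect described in this answer can be reproduced without the SDK. The following is only a minimal sketch: `FakeDeviceClient`, `SharedContext` and `OwnedContext` are hypothetical names invented for illustration and are not part of the poster's code or of the Azure SDK. It shows that a static field gives every context the same client (the last one assigned), whereas an instance field gives each context its own.

```csharp
using System;

// Hypothetical stand-in for the SDK's DeviceClient; it only remembers its device id.
public sealed class FakeDeviceClient
{
    public string DeviceId { get; }
    public FakeDeviceClient(string deviceId) => DeviceId = deviceId;
}

// Buggy shape: the field is static, so all contexts share one slot.
public sealed class SharedContext
{
    private static FakeDeviceClient client;
    public SharedContext(string deviceId) => client = new FakeDeviceClient(deviceId);
    public string ClientDeviceId => client.DeviceId;
}

// Fixed shape: the field is per instance, so each context owns its client.
public sealed class OwnedContext
{
    private readonly FakeDeviceClient client;
    public OwnedContext(string deviceId) => client = new FakeDeviceClient(deviceId);
    public string ClientDeviceId => client.DeviceId;
}

public static class StaticFieldDemo
{
    public static void Main()
    {
        var sharedA = new SharedContext("device-a");
        var sharedB = new SharedContext("device-b");
        // The second constructor overwrote the shared static field.
        Console.WriteLine($"{sharedA.ClientDeviceId} {sharedB.ClientDeviceId}"); // device-b device-b

        var ownedA = new OwnedContext("device-a");
        var ownedB = new OwnedContext("device-b");
        Console.WriteLine($"{ownedA.ClientDeviceId} {ownedB.ClientDeviceId}"); // device-a device-b
    }
}
```

With a single test device the two shapes behave identically, which would be consistent with the problem only appearing once several devices connect.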

All replies

  • I am running this process inside a service fabric cluster application. It exhibits this behaviour there.

    However, when I run it in Docker on my development PC, I have yet to see this behaviour, even after several hours testing.

    Is it possible that deviceClient has some kind of issue inside a service fabric application? The device > cloud still works on the same deviceClient IoT hub connection. It's just the cloud to device ReceiveAsync that has the issue.


    MikeM


    Sunday, October 13, 2019 3:51 PM
  • Hi Mike A Sharp, I do not see the await keyword on the CompleteAsync line. It should read:
        await deviceClient.CompleteAsync(receivedMessage);
    


    Friday, October 18, 2019 12:44 AM
    Moderator
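Whether or not it relates to the delivery problem, the missing await matters because a failure inside an un-awaited call never reaches the calling loop. The sketch below illustrates that general C# behaviour only. `FailingCompleteAsync` is a hypothetical stand-in for a call such as CompleteAsync that fails; it is not the SDK method and the exception type is chosen arbitrarily.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for an asynchronous SDK call that fails.
    public static async Task FailingCompleteAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("simulated completion failure");
    }

    // Awaited call: the failure is rethrown at the await and can be handled.
    public static async Task<bool> AwaitedCallThrowsAsync()
    {
        try
        {
            await FailingCompleteAsync();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static async Task Main()
    {
        // Not awaited: the exception is stored on the Task and nothing is thrown here.
        Task forgotten = FailingCompleteAsync();
        Console.WriteLine("not awaited: no exception reached the caller");

        bool surfaced = await AwaitedCallThrowsAsync();
        Console.WriteLine($"awaited: exception reached the caller = {surfaced}");

        // WhenAny waits for completion without rethrowing the task's exception.
        await Task.WhenAny(forgotten);
        Console.WriteLine($"forgotten task faulted = {forgotten.IsFaulted}");
    }
}
```

In a receive loop this means an error while completing a message would go unlogged unless the call is awaited inside a try/catch.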
  • Hi,

    I had missed that. I have fixed it, although I don't think that is the problem.

    Basically, I'm running an IoT gateway in the service fabric. When an IoT device connects in over TCP, I create a new class to hold all the data (and the IoT deviceClient) relating to that socket as a "context".

    Each socket, along with its context, is then added to a synchronised array list. This works fine, and I've never had a problem with safety using this methodology before, outside Azure, even with 20K+ IoT connections.

    I just sent a JSON C2D message to one device, and it worked, then it threw the error "Microsoft.Azure.Devices.Client.Exceptions.DeviceMessageLockLostException".

    Following that, the IoT hub sent a message for ANOTHER device to this device, so the message arrived on the wrong deviceClient.

    That suggests to me that the deviceClient library is not thread safe, which in an IoT hub world would be plainly ridiculous.

    Can someone confirm that the deviceClient NuGet package is actually thread safe?

    thanks


    MikeM

    Friday, October 18, 2019 11:54 AM
  • I'm glad to hear that you are making good progress resolving your issue. Feel free to open a new question as issues arise, and thanks for reaching out. Also, when you run into issues, a good source of information is the issues section of the public SDK repository on GitHub.
    Friday, October 18, 2019 5:31 PM
    Moderator