IoT hub cross region HA

  • Question

  • We are investigating rearchitecting a self-hosted solution and migrating it to Azure. Currently we use an active/active, geographically redundant co-location setup.

    We can fail over entirely to one of the locations, and our RTO is under 1 second.

    While investigating IoT Hub, it seems that the RTO can be 2 to 26 hours. In the IoT world that I know, an outage of that length would be a total disaster. I cannot believe such a critical ingress point isn't cross-region HA by default, and that such high RTO figures are stated. I can see it working for telemetry metering, but not much more than that.

    Also, in the IoT world that I know (GPS tracking), the devices we use only support bespoke binary protocols over UDP or TCP. They don't use MQTT. I have come across one that does support MQTT, but it doesn't support SSL/TLS (a mandatory requirement for IoT Hub).

    We can work around the protocol issue using an edge gateway, but the issue of cross region HA remains.

    Cross region also seems to rely on a concierge service and duplicated hubs with manual interaction. We can't modify the IoT device firmware, so that seems to be a non-starter.

    I can't believe any business could tolerate their IoT platform being offline for 24 hours, so there must be a straightforward solution to this that works with the IoT devices currently on the market. I would estimate that 99.99% of vehicle tracking hardware does not support anything other than bespoke TCP or UDP binary communications.

    Can anyone point me to some articles, or offer guidance from people who have found solutions to this type of problem?

    Currently, the only way we can think of getting this to work is to write bespoke TCP / UDP handlers that run inside a Service Fabric cluster deployed across availability zones.
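
    As a rough illustration of what such a handler involves, here is a minimal Python sketch. The 18-byte frame layout, the field names, and the `forward` callback are all hypothetical; a real tracker's protocol would differ, and this is not tied to any Service Fabric API.

```python
import asyncio
import struct
from typing import Callable

# Hypothetical big-endian frame (18 bytes): device id (uint32), unix time (uint32),
# latitude and longitude in 1e-7 degrees (int32 each), speed in km/h (uint16).
FRAME = struct.Struct(">IIiiH")


def decode_frame(frame: bytes) -> dict:
    """Decode one hypothetical tracker frame into a JSON-friendly dict."""
    if len(frame) != FRAME.size:
        raise ValueError(f"expected {FRAME.size} bytes, got {len(frame)}")
    device_id, timestamp, lat, lon, speed = FRAME.unpack(frame)
    return {
        "deviceId": str(device_id),
        "timestamp": timestamp,
        "lat": lat / 1e7,
        "lon": lon / 1e7,
        "speedKmh": speed,
    }


class TrackerProtocol(asyncio.DatagramProtocol):
    """UDP handler: decodes each datagram and hands it to a forwarding callback."""

    def __init__(self, forward: Callable[[dict], None]) -> None:
        # `forward` is an assumption: whatever publishes the decoded
        # message upstream (a queue, an ingestion endpoint, and so on).
        self._forward = forward

    def datagram_received(self, data: bytes, addr) -> None:
        try:
            message = decode_frame(data)
        except ValueError:
            return  # Drop malformed frames rather than crash the listener.
        self._forward(message)
```

    The handler could be bound with `loop.create_datagram_endpoint(lambda: TrackerProtocol(forward), local_addr=("0.0.0.0", 5000))`, where the port is also just an example. One instance per availability zone behind a load balancer would match the deployment described above.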


    MikeM

    Saturday, June 22, 2019 2:16 PM

All replies

  • Hi Mike,

    If your business uptime goals aren't satisfied by the RTO that Microsoft-initiated failover provides, you should consider using manual failover to trigger the failover process yourself. The RTO using this option could be anywhere between 10 minutes and a couple of hours. The RTO is currently a function of the number of devices registered against the IoT hub instance being failed over. You can expect the RTO for a hub hosting approximately 100,000 devices to be in the ballpark of 15 minutes.

    If your business uptime goals aren't satisfied by the RTO that either Microsoft-initiated failover or manual failover options provide, you should consider implementing a per-device automatic cross region failover mechanism.
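
    As a minimal sketch of what such a mechanism might look like in Python (an illustration only, not an Azure SDK API): the `senders` are hypothetical callables, each assumed to wrap a connection to one hub in a different region, and the 30-second cooldown is an arbitrary example value. Since the device firmware can't be changed, logic like this would have to run in a gateway rather than on the device.

```python
import time
from typing import Callable, Optional, Sequence


class FailoverSender:
    """Send through the active endpoint; switch to the next one on failure."""

    def __init__(
        self,
        senders: Sequence[Callable[[bytes], None]],
        retry_primary_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not senders:
            raise ValueError("at least one sender is required")
        self._senders = list(senders)  # index 0 is the primary region
        self._active = 0
        self._failed_at: Optional[float] = None
        self._retry_after = retry_primary_after
        self._clock = clock

    def send(self, payload: bytes) -> int:
        """Deliver the payload and return the index of the endpoint that took it."""
        # After the cooldown, give the primary another chance.
        if self._active != 0 and self._clock() - self._failed_at >= self._retry_after:
            self._active = 0
        last_error: Optional[Exception] = None
        for _ in range(len(self._senders)):
            try:
                self._senders[self._active](payload)
                return self._active
            except ConnectionError as exc:
                last_error = exc
                if self._active == 0:
                    self._failed_at = self._clock()
                # Move on to the next endpoint in the list.
                self._active = (self._active + 1) % len(self._senders)
        raise ConnectionError("all endpoints failed") from last_error
```

    Failover time here is bounded by how quickly a send to the failed region raises an error, so connection timeouts would need to be tuned to meet a given target. The sketch also assumes each device is already registered with both hubs.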

    Please check the documentation IoT Hub high availability and disaster recovery for more details and see if it helps.

    I understand that you can’t change the firmware. I suggest you submit feedback on the UserVoice forum, where others can upvote it.

    All the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

    Wednesday, June 26, 2019 11:37 AM
  • Hi,

    We need sub-5 second failover, and RTO is just one issue that we have with IoT hub.

    Thanks

    Mike


    MikeM

    Wednesday, June 26, 2019 11:41 AM
  • As mentioned earlier, please create a feedback item on the UserVoice forum and provide your feedback.
    Monday, July 1, 2019 8:20 AM