Question about service performance consistency: High peaks

  • Hi all,

    I am currently evaluating the Azure IoT Hub for potential applications in our company. A couple of these applications would require a stable, consistent 1 Hz data stream, and one of the questions I want to answer is whether this is possible with the IoT Hub.

    For this I wrote a small tool that sends a message to the IoT Hub every 15 s, receives it again as a client and then measures how long the whole round trip took. The results were mostly positive: the average round-trip time (in ms) is very good, at least for our applications. However, there were points in time where the IoT Hub, on either the sending or the receiving side, experienced serious hiccups, with delays of up to a minute or more.

    After that initial test, I extended the tool to output the individual measurements. Plotting the "current" delay of each message over about 2 days of measuring gives this picture:

    [Plot: round-trip delay per message over about 2 days]

    BTW, I know that our local company network could have as much to do with the issue as Azure, which is why this plot was generated from data acquired by running the test tool on an Azure VM in the same region as the IoT Hub. At the very least I wanted to take the company firewall out of the equation. :)
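Writing every sample with its own timestamp, rather than keeping only a running average, is what makes a plot like this possible. A minimal sketch, assuming a plain CSV file with one row per probe (the function name and the two-column layout are hypothetical, not taken from the original tool):

```python
import csv
import io
from datetime import datetime, timezone
from typing import Optional, TextIO


def append_sample(out: TextIO, latency_ms: Optional[float],
                  when: Optional[datetime] = None) -> None:
    """Write one measurement as a CSV row: UTC timestamp, latency in ms.

    A timed-out probe (latency_ms is None) gets an empty latency field,
    so the gap stays visible when the data is plotted later.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    latency_field = "" if latency_ms is None else f"{latency_ms:.1f}"
    csv.writer(out).writerow([when.isoformat(), latency_field])


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for a real file opened in append mode
    append_sample(buf, 42.0)
    append_sample(buf, None)
    print(buf.getvalue(), end="")
```

With a real file, open it as `open("delays.csv", "a", newline="")` so the `csv` module controls line endings; the file name is only an example.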

    And while this plot does not show the abysmal results of 60+ seconds, there are still instances where a simple 100-byte message took several seconds (max. 9 s) to make the round trip from Azure VM --> IoT Hub --> Azure VM.
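To put numbers on outliers like these instead of reading them off the plot, the recorded latencies can be reduced to a few order statistics. This sketch uses the nearest-rank percentile definition, and its 1,000 ms budget is an assumption derived from the 1 Hz requirement: a message that takes longer than one period to arrive overlaps the slot of the next message.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(sorted_ms: List[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of an ascending list."""
    rank = math.ceil(p * len(sorted_ms) / 100)
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms: List[float], budget_ms: float = 1000.0) -> Dict[str, float]:
    """Mean, median, p99, max and the number of samples over the budget."""
    s = sorted(latencies_ms)
    if not s:
        raise ValueError("no samples")
    return {
        "count": len(s),
        "mean": mean(s),
        "p50": percentile(s, 50),
        "p99": percentile(s, 99),
        "max": s[-1],
        "over_budget": sum(1 for x in s if x > budget_ms),
    }


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    print(summarize([120.0, 95.0, 130.0, 9000.0, 110.0]))
```

For a 1 Hz stream, `over_budget` and `max` matter more than `mean`: an average can look very good while individual messages still arrive several seconds late.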

    I won't claim to be an expert on internet connections, speed, etc.; I just want to understand whether other customers are seeing this as well, whether it is a known issue with the Azure IoT Hub (or even across all Azure services), and whether it could be improved under certain conditions.

    So any official word from a Microsoft employee or help from Azure experts would be appreciated. :)

    Cheers,

    Philip


    Monday, June 20, 2016 12:41 PM

Answers

  • Just to follow up on this: through a Microsoft Support ticket, the delay spikes were investigated and confirmed by Microsoft. No technical explanation was given for them; the answer was simply that, under the Service Level Agreement of 99.9% service availability, these daily delays/spikes fall into the remaining 0.1%.

    While this may sound a little condescending, I understand their reasoning. They said that the SLA should increase in the future, which would also make such delays and spikes less likely.

    Thursday, September 8, 2016 11:47 AM
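For a sense of scale, the 0.1% that a 99.9% SLA leaves open can be converted into time and, for the 1 Hz use case, into messages. This is back-of-the-envelope arithmetic only: it assumes, as the support answer implies, that the delays are simply counted against the 0.1% allowance, and the 30-day month is an assumption for illustration; how availability is actually measured is defined by the SLA itself.

```python
def sla_budget(sla_percent: float, period_s: float) -> float:
    """Seconds within `period_s` that fall outside an availability SLA."""
    return period_s * (100.0 - sla_percent) / 100.0


DAY_S = 24 * 60 * 60   # 86,400 s
MONTH_S = 30 * DAY_S   # assumed 30-day month


if __name__ == "__main__":
    per_day = sla_budget(99.9, DAY_S)
    per_month = sla_budget(99.9, MONTH_S)
    # At 1 Hz, one second of allowance corresponds to one message.
    print(f"per day:   {per_day:.1f} s (~{per_day:.0f} messages at 1 Hz)")
    print(f"per month: {per_month / 60:.1f} min")
```

This works out to about 86.4 s per day and 43.2 min per 30-day month, which comfortably covers a handful of multi-second spikes per day.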

All replies

  • Hi Philip,

    Could you share this tool's sources with us so we can look further into your results and their root causes?


    Please mark answered questions as answered to let others know about it.

    Wednesday, June 29, 2016 7:52 PM
  • Hi Valery,

    Thanks for your reply. I have uploaded the sources to this location (note that the files will be removed after 7 days):

    http://internet-tdrs.trimble.com/TDRS/outgoing/81970528086-6c89feae882f/IoT-DelayMonitor.zip

    It is a small C# (.NET 4.6) test tool. You need to configure the IoT Hub endpoint (hostname, owner and shared key) in the application's config file before you can use it.
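The three settings correspond to the fields of a standard IoT Hub connection string (`HostName`, `SharedAccessKeyName` such as `iothubowner`, and `SharedAccessKey`). As a hedged illustration, not the tool's actual configuration code, the sketch below parses such a string and derives a shared access signature (SAS) token in the documented `SharedAccessSignature sr=<uri>&sig=<signature>&se=<expiry>&skn=<policy>` format; the host name and key used in the test values are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Dict, Optional
from urllib.parse import quote_plus, urlencode


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'HostName=...;SharedAccessKeyName=...;SharedAccessKey=...'.

    Splitting each part on the first '=' only keeps base64 padding
    ('==') at the end of the key intact.
    """
    parts = (p.split("=", 1) for p in conn_str.split(";") if p)
    return {k: v for k, v in parts}


def generate_sas_token(resource_uri: str, key_b64: str, policy_name: str,
                       expiry_epoch: Optional[int] = None, ttl_s: int = 3600) -> str:
    """Build a SAS token: HMAC-SHA256 over '<url-encoded uri>\\n<expiry>'."""
    if expiry_epoch is None:
        expiry_epoch = int(time.time()) + ttl_s
    string_to_sign = f"{quote_plus(resource_uri)}\n{expiry_epoch}"
    digest = hmac.new(base64.b64decode(key_b64),
                      string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    token = {
        "sr": resource_uri,
        "sig": base64.b64encode(digest).decode("ascii"),
        "se": str(expiry_epoch),
        "skn": policy_name,
    }
    return "SharedAccessSignature " + urlencode(token)  # urlencode escapes sig
```

The `expiry_epoch` parameter is only there to make the output reproducible; normally the token is generated with a short time-to-live and refreshed before it expires.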

    Cheers,

    Philip


    Friday, July 1, 2016 7:14 AM
  • Thanks a lot for sharing. I will give it a spin ASAP and see if I can help explain the spikes.

    Would you consider putting this code on GitHub eventually?


    Friday, July 1, 2016 7:20 AM
  • Hi Philip,

    We're going through what seems to be a similar situation. Would you mind sharing your test tool again?

    Cheers

    Monday, September 26, 2016 10:42 AM