Azure IoT Edge: Orchestration and Load Balancing

  • Question

  • Hello, 

    we would like to better understand the orchestration/deployment and load balancing mechanisms provided by Azure IoT Edge. 

    We are aware of the Device Provisioning Service (DPS), which handles provisioning of edge devices to IoT Hub.

    We are also aware of the automatic deployment functionality (see here: msdn with azure/iot-edge/module-deployment-monitoring), which deploys according to a specified manifest, continuously evaluates the deployment, and adjusts it (onboarding new devices, removing devices from the deployment, etc.).

    We could not find answers to the following questions in any MSDN documentation (or by browsing the Azure IoT Edge runtime sources):

    1. Which metrics can be specified at all for the target conditions which are being evaluated for deployment? What about memory or CPU / GPU metrics? Can these be incorporated?

    2. We could not find anything regarding dynamic load balancing mechanisms. What happens if a node reaches its resource limits? What happens if the node crashes or becomes unavailable? Are there any workload balancing schemes implemented or in progress? (By load balancing we mean across node boundaries.)

    Thanks for any help or resource which in detail explains the behavior!

    Regards Matous



    • Edited by Matous Monday, March 9, 2020 1:37 PM
    Monday, March 9, 2020 1:36 PM

All replies

  • Hello Matous,

    Welcome to MSDN, and thank you for the very good question!

    Let me first tackle your questions directly and we can extend the discussion if needed:

    > 1. Which metrics can be specified at all for the target conditions which are being evaluated for deployment? What about memory or CPU / GPU metrics? Can these be incorporated?

    You found the right documentation, let me add the links:

    Any device-reported metric in the device twin can be used in the target condition. "In device twin, you can build a target condition using tags, reported properties, or deviceId." Therefore, if your edge device reports its CPU/GPU metrics in the device twin, you can use them in a target condition.
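
    As a minimal sketch of that idea: the snippet below builds a reported-properties patch and a matching target condition string. The property names (`capabilities`, `cpuCores`, `memoryMb`, `hasGpu`) and both helper functions are hypothetical, chosen only for illustration, and the sketch assumes that target conditions accept numeric comparisons on reported properties in the same way as twin queries. Please verify the exact syntax against the documentation before relying on it.

```python
import json


def build_reported_patch(cpu_cores: int, memory_mb: int, has_gpu: bool) -> dict:
    """Build a reported-properties patch describing the device (hypothetical schema)."""
    return {
        "capabilities": {
            "cpuCores": cpu_cores,
            "memoryMb": memory_mb,
            "hasGpu": has_gpu,
        }
    }


def build_target_condition(min_cpu_cores: int, min_memory_mb: int, require_gpu: bool) -> str:
    """Build a target condition string over the hypothetical reported properties."""
    prefix = "properties.reported.capabilities"
    clauses = [
        f"{prefix}.cpuCores >= {min_cpu_cores}",
        f"{prefix}.memoryMb >= {min_memory_mb}",
    ]
    if require_gpu:
        clauses.append(f"{prefix}.hasGpu = true")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # What the device side would report, and what the deployment would target.
    print(json.dumps(build_reported_patch(4, 8192, True), indent=2))
    print(build_target_condition(4, 4096, require_gpu=True))
```

    The resulting string would go into the target condition field of the deployment; how the device writes the patch into its twin depends on the SDK you use and is left out here.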

    > 2. We could not find anything regarding dynamic load balancing mechanisms. What happens if a node reaches its resource limits? What happens if the node crashes or becomes unavailable? Are there any workload balancing schemes implemented or in progress? (By load balancing we mean across node boundaries.)

    If I understand the second question correctly, you want a way to "notify" IoT Hub that the IoT Edge device can no longer host a specific module due to CPU/GPU/memory constraints. If that is the case, I believe you will need to follow the rollback instructions:

    "Deployments can be rolled back if you receive errors or misconfigurations. Because a deployment defines the absolute module configuration for an IoT Edge device, an additional deployment must also be targeted to the same device at a lower priority even if the goal is to remove all modules. "

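    To illustrate the fallback behavior described in the quote, here is a toy model in plain Python. It is not the IoT Hub implementation: the `Deployment` class, the `select_deployment` function, and the twin layout are all hypothetical, and tie-breaking between deployments of equal priority is ignored. It only shows the idea that a device receives the highest-priority deployment whose target condition it matches, and falls back to a lower-priority one when it stops matching.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Deployment:
    name: str
    priority: int                    # higher number wins
    matches: Callable[[dict], bool]  # stand-in for a target condition


def select_deployment(twin: dict, deployments: List[Deployment]) -> Optional[Deployment]:
    """Return the highest-priority deployment whose condition matches the twin."""
    candidates = [d for d in deployments if d.matches(twin)]
    return max(candidates, key=lambda d: d.priority, default=None)


# Hypothetical twin: one tag and one reported property.
twin = {"tags": {"site": "plant-a"}, "reported": {"hasGpu": True}}

# Low-priority fallback targeting the same device, as the quote requires.
baseline = Deployment("baseline", 1, lambda t: t["tags"].get("site") == "plant-a")
# High-priority deployment that applies only while the device reports a GPU.
gpu_workload = Deployment("gpu-workload", 10, lambda t: t["reported"].get("hasGpu") is True)

if __name__ == "__main__":
    print(select_deployment(twin, [baseline, gpu_workload]).name)
    twin["reported"]["hasGpu"] = False  # device no longer qualifies
    print(select_deployment(twin, [baseline, gpu_workload]).name)
```

    In this model the device moves to the baseline deployment once it stops matching the higher-priority condition. Note that this is redeployment on a single device, not load balancing across nodes.
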
    I hope this answers some of your initial questions. Let me know if you have follow-up questions, and apologies if I have misunderstood anything.

    Thanks!


    Wednesday, March 11, 2020 4:00 PM
  • Hello Matous,

    Could you please share any follow-up questions, or mark the above as the answer if you have none?

    Thank you!

    Monday, March 16, 2020 9:54 AM