IoT Hub - raw data enrichment / rules

  • Question


    Hello,

     

    Enriching incoming event messages in IoT Hub with extra data and adding rules can be achieved using Stream Analytics.

    But what is the best/recommended way to achieve this WITHOUT Stream Analytics in (small) IoT solutions? Using Azure Functions: 1) reading the missing data from e.g. an Azure Storage Table and persisting in a db, and 2) applying the rules and performing the necessary action(s)? Or using the same principles with (one or more) EventProcessorHosts?

     

    Many thanks.

     

    Guy

    Sunday, April 2, 2017 7:23 PM


All replies

  • Hi Guy,

    As you mentioned, Azure Stream Analytics (ASA) is the best solution for IoT data processing in the stream pipeline. ASA has been designed and implemented as a part of the Azure IoT stack.

    The following screen snippet shows an example of this stream pipeline:

    Using ASA in the stream pipeline is straightforward via the VETER stream pattern (Validate-Enrich-Transform-Enrich-Route). For small/medium IoT solutions it basically focuses on the input/query/output in the business stream model and uses the principle of parallel stream data processing (see here) in real time.

    I can guess, based on your question, that the reason for replacing this significant component of the Azure IoT stack in the stream pipeline is its price, such as $81.84/month for one unit. So, for a small IoT solution using one IoT combo such as IoT Hub S1 + ASA (1 unit), the price is ~ $150.

    The following screen snippet shows an example of the mini data stream pipeline without using an ASA job:

    As you can see, the above picture shows an Azure Function for data stream processing, but an EventProcessorHost hosted by a Worker Role can be used in a similar way.

    More details about these options follow:

    1. Using an Azure Function (AFN) in the real-time stream pipeline for a small stream job looks attractive, but (see the trigger sketch after this list):
      1. Have a look here for more details about the AFN Event Hub trigger bindings.
      2. Keep in mind the AFN cold start; in my environment it is between 5-10 seconds.
      3. The AFN is not free; for instance, handling all events in an IoT Hub S1 (400k/day) will cost you ~ $40/month when each message stays in the AFN for 200 milliseconds.
    2. Using an EventProcessorHost object hosted by a Worker Role; see more details here. It has great support for processing events in a real-time manner and is very well documented. For this solution you will need at least 1 instance of the Worker Role (VM); a basic small instance is priced at $59.52/month. (A minimal processor sketch follows after the AFN example below.)
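
    To make option 1 concrete, here is a minimal sketch (an illustration only, not code from the thread) of a C# script (run.csx) function triggered from the IoT Hub's Event Hub-compatible endpoint. The property names and the temperature rule are assumptions; the actual binding (hub path, connection string, consumer group) lives in function.json.

        // run.csx - minimal Event Hub-triggered function sketch
        #r "Microsoft.ServiceBus"
        #r "Newtonsoft.Json"

        using System.Text;
        using Microsoft.ServiceBus.Messaging;
        using Newtonsoft.Json.Linq;

        public static void Run(EventData message, TraceWriter log)
        {
            // Telemetry body as sent by the device (assumed to be JSON)
            string body = Encoding.UTF8.GetString(message.GetBytes());
            JObject telemetry = JObject.Parse(body);

            // Example rule: raise an alert when the temperature exceeds a threshold
            double temperature = (double)telemetry["temperature"];
            if (temperature > 50)
            {
                string deviceId = (string)message.SystemProperties["iothub-connection-device-id"];
                log.Info($"ALERT device={deviceId} temperature={temperature}");
            }
        }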

    I do recommend using the EventProcessorHost/Worker Role when replacing ASA in the data stream pipeline. But, before that, have a look at your real-time requirements, IoT data throughput, business stream model, complexity, deployment and maintenance, incremental development, etc., and of course how much you will actually save when your IoT stack doesn't have a Stream Analytics job.
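
    For option 2, here is a minimal sketch (again an assumption of how it might look, not code from the thread) of the per-partition processor that the EventProcessorHost would drive; validation, enrichment with reference data, rules and persistence would all be plugged into ProcessEventsAsync:

        using System;
        using System.Collections.Generic;
        using System.Text;
        using System.Threading.Tasks;
        using Microsoft.ServiceBus.Messaging;

        // One instance of this processor is created per Event Hub partition.
        public class TelemetryProcessor : IEventProcessor
        {
            public Task OpenAsync(PartitionContext context)
            {
                Console.WriteLine($"Receiver opened for partition {context.Lease.PartitionId}");
                return Task.FromResult(true);
            }

            public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
            {
                foreach (EventData eventData in messages)
                {
                    string body = Encoding.UTF8.GetString(eventData.GetBytes());
                    // TODO: validate, enrich with reference data, apply rules, persist, alert/route
                    Console.WriteLine($"Partition {context.Lease.PartitionId}: {body}");
                }

                // Checkpoint so a restarted or failed-over host resumes where this one left off.
                await context.CheckpointAsync();
            }

            public Task CloseAsync(PartitionContext context, CloseReason reason)
            {
                Console.WriteLine($"Receiver closed for partition {context.Lease.PartitionId}: {reason}");
                return Task.FromResult(true);
            }
        }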

    Thanks

    Roman





    • Edited by Roman Kiss Tuesday, April 4, 2017 1:45 AM
    • Marked as answer by Guy Dillen Wednesday, April 12, 2017 2:29 PM
    Tuesday, April 4, 2017 1:32 AM
  • Hi Roman,

    Many thanks for your thorough explanation.

    Indeed, the reason for my question was of course the price of ASA. I used/use ASA in PoCs/demos (where I can start/stop the ASA job) or in large IoT projects.

    But for small IoT solutions (maybe 4,000 IoT Hub ingest event messages/day) it is budget-wise too costly for small organisations/projects. Although ASA is an easy/efficient approach, it may also be overkill for these kinds of scenarios. That's the reason for looking for an alternative to ASA. I also already used/use Azure Functions and EventProcessorHost (since recently also available on .NET Core), but your info is a most welcome addition to what I already knew. I have one additional question: I suppose 1 EventProcessorHost handling all functionality (validation, enrichment, persistence, alerting and/or routing) is sufficient for small-sized solutions?

    One more thanks.

    Guy

    Tuesday, April 4, 2017 4:25 PM
  • Hi Guy,

    - I don't recommend using the Azure IoT Hub Free Edition for a QA/Production environment, so a small IoT solution should use at least the S1 Edition with 1 unit ($50/month). The default number of EH partitions is 4 (as opposed to the Free Edition, which has 2).

    Note that the EventProcessorHost has the capability to manage all partitions in a scalable, balanced manner. Each partition has its own receiver instance. That all happens within a single Worker Role instance.
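
    To illustrate that, here is a minimal sketch (all connection strings and names are placeholders, not from the thread) of wiring up the EventProcessorHost inside the Worker Role; the lease/checkpoint container in blob storage is what lets two or more instances balance the 4 partitions between them:

        using System;
        using System.Threading.Tasks;
        using Microsoft.ServiceBus.Messaging;

        public static class ProcessorHostStartup
        {
            public static async Task StartAsync()
            {
                // Placeholders: the IoT Hub's Event Hub-compatible endpoint and name,
                // and a storage account used for leases/checkpoints.
                string eventHubConnectionString = "<Event Hub-compatible endpoint connection string>";
                string eventHubPath = "<Event Hub-compatible name>";
                string storageConnectionString = "<storage connection string>";

                // The host name must be unique per instance so the partition leases
                // are balanced across the running Worker Role instances.
                var host = new EventProcessorHost(
                    Environment.MachineName,
                    eventHubPath,
                    EventHubConsumerGroup.DefaultGroupName,
                    eventHubConnectionString,
                    storageConnectionString);

                // TelemetryProcessor is the IEventProcessor sketched earlier in the thread.
                await host.RegisterEventProcessorAsync<TelemetryProcessor>();
            }
        }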

    From the hosting point of view, the minimum resources start with two instances of the small VM (A1: 1 core, 1.75 GB RAM, 225 GB storage), which will cost 2 x $60 = $120.

    Note that using a shared extra-small VM (A0: 1 core, 0.75 GB RAM, 20 GB storage) will dramatically save money (~$92), but it will create a performance bottleneck for data stream processing.

    Based on that, the above solution will have two worker role instances, where each instance will handle all EH partitions (in this model, 4). Note that this solution will require more coding.

    For better throughput in the stream pipeline, a VM with multiple cores (at least 2 cores) will definitely help; for instance, the A2 (medium) model.

     

    As you can see, there is no cheaper alternative to the ASA job ($82) for a QA/Production environment while staying within the IoT Hub quotas and throttling.

    The following screen snippet shows an example of a small IoT solution without using an ASA job, where IoT Hub routes can help minimize the work in the stream pipeline if the business model can accept it:

    Thanks

    Roman




    • Edited by Roman Kiss Wednesday, April 5, 2017 2:15 AM
    • Marked as answer by Guy Dillen Wednesday, April 12, 2017 2:29 PM
    Tuesday, April 4, 2017 8:45 PM
  • Hi Roman,

    Once more thanks for the info.

    In a "lite" approach (no ASA), for enriching ingested IoT Hub messages before writing them to a data store and doing eventual other processing (alerting, ...), I was thinking of using Azure Table Storage (a denormalized extract of relevant data coming from a SQL DB). In the second schema of your first answer, do you mean with "blob storage" using e.g. a reference (JSON) data file? As another alternative I was also thinking of using something like SQLite?

    Thanks.

    Guy

    Wednesday, April 5, 2017 8:15 PM
  • Hi Guy,

    - Yes, that's the reference data from Blob Storage.

    For your "lite" stream pipeline, I do recommend keeping all inputs/outputs the same as for an ASA job. It will help you during your incremental development and extensions.
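
    For example, here is a minimal sketch (the container/blob names and the DeviceInfo shape are assumptions) of loading a reference-data JSON file from Blob Storage, mirroring the ASA reference data input, and turning it into a lookup used to enrich each incoming message:

        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.WindowsAzure.Storage;
        using Microsoft.WindowsAzure.Storage.Blob;
        using Newtonsoft.Json;

        public class DeviceInfo
        {
            public string DeviceId { get; set; }
            public string Location { get; set; }
        }

        public static class ReferenceData
        {
            // Downloads e.g. reference-data/devices.json and indexes it by device id.
            public static async Task<Dictionary<string, DeviceInfo>> LoadAsync(string storageConnectionString)
            {
                CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
                CloudBlobClient client = account.CreateCloudBlobClient();
                CloudBlobContainer container = client.GetContainerReference("reference-data");
                CloudBlockBlob blob = container.GetBlockBlobReference("devices.json");

                string json = await blob.DownloadTextAsync();
                var devices = JsonConvert.DeserializeObject<List<DeviceInfo>>(json);

                var lookup = new Dictionary<string, DeviceInfo>();
                foreach (var device in devices)
                {
                    lookup[device.DeviceId] = device;
                }
                return lookup;
            }
        }

    The processor (or function) would load this lookup at start-up, refresh it periodically, and join each incoming message on its device id, which keeps the inputs/outputs aligned with the ASA job model.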

    Thanks

    Roman



    • Edited by Roman Kiss Wednesday, April 5, 2017 9:54 PM
    Wednesday, April 5, 2017 9:53 PM
  • Sorry Roman for the late reply. Many thanks.

    Two supplementary questions:

    - in your schemas you have an "output" from IoT Hub to a Service Bus Queue: is this based on a routing rule in IoT Hub?
    - regarding your note that the EventProcessorHost manages all partitions in a scalable, balanced manner within a single Worker Role instance, yet the proposed solution has two worker roles, each handling all 4 EH partitions and requiring more coding: I suppose the two worker roles are meant for failover purposes?


    Thanks.

    Guy


    • Edited by Guy Dillen Wednesday, April 12, 2017 2:58 PM
    Wednesday, April 12, 2017 2:29 PM
  • Hi Guy,

    - Yes, your reasoning is correct in both cases. Note that having two worker roles is also necessary for deployment updates, to avoid any glitches.
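
    To make the routing part concrete, here is a minimal sketch (the property name and the route condition are assumptions, not from the thread) of how a device can tag an alert message with an application property that an IoT Hub route to the Service Bus queue can match, e.g. with a condition like alertLevel = 'critical':

        using System.Text;
        using System.Threading.Tasks;
        using Microsoft.Azure.Devices.Client;

        public static class AlertSender
        {
            public static async Task SendAlertAsync(DeviceClient deviceClient, string payload)
            {
                var message = new Message(Encoding.UTF8.GetBytes(payload));
                // Application property evaluated by the IoT Hub routing rule.
                message.Properties.Add("alertLevel", "critical");
                await deviceClient.SendEventAsync(message);
            }
        }

    Messages that match the route go straight to the Service Bus queue, which is how the routes can offload alerting work from the stream pipeline.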

    Thanks

    Roman




    • Edited by Roman Kiss Wednesday, April 12, 2017 11:38 PM
    Wednesday, April 12, 2017 11:37 PM