Architecture related Question on Azure Machine Learning Service

  • Question

  • I have a requirement where my data files can be in a data lake or any database. I want to use my local machine as the compute engine and train the model on the data present in the data lake. The output should be a Y variable with the predicted probability from logistic regression.

    This output along with the input data has to be presented to Power BI so that we can do reporting on top of the model output.

    Now in order to do that, here is my approach.

    Store the data in Data lake. Every day I can dump the latest file.

    Connect my local machine to Data Lake to read the file.

    Train the model on my local machine.

    Deploy/Register the model on Azure from my local Jupyter notebook.

    Then connect the output from Power BI.

    Is the approach correct?

    Another question: how do I productionise this model so that I stop using my local machine in Prod? I want to use the local machine only for the development phase; Prod can use the Azure compute engine.

    Can I use the Azure compute engine for free, or is it chargeable?

    Any help in the form of an architecture diagram, blog, or video would be really appreciated.



    Tuesday, August 20, 2019 6:22 AM

All replies

  • Hello Akash,

    For development purposes you can take any locally trained model and deploy it to Azure Machine Learning service as a web service. This web service can be consumed from Power BI using the methods defined for your web service/model.

    To get started, here are the steps to deploy a model to Azure using free compute. Note that the free compute is limited to a few hours, so you may need to upgrade your plan if you exceed the limits.

    • Register the model.
    • Prepare to deploy (specify assets, usage, compute target).
    • Deploy the model to the compute target.
    • Test the deployed model, also called a web service.
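
    As a rough illustration of what gets deployed in the steps above: an Azure Machine Learning web service is driven by an entry script that defines init() (called once at startup) and run() (called per request). The sketch below shows only that shape. The coefficients, the "data" key, and the response layout are hypothetical, and a real script would load the registered model file instead of hard-coding weights.

```python
import json
import math

# Hypothetical stand-in for a registered logistic regression model.
model = None


def init():
    """Called once when the service starts: load the model into memory."""
    global model
    # Assumption: a real entry script would read the registered model file here.
    model = {"weights": [0.8, -0.4], "bias": -0.2}


def run(raw_data):
    """Called for every request; raw_data is the JSON request body as a string."""
    try:
        rows = json.loads(raw_data)["data"]
        probabilities = []
        for row in rows:
            # Linear score followed by the logistic function.
            z = model["bias"] + sum(w * x for w, x in zip(model["weights"], row))
            probabilities.append(1.0 / (1.0 + math.exp(-z)))
        return {"probability": probabilities}
    except Exception as exc:
        # Return the error as data so the caller can see what went wrong.
        return {"error": str(exc)}
```

    Calling init() and then run('{"data": [[0.0, 0.0]]}') locally is a quick way to check the script before deploying it.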

    The model can also be deployed as a local web service on your own computer. This requires a working Docker installation on your computer, and it might be ideal for your development work.
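
    Once a local web service is running, it can be called like any other HTTP endpoint. Here is a minimal sketch using only the Python standard library. The address below is hypothetical (use the scoring URI that your local deployment reports), and the {"data": ...} payload is an assumption that must match whatever your own entry script expects.

```python
import json
import urllib.request

# Hypothetical address; replace with the scoring URI of your local service.
LOCAL_SCORING_URI = "http://localhost:8890/score"


def build_scoring_request(rows, uri=LOCAL_SCORING_URI):
    """Build (but do not send) a JSON POST request for the scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(rows, uri=LOCAL_SCORING_URI, timeout=10):
    """Send the rows to the service and return the decoded JSON response."""
    request = build_scoring_request(rows, uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running service; score() does.
request = build_scoring_request([[1.0, 2.0]])
```

    Keeping the request builder separate from the network call makes it easy to point the same code at the Azure-hosted endpoint later by changing only the URI.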

    For production you can use the MLOps pipelines to automate your model management. 

    Wednesday, August 21, 2019 6:58 AM
  • Hi,

    Your approach is fine if you have a small amount of data and are not using deep learning algorithms. With large datasets or deep learning, your model will require more compute power, and you will then have to explore the other options available in the cloud.
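
    For a small dataset, the local part of the workflow (train a logistic regression, then emit the input columns together with the predicted probability for reporting) can be sketched in plain Python. This is only an illustration: the toy data, the column names, and the hand-written gradient descent are all hypothetical, and in practice you would probably use a library such as scikit-learn and read the file from the data lake.

```python
import csv
import io
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Predicted probability of the positive class for one row."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights and bias with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score_to_csv(rows, weights, bias, feature_names):
    """Return CSV text: the input columns plus a predicted-probability column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(feature_names) + ["y_probability"])
    for x in rows:
        writer.writerow(list(x) + [round(predict_proba(x, weights, bias), 4)])
    return buffer.getvalue()


# Hypothetical toy data: two features, binary label.
X = [[0.5, 1.0], [1.0, 1.5], [1.5, 0.5], [3.0, 3.5], [3.5, 3.0], [4.0, 4.5]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(X, y)
scored_csv = score_to_csv(X, weights, bias, ["feature_a", "feature_b"])
```

    The resulting table (inputs plus y_probability) is the kind of output Power BI can report on, whether it is written to a file, a database table, or returned by a web service.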


    Thursday, August 22, 2019 6:42 AM
  • @Amit-Tomar

    I agree with you. Currently I am just exploring the components from an architecture point of view.

    If you have any blog/video that can help me set up the infrastructure for Prod, that would really help.

    I have gone through some of the Microsoft blogs but have not found any document that shows the end-to-end flow.

    Tuesday, August 27, 2019 9:04 AM
  • @Rohit.

    Would you be able to share an architecture diagram that shows the end-to-end flow? Then I can try to do a POC on that.

    Tuesday, August 27, 2019 9:08 AM
  • Hi,

    For Prod you can use

    1. Azure Stream Analytics (streaming data)
    2. Azure Databricks or an HDInsight Spark cluster (streaming and batch data)
    3. SQL Server with R, Spark, and Python integration (batch data, if your data is in a SQL Server database)

    Option 1 is generally used for streaming data, so you can apply your machine learning model to the stream as it arrives. Since your data files are in Data Lake, however, an HDInsight Spark cluster can read the data directly from Data Lake, and once connected, Spark has built-in machine learning libraries. With option 3 you would first need to move the data from Data Lake into a SQL Server database before you could build your machine learning models. So for your case I would recommend option 2.

    For the complete flow diagram with Azure Stream Analytics, you can read more at https://docs.microsoft.com/en-in/azure/stream-analytics/stream-analytics-introduction

    For HDInsight and Spark: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-ipython-notebook-machine-learning


    Hope it helps.


    • Edited by Amit-Tomar Wednesday, August 28, 2019 7:44 AM
    Tuesday, August 27, 2019 9:53 AM