Spark + Tensorflow in GPUs using HDInsight

    Question

  • Hi, I'm new to Azure, and I would like to implement something like this: https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html, using Spark and TensorFlow together on GPUs to train different deep learning models. I see that GPU VMs are available in Azure, as well as a ready-made Spark solution in HDInsight, but HDInsight does not seem to be available on GPU machines. Would you advise installing Spark and TensorFlow on GPU VMs instead of using HDInsight, or is there a better way?

    Thanks for your attention, really looking for pointers here since I'm just learning about Cloud Computing.
    Tuesday, March 21, 2017 4:58 AM

Answers

  • Currently, Azure HDInsight does not support GPU VMs.

    Azure HDInsight is available on the following virtual machine types and sizes:

    ·         A2 to A7 sizes that are built for general purposes

    ·         D-Series nodes that feature solid-state drives (SSDs) and 60-percent faster processors

    ·         A8 and A9 sizes that have InfiniBand support for fast networking

    For more information, please visit HDInsight pricing.

    I would recommend that you create an Apache Spark cluster in Azure HDInsight, install the TensorFlow package on the cluster using a script action, and use it from a Jupyter notebook.

    For more information, see "Create Apache Spark clusters in Azure HDInsight".

    See also "Use Script Action to install external Python packages for Jupyter notebooks in Apache Spark clusters on HDInsight".
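
    Once the script action has run, it can help to confirm from the notebook that the package is actually visible to the kernel. The following is only a minimal sketch, not part of the HDInsight documentation: the helper name tensorflow_status is hypothetical, and it assumes the script action installed TensorFlow into the same Python environment that the Jupyter kernel uses.

```python
import importlib
import importlib.util


def tensorflow_status():
    """Report whether TensorFlow can be imported in the current Python environment."""
    # find_spec returns None when no top-level package with this name is installed.
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False, "version": None}
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # The package is present but failed to load (for example, a missing native library).
        return {"installed": False, "version": None}
    return {"installed": True, "version": getattr(tf, "__version__", None)}


print(tensorflow_status())
```

    If the result shows that TensorFlow is not installed, the script action probably targeted a different Python environment than the one behind the notebook.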

    Tuesday, March 21, 2017 5:40 PM
    Moderator
  • Hi Romeo,

    Please follow Pradeep's answer if you are OK with running deep learning without GPUs. If not, the current recommendation is to use regular GPU VMs. Since the post you reference was published, TensorFlow has added native support for distributed computation on a cluster of GPU VMs, so you won't need to install Spark; TensorFlow's native capabilities will be sufficient for deep learning. To prepare data for deep learning, you can use an HDInsight Spark cluster and store the dataset in Azure Blob storage, then load it from the GPU cluster in TensorFlow.
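
    As a rough illustration of the two pieces described above, the sketch below builds a wasb:// URI of the kind an HDInsight Spark job writes to in Azure Blob storage, and a TF_CONFIG value describing a group of GPU VMs as TensorFlow workers. Everything named here is an assumption for illustration: the helper names, the container, account and host names, and port 2222 are hypothetical, and whether your TensorFlow version reads TF_CONFIG (rather than an explicit cluster specification) or can read the blob location directly depends on your setup.

```python
import json


def wasb_uri(container, account, path):
    """Build a wasb:// URI for a path in an Azure Blob storage container."""
    return "wasb://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/")
    )


def tf_config(workers, task_index, port=2222):
    """Return the TF_CONFIG JSON string for one worker in a TensorFlow cluster."""
    if not 0 <= task_index < len(workers):
        raise ValueError("task_index must identify one of the workers")
    # Every worker receives the same cluster description and its own task index.
    cluster = {"worker": ["{}:{}".format(host, port) for host in workers]}
    task = {"type": "worker", "index": task_index}
    return json.dumps({"cluster": cluster, "task": task}, sort_keys=True)


# Hypothetical names: replace with your own container, account and VM host names.
print(wasb_uri("data", "myaccount", "/train/part-00000"))
print(tf_config(["gpu-vm-0", "gpu-vm-1"], task_index=0))
```

    Each GPU VM would export its own TF_CONFIG value (same worker list, different index) before starting the training script.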

    Best,
    Maxim

    • Marked as answer by Romeo Cabrera Wednesday, March 22, 2017 2:39 AM
    Tuesday, March 21, 2017 10:21 PM
    Moderator

All replies

  • Hello, 

    When will GPUs be added to HDInsight Spark clusters? If we run a separate GPU instance and call it from the Spark cluster, we are concerned about the HA of the GPU instance.

    Wednesday, November 01, 2017 4:37 PM
  • Currently, Azure HDInsight does not support GPU VMs.

    You may leave your feedback here:

    https://feedback.azure.com/forums/217335-hdinsight

    All of the feedback you share in these forums will be monitored and reviewed by the Microsoft engineering teams responsible for building Azure.

    Thursday, November 02, 2017 1:42 PM
    Moderator
  • A private preview of GPU support on HDInsight will be available soon. If you are interested in trying it, please reach out to me: maxluk in microsoft email domain. Keep in mind that during the private preview there will be limited support from the product team, and it is not recommended for production use.
    Thursday, November 02, 2017 5:39 PM
    Moderator