Jupyter and pyspark through ssh not working in HDInsight

  • Question

  • Hi,

    After I created an HDInsight cluster on Azure, it worked fine the first time, when I ran the tutorial in a Jupyter notebook. But when I start a new Jupyter notebook and run

    from pyspark import *

    it runs for a few seconds and then shows

    The code failed because of a fatal error:
    	Session 7 did not start up in 180 seconds..
    
    Some things to try:
    a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
    b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.

    After this failure in the Jupyter notebook, I tried pyspark through ssh. When I run the following in Bash:

    $ pyspark

    it shows the startup information

    SPARK_MAJOR_VERSION is set to 2, using Spark2
    Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul  2 2016, 17:42:40)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Anaconda is brought to you by Continuum Analytics.
    Please check out: http://continuum.io/thanks and https://anaconda.org
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

    So I cannot work with the cluster now. Is there any way to fix this problem?

    Monday, April 13, 2020 8:11 AM

Answers

  • Hello,

    As per my observation, you get this error message when there is an issue with the YARN services, for example when the YARN service is stopped.

    ERROR: First, I stopped the YARN services.

    Then I opened a Jupyter notebook and ran the same query, and I got the same error message as you.

    WALKTHROUGH: ERROR MESSAGE

    SUCCESS: All Ambari services are running without any issue.

    To run Jupyter Notebook queries successfully, make sure all the Ambari services are running without any issue.

    WALKTHROUGH: SUCCESS MESSAGE
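
    If you prefer to check the service states without opening the Ambari UI, the following is a minimal sketch, not an official tool. It assumes you have already fetched the JSON from the Ambari REST API (for example from `https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/services?fields=ServiceInfo/state`, where `CLUSTERNAME` is a placeholder) and that the response has the `items` / `ServiceInfo` shape used in the sample. The function name `services_not_started` and the sample data are hypothetical. Some services without long-running daemons may legitimately report a state other than `STARTED`, so treat the result as a hint rather than a diagnosis.

```python
import json


def services_not_started(payload):
    """Return {service_name: state} for every service whose state is not STARTED."""
    not_started = {}
    for item in payload.get("items", []):
        info = item.get("ServiceInfo", {})
        state = info.get("state")
        if state != "STARTED":
            not_started[info.get("service_name", "UNKNOWN")] = state
    return not_started


# Hypothetical sample in the shape the Ambari API is assumed to return.
# A stopped service is assumed to report the state INSTALLED.
sample = json.loads("""
{"items": [
  {"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}},
  {"ServiceInfo": {"service_name": "YARN", "state": "INSTALLED"}},
  {"ServiceInfo": {"service_name": "SPARK2", "state": "STARTED"}}
]}
""")

print(services_not_started(sample))
```

    If YARN appears in the result, start it from Ambari before retrying the notebook.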

    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Do click on "Mark as Answer" and Upvote on the post that helps you, this can be beneficial to other community members.

    Friday, April 17, 2020 12:02 PM

All replies

  • Hello,

    This looks like an error with Spark application resources. Check the resources available on your cluster and close any applications that you don't need. Please see more details here: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager#kill-running-applications

    Meanwhile, please go through the documentation: Tutorial: Load data and run queries on an Apache Spark cluster in Azure HDInsight.

    Hope this helps. Do let us know if you have any further queries.

    Monday, April 13, 2020 11:36 AM
  • Hello,

    This looks like an error with Spark application resources. Check the resources available on your cluster and close any applications that you don't need.

    Go to **Yarn UI** => Select the **Application ID** which is in the running state => **Kill the application**.

    Please see more details here: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager#kill-running-applications
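
    If the Yarn UI is hard to use, the same information is available from the YARN ResourceManager REST API (`/ws/v1/cluster/apps`). The following is a minimal sketch, assuming you have already fetched and parsed that JSON and that it has the `apps` / `app` shape shown in the sample. The helper names and the sample application IDs are hypothetical. It only prints the `yarn application -kill` commands, it does not run them.

```python
def running_app_ids(payload):
    """Return the IDs of applications whose state is RUNNING."""
    # The "apps" value is treated as possibly null when no application exists.
    apps = (payload.get("apps") or {}).get("app") or []
    return [app["id"] for app in apps if app.get("state") == "RUNNING"]


def kill_commands(app_ids):
    """Build one 'yarn application -kill' command line per application ID."""
    return ["yarn application -kill " + app_id for app_id in app_ids]


# Hypothetical sample in the shape the ResourceManager API is assumed to return.
sample = {
    "apps": {
        "app": [
            {"id": "application_1586764800000_0001", "name": "livy-session-6", "state": "RUNNING"},
            {"id": "application_1586764800000_0002", "name": "livy-session-7", "state": "ACCEPTED"},
        ]
    }
}

for command in kill_commands(running_app_ids(sample)):
    print(command)
```

    Review the printed commands and run only the ones for applications you no longer need from an ssh session on the cluster.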

    **Here are the steps to create a Jupyter notebook and run queries on an Azure HDInsight Spark cluster:**

    Go to **Azure Portal** => From **Cluster Dashboards** => Select **Jupyter Notebook** => Create **Pyspark** notebook => And execute the queries.

    **You can use an interactive Apache Spark shell for running Pyspark (Python) queries:**

    **Reference:** https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-shell

    Hope this helps. Do let us know if you have any further queries.

    Wednesday, April 15, 2020 6:31 AM
  • Thanks for the answer. But when I click YARN on the cluster dashboard, or go through "Ambari UI > YARN > Quick Links > Active > Resource Manager UI", I get the following notification:

    I have retried several times, but it still does not work. Is there any way to solve this?
    Wednesday, April 15, 2020 10:01 AM
  • By the way, if I connect with ssh and run

    yarn application -list

    to list the applications so that I can kill them, I get

    20/04/15 10:16:50 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
    20/04/15 10:16:50 INFO client.AHSProxy: Connecting to Application History server at headnodehost/10.0.0.61:10200
    20/04/15 10:16:50 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...

    It seems that the connection to YARN has failed. Can this also be solved?

    Wednesday, April 15, 2020 10:21 AM
  • Hello,

    Please go to Azure Portal => Cluster Dashboards => select YARN.

    Also, try opening the Jupyter notebook and check whether you are able to run queries.

    Hope this helps. Do let us know if you need any help.

    Thursday, April 16, 2020 7:33 AM
  • Hi,

    Thanks for your patience. I followed your instructions and selected "Yarn", and I got the following pages:

    And when I select jupyter-notebook and try to run queries, it goes like this:

    So I think it is still not working ...

    Thursday, April 16, 2020 7:45 AM
  • Thanks! It works now!
    Friday, April 17, 2020 12:58 PM
  • Hello,

    Glad to know that your issue has been resolved.
    Monday, April 20, 2020 7:20 AM