Unable to use pip to install packages on an HDInsight Spark cluster

  • Question

  • I am connecting to the Spark cluster through PuTTY. When I enter $python, it takes me to the Python shell, which means the Python environment is set up. But I am unable to use pip to install packages.

    Tuesday, December 3, 2019 10:56 AM

All replies

  • Hi Satyamt1997,

Here is an example of how to install Python packages on an HDInsight cluster.

    I will show an example of installing the plotly package using pip.

If you use “pip install plotly”, pip throws the error message “ImportError: No module named _internal.main”.

Try installing as superuser instead, for example “sudo pip install plotly”; it installs without any error message.
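    The two commands above can be sketched as follows. This is a minimal illustration for an HDInsight head node; the exact error text comes from older pip versions, where the user-level and root-level pip installations conflict:

    ```shell
    # Installing as a regular user can fail on the cluster:
    pip install plotly
    # ImportError: No module named _internal.main

    # Installing as superuser uses a consistent pip installation and succeeds:
    sudo pip install plotly
    ```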


HDInsight Spark clusters are created with an Anaconda installation. There are two Python installations in the cluster, Anaconda Python 2.7 and Python 3.5; Spark, Livy, and the Jupyter PySpark kernel default to Python 2.7, while the Jupyter PySpark3 kernel uses Python 3.5.

To install into the Python 3.5 environment, go to its path with “cd /usr/bin/anaconda/envs/py35/bin” and install packages using “sudo python -m pip install pyarrow”.
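    As a sketch, the steps above look like this. Note that “./python” is used here as a precaution: it ensures the py35 environment’s interpreter runs even though sudo may reset PATH (this is a small adjustment to the command above, not part of the original reply):

    ```shell
    # Switch to the Python 3.5 environment shipped with the cluster
    cd /usr/bin/anaconda/envs/py35/bin

    # Install a package (pyarrow here, as an example) into that
    # environment; ./python guarantees this environment's interpreter
    # is the one that runs pip
    sudo ./python -m pip install pyarrow
    ```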

For more details, refer to “Safely manage Python environment on Azure HDInsight”.

    Hope this helps.


Do click on "Mark as Answer" and Upvote on the post that helps you; this can be beneficial to other community members.

    Wednesday, December 4, 2019 4:53 AM
  • Hello,

Just checking in to see if the above answer helped. If this answers your query, do click “Mark as Answer” and Up-Vote for the same. And if you have any further queries, do let us know.

    Thursday, December 5, 2019 9:25 AM
  • Hello,

Following up to see if the above suggestion was helpful. And if you have any further queries, do let us know.

    Friday, December 6, 2019 10:07 AM