Trouble running example of pyspark in VSCode

  • Question

  • Hi,

    I've tried to follow:

    The Hive examples worked fine, but I had trouble with PySpark: both PySpark Interactive and PySpark Batch throw errors. I'll focus on Interactive:

    The first issue I ran into was "cannot import name DataError". After searching around, I found that I had to downgrade pandas to 0.20.0, which in turn meant downgrading Python to 3.6. Please list this as a requirement.

    I then ran into a second issue, which I need help with:

    The code failed because of a fatal error:
        Invalid status code '400' from with error payload: "Invalid kind: pyspark3 (through reference chain: org.apache.livy.server.interactive.CreateInteractiveRequest[\"kind\"])".



    Friday, August 10, 2018 9:46 PM

All replies

  • PySpark3 is no longer supported in Livy 0.4 (which is what the HDI Spark 2.2 cluster runs). Only “PySpark” is supported for Python. It is a known issue that submitting to Spark 2.2 fails with python3.

    For more details, refer to “Use Azure HDInsight Tools for Visual Studio Code”.
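
To make the error in the question concrete: Livy's REST API creates an interactive session from a JSON body whose "kind" field names the interpreter, and the 400 response above is the server rejecting "pyspark3" in that field. Below is a minimal sketch of building such a body with the kind forced to "pyspark". The helper name build_session_request and the SUPPORTED_KINDS set are hypothetical, chosen for illustration only; the set is an assumption about what this cluster accepts, and the sketch does not show how the VSCode extension builds its request.

```python
import json

# Assumption: session kinds accepted by the cluster described in this thread.
# "pyspark3" is deliberately absent because the server rejects it with HTTP 400.
SUPPORTED_KINDS = {"spark", "pyspark", "sparkr"}


def build_session_request(kind="pyspark"):
    """Return the JSON body for a Livy POST /sessions request (hypothetical helper)."""
    if kind == "pyspark3":
        # Fall back to the kind the reply above says is still accepted.
        kind = "pyspark"
    if kind not in SUPPORTED_KINDS:
        raise ValueError("unsupported session kind: " + kind)
    return json.dumps({"kind": kind})


print(build_session_request("pyspark3"))  # prints {"kind": "pyspark"}
```

A body like this would be sent to the cluster's Livy sessions endpoint; whether the tooling in question exposes a setting to choose the kind is not something this thread confirms.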



    Saturday, August 11, 2018 2:42 AM
  • Just checking in to see if the above answer helped. If it answers your query, please click “Mark as Answer” and up-vote it. If you have any further questions, let us know.
    Tuesday, August 14, 2018 6:12 AM
  • I'm having the same problem. Even when the Python interpreter is python2, HDInsight still starts the pyspark3 kernel and fails. How is this still an issue, with no way to specify which kernel to use?
    • Edited by user909090 Tuesday, February 12, 2019 1:12 AM
    Tuesday, February 12, 2019 1:10 AM