Spark Activity Issues

    Question

  • Hi,

    I am trying to create a pipeline in Azure Data Factory V2 using a Spark activity, providing the Python script file location in the activity. The activity validates successfully, but when I debug/trigger the pipeline I get an error like “Spark job failed, batch id”.

    Note: when I execute the same Python scripts in Jupyter/Zeppelin notebooks on the Azure HDInsight Spark cluster, they run successfully as expected.


    Wednesday, April 18, 2018 8:07 AM

All replies

  • The error message includes a Spark batch ID, which means ADF submitted the job to the cluster successfully, but the job run itself failed. There are usually two possible causes:

    • The Spark program has a bug and cannot run on the cluster.
    • The cluster is in a bad state.

    We can find out the real cause (one of the two above) by checking the Spark job logs. Please follow the steps below:

    • Enable debug info.

    Set “getDebugInfo” to “Always” or “Failure” in the Spark activity. Please refer to https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-spark#spark-activity-properties, and see the JSON sketch after these steps.

    • Run the activity again and check both the Livy and YARN logs of the Spark activity under the log directory.

    The log directory is in the storage account linked to the HDInsight cluster's linked service; you can find the path in the “logLocation” property of the activity output. If the logs do not show up there, the Livy batch log can also be pulled directly from the cluster, as in the second sketch below.
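    For reference, here is a minimal sketch of a Spark activity definition with debug info enabled, following the document linked above. The activity, linked-service, and file names are placeholders, not values from this thread:

        {
            "name": "<YourSparkActivity>",
            "type": "HDInsightSpark",
            "linkedServiceName": {
                "referenceName": "<YourHDInsightLinkedService>",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "rootPath": "<container/folder-holding-the-script>",
                "entryFilePath": "<main.py>",
                "sparkJobLinkedService": {
                    "referenceName": "<YourStorageLinkedService>",
                    "type": "LinkedServiceReference"
                },
                "getDebugInfo": "Failure"
            }
        }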
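    If the logs never land in storage, the Livy batch log can also be fetched straight from the cluster's Livy endpoint; the batch ID is the one reported in the ADF error. A minimal Python sketch, assuming the default HDInsight gateway with basic auth (cluster name and password are placeholders):

        import requests

        cluster = "<mycluster>"      # placeholder: your HDInsight cluster name
        batch_id = 12                # the BatchId reported in the ADF error
        url = "https://{0}.azurehdinsight.net/livy/batches/{1}/log".format(cluster, batch_id)

        # The HDInsight gateway requires the cluster login (basic auth).
        resp = requests.get(url, auth=("admin", "<cluster-login-password>"))
        resp.raise_for_status()

        # Livy returns a JSON document whose "log" field is a list of log lines.
        for line in resp.json().get("log", []):
            print(line)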

    Thursday, April 19, 2018 2:43 AM
  • Thanks for the reply, Shawn X.

    But I could not find the "getDebugInfo" option in the UI. Could you please point me to it?

    Thursday, April 19, 2018 8:54 AM
  • Hi Shawn X,

    I tried setting “getDebugInfo” to “Always” and also “Failure” in the Spark activity, but it generated no logs in https://<mycluster>.azurehdinsight.net/yarnui/hn/cluster/apps, nor in the logLocation.

    I am still facing the same issue, with the error below:

    Error in Activity: Spark job failed. BatchId=12. Please find the log in the storage if GetDebugInfo is set to 'Always' or 'Failure'.

    Could you please help resolve this issue?

    Thursday, April 19, 2018 9:52 AM
  • The "getDebugInfo" property should be under the "typeProperties" of the activity. You can enable this in the UX as well, but unfortunately I can't past a picture here to show it. Just follow the steps:

    • Select the Spark activity.
    • Go to the Script/Jar tab.
    • Expand the Advanced section; you will see "Debug Information".
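    In the activity JSON, that toggle maps to the "getDebugInfo" property under "typeProperties" (other properties omitted here for brevity):

        "typeProperties": {
            "getDebugInfo": "Failure"
        }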
    Friday, April 20, 2018 11:16 AM