How to add dependencies and application.conf in an Azure Data Factory Spark activity

    Question

  • I am trying to run Scala code in ADF, but it fails with an error. The same jar runs successfully with spark-submit. The code has external dependencies and an application.conf (which includes the database URL). I run the jar file with this spark-submit command:

    spark-submit  --packages com.pygmalios:reactiveinflux-spark_2.10:1.4.0.10.0.5.1,com.typesafe.netty:netty-http-pipelining:1.1.4  --jars /home/sshuser/reactiveinflux-spark_2.10-1.4.0.10.0.5.1.jar sapn_2.11-1.0.jar

    I also placed application.conf at this path: /home/sshuser/application.conf

    So far, everything works and I can run the Spark application on Azure HDInsight with the spark-submit command. But when I use the Spark activity in Azure Data Factory, it fails. Does anyone know how to add packages, dependencies, and application.conf when submitting the job with ADF?

    Wednesday, June 27, 2018 6:15 AM

All replies

  • Hello,

    Could you share the JSON for your Spark activity?  

    Also, what is the response or error you are seeing for the activity?

    Wednesday, June 27, 2018 10:50 PM
    Moderator
  • This is the error I got:

    { "errorCode": "2312", "message": "Spark job failed, batch id:0", "failureType": "UserError", "target": "Spark1" }

    and this is the pipeline JSON:

    {
        "name": "pipeline1",
        "properties": {
            "activities": [
                {
                    "name": "Spark1",
                    "type": "HDInsightSpark",
                    "policy": {
                        "timeout": "7.00:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false
                    },
                    "typeProperties": {
                        "rootPath": "quickstartblobs/spark",
                        "entryFilePath": "sapn_2.11-1.0.jar",
                        "sparkConfig": {
                            "spark.jars.packages": "com.pygmalios:reactiveinflux-spark_2.10:1.4.0.10.0.5.1,com.typesafe.netty:netty-http-pipelining:1.1.4"
                        },
                        "sparkJobLinkedService": {
                            "referenceName": "AzureStorage3",
                            "type": "LinkedServiceReference"
                        }
                    },
                    "linkedServiceName": {
                        "referenceName": "linkedService2",
                        "type": "LinkedServiceReference"
                    }
                }
            ]
        },
        "type": "Microsoft.DataFactory/factories/pipelines"
    }

    Please let me know whether the dependencies are specified correctly, and how to add application.conf, which includes the database URL and credentials.

    Thursday, June 28, 2018 12:01 AM
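
    For reference, here is a minimal sketch of how the activity's typeProperties might be extended, based on the documented HDInsightSpark activity properties. The documented folder layout under rootPath allows a jars subfolder (its contents are placed on the cluster's Java classpath) and a files subfolder (its contents are placed in the executor working directory). One option is therefore to upload reactiveinflux-spark_2.10-1.4.0.10.0.5.1.jar to quickstartblobs/spark/jars and application.conf to quickstartblobs/spark/files. Two values below are assumptions that this thread does not confirm: the className com.example.Main is a hypothetical placeholder for the application's main class, and the spark.driver.extraJavaOptions entry assumes the application reads its settings through Typesafe Config and that the uploaded file is visible in the driver's working directory. Setting getDebugInfo to Failure asks ADF to copy the Spark logs to storage when the job fails, which should give more detail than the generic error 2312.

    ```json
    {
        "typeProperties": {
            "rootPath": "quickstartblobs/spark",
            "entryFilePath": "sapn_2.11-1.0.jar",
            "className": "com.example.Main",
            "getDebugInfo": "Failure",
            "sparkConfig": {
                "spark.jars.packages": "com.pygmalios:reactiveinflux-spark_2.10:1.4.0.10.0.5.1,com.typesafe.netty:netty-http-pipelining:1.1.4",
                "spark.driver.extraJavaOptions": "-Dconfig.file=application.conf"
            },
            "sparkJobLinkedService": {
                "referenceName": "AzureStorage3",
                "type": "LinkedServiceReference"
            }
        }
    }
    ```

    After a failed run with this setting, the copied logs should appear in a logs folder under rootPath in the linked storage account. If the configuration file is still not picked up, the driver log there is the place to check which path the application tried to load.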