Spark/Scala job that runs successfully on an HDInsight cluster when submitted via IntelliJ, but fails when the JAR file is executed from an ADF pipeline

  • Question

  • I get the following result after executing a simple Spark/Scala/Java job, built with Maven, that validates an input file:


    INFO: ========== RESULT ==========
    INFO: Job run successfully.

    I get all the results I expected. But when I try to execute the same job via an ADF pipeline, by uploading the JAR file built in IntelliJ, it fails with the following error:

    stdout: 
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.3005-23/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.3005-23/spark_llap/spark-llap-assembly-1.0.0.2.6.5.3005-23.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
    18/12/12 12:25:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Exception in thread "main" java.lang.IllegalArgumentException: Null user
    at org.apache.hadoop.security.UserGroupInformation.createProxyUser(UserGroupInformation.java:1484)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:175)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    18/12/12 12:25:15 INFO ShutdownHookManager: Shutdown hook called
    18/12/12 12:25:15 INFO ShutdownHookManager: Deleting directory /tmp/spark-49854c36-671e-4aa7-8aab-9a87123fdd05

    stderr: 

    YARN Diagnostics: 
    java.lang.Exception: No YARN application is found with tag livy-batch-22-g2roxlsd in 120 seconds. Please check your cluster status, it is may be very busy.
    org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236) scala.Option.getOrElse(Option.scala:120) org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236) org.apache.livy.Utils$$anon$1.run(Utils.scala:97)

    I have added the following dependencies to my pom.xml:

    ..
    ..
    <properties>
        <spark.version>2.3.0</spark.version>
        <scala.version.major>2.11</scala.version.major>
        <scala.version.minor>8</scala.version.minor>
        <scala.version>${scala.version.major}.${scala.version.minor}</scala.version>
        <slf4j.version>1.7.16</slf4j.version>
        <hadoop.deps.scope>compile</hadoop.deps.scope>
    </properties>
    ..
    ..

    ..
    ..

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>${slf4j.version}</version>
        <scope>${hadoop.deps.scope}</scope>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12-1.7.10</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    ..
    ..
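
    As far as I understand, a Maven exclusion lists only a groupId and an artifactId (the version is never part of the artifactId), so the conventional way to keep the extra binding out of the assembled JAR would look roughly like the sketch below. This is only a guess at the right shape: it assumes spark-core is among the dependencies elided above, and that provided scope is appropriate because Spark is already installed on the cluster.

    <!-- Hypothetical sketch, not my actual pom: exclude the log4j binding that Spark pulls in transitively -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version.major}</artifactId>
        <version>${spark.version}</version>
        <!-- Spark is already on the HDInsight cluster, so it does not need to be bundled in the fat JAR -->
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <!-- an exclusion names only groupId and artifactId, with no version suffix -->
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>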

    There is no issue when I execute my code one line at a time in the Spark shell. I do get the 'multiple SLF4J bindings' warning, but that is not a fatal error, and I still get the "Job run successfully" message when I run the code from IntelliJ. I don't understand why it fails when run from ADF.

    Kindly help.


    Friday, December 14, 2018 1:18 AM

All replies

  • Hello,

    Could you take a look at the Livy server log for exceptions?

    I've seen reports that this particular error is related to user permission issues, which the Livy server logs would confirm.
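
    If you don't have shell access to the head node, the Livy REST endpoint on the cluster can also return the log for that batch. The tag in your YARN diagnostics (livy-batch-22-g2roxlsd) suggests the batch id is 22; something along these lines should work, where CLUSTERNAME and PASSWORD are placeholders for your cluster name and admin (HTTP) credentials:

    # Check the batch state, then pull its log lines from Livy
    curl --user "admin:PASSWORD" "https://CLUSTERNAME.azurehdinsight.net/livy/batches/22"
    curl --user "admin:PASSWORD" "https://CLUSTERNAME.azurehdinsight.net/livy/batches/22/log?from=0&size=500"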
    Friday, December 14, 2018 6:26 PM