Java with Data Factory HDInsight Spark Activity

  • Question

  • I was wondering if Java jar files in addition to Scala jar files are supported for HDInsight Spark Activities in Data Factory. 


    Monday, June 24, 2019 8:12 PM

Answers

  • Hello,

    Thank you for reaching out.

    We have enabled a one-time free technical support request for your subscription.

    I hope you have received one-on-one support to work towards a resolution on this matter.

    • Marked as answer by bwong9 Friday, June 28, 2019 8:55 PM
    Friday, June 28, 2019 5:18 AM

All replies

  • Hello,

    For Spark jobs, you can provide multiple dependencies, such as jar packages (placed on the Java CLASSPATH), Python files (placed on the PYTHONPATH), and any other files.

    Create the following folder structure in the Azure Blob storage referenced by the HDInsight linked service. Then upload the dependent files to the appropriate subfolders of the root folder represented by entryFilePath.

    For example, upload Python files to the pyFiles subfolder and jar files to the jars subfolder of the root folder. At runtime, the Data Factory service expects the following subfolders under the root folder in Azure Blob storage (all optional):

    • jars: files placed on the Java classpath of the cluster
    • pyFiles: files placed on the PYTHONPATH of the cluster
    • files: files placed in the executor working directory
    • archives: files that are uncompressed
    • logs: logs from the Spark cluster

    For more details, refer to “Transform data using Spark activity in ADF”.
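
    To relate this to a Java jar: the Spark activity's className property is documented as the application's Java/Scala main class, so a definition along the following lines should apply to a compiled Java application as well. This is only a sketch. The activity name, both linked service names, the rootPath value, the jar file name, the class name, and the argument are all placeholders, not values taken from this thread.

    ```json
    {
        "name": "SparkJavaActivity",
        "type": "HDInsightSpark",
        "linkedServiceName": {
            "referenceName": "MyHDInsightLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "sparkJobLinkedService": {
                "referenceName": "MyAzureStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "rootPath": "adfspark/wordcount",
            "entryFilePath": "wordcount.jar",
            "className": "com.example.WordCount",
            "arguments": ["input.txt"],
            "getDebugInfo": "Failure"
        }
    }
    ```

    Here rootPath is the blob container and folder that holds the job, and entryFilePath is the path of the jar relative to that folder. Setting getDebugInfo to Failure (the other allowed values are None and Always) should copy the Spark logs to storage when the job fails.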

    Hope this helps.

    Tuesday, June 25, 2019 5:29 AM
  • Thank you for the information. Unfortunately, I am still unable to get a simple pipeline with Java to work. The steps of my process are as follows:

    First, I build a (working) Java project in IntelliJ and have it produce a .jar file as an artifact. It is a simple program that counts the frequency of words in a file.


    Then I upload that .jar file to the blob below:

    After connecting that blob to my Data Factory and creating an on-demand HDInsight linked service, I specify the file path to the root folder in the Spark activity of my pipeline.

    Notes:

    • HDInsight Version 3.6
    • Spark Version: 2.3.0

    Current Error: 

    Activity Spark1 failed: Internal server error occurred while processing the request. Please retry the request or contact support.

    • No logs are produced in the log folder.

    Tuesday, June 25, 2019 7:08 PM
  • Update: I tried deploying a standalone HDInsight instance, and the deployment failed with the same error. Below is the raw error:

    {
      "code": "DeploymentFailed",
      "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.",
      "details": [
        {
          "code": "Conflict",
          "message": "{Internal server error occurred while processing the request. Please retry the request or contact support.}"
        }
      ]
    }

    Tuesday, June 25, 2019 11:10 PM
  • Hello,

    Kindly check the “activity log”, which shows the reason for validation failure.

    Could you try now and confirm if you’re still facing the issue? 

    Wednesday, June 26, 2019 6:38 AM
  • I just tried deploying an HDInsight instance again, and it failed after 1 hour, 6 minutes, and 46 seconds.

    I have attached a screenshot of the JSON for the 'Write Cluster' activity that failed.

    Wednesday, June 26, 2019 4:52 PM
  • Hello,

    This issue looks strange. For a deeper investigation and immediate assistance, please file a support ticket if you have a support plan. Otherwise, send an email to AzCommunity@Microsoft.com with your subscription ID and a link to this thread, and I will enable a one-time free support request for your subscription.

    Please reference this forum thread in the subject: “Java with Data Factory HDInsight Spark Activity”. Thank you for your persistence.

    Thursday, June 27, 2019 5:11 AM
  • Thank you, sending an email now. 
    Thursday, June 27, 2019 10:33 PM