How to schedule and run U-SQL job for Azure Data Lake Store and Analytics

  • Question

  • I would like to read all CSV files in an Azure Data Lake Store folder, combine all of their rows, and write the result to a single output CSV file. I have a working U-SQL job that does this. I need to run it on a schedule, but scheduling is not currently supported in Azure Data Lake Analytics. The data is small for now, but I am looking for a solution that also works for larger amounts of data.

    Is it possible for ADF to run the query, using Data Lake Store as both input and output? Can we do this with U-SQL instead of HDInsight?


    Kenny_I

    Thursday, December 15, 2016 2:34 PM

Answers

  • Yes, you can! Data Factory supports Data Lake Analytics activities.

    The first thing to do is define Data Lake Store and Data Lake Analytics as linked services for your ADF.

    Next you define input and output datasets. These are not strictly tied to the files in your Data Lake Store folder, but defining them is useful for traceability.

    The U-SQL job script must be uploaded to a Blob Storage account, and this account must also be a linked service for your ADF.

    Finally you define a Data Factory pipeline with the input and output datasets and the script location. Here you also set parameters such as the degree of parallelism, which lets you scale the job when you have more data.

    This is described with examples here: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity 
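
As a rough illustration of the last step, the sketch below builds a pipeline definition as a Python dict and serializes it to JSON. The property names (`DataLakeAnalyticsU-SQL`, `scriptPath`, `scriptLinkedService`, `degreeOfParallelism`, `scheduler`) follow the Data Factory article linked above as best recalled and should be checked against it. The linked service names, dataset names, script path, and dates are all hypothetical placeholders.

```python
import json


def build_usql_pipeline(script_path, input_dataset, output_dataset,
                        degree_of_parallelism=3):
    """Return a Data Factory pipeline definition (as a dict) with one U-SQL activity.

    All names below (pipeline, activity, linked services) are hypothetical
    placeholders; replace them with the names defined in your own data factory.
    """
    activity = {
        "name": "CombineCsvFiles",
        "type": "DataLakeAnalyticsU-SQL",
        # Linked service for the Data Lake Analytics account that runs the job.
        "linkedServiceName": "AzureDataLakeAnalyticsLinkedService",
        "typeProperties": {
            # Location of the uploaded U-SQL script inside the blob container.
            "scriptPath": script_path,
            # Linked service for the Blob Storage account that holds the script.
            "scriptLinkedService": "StorageLinkedService",
            # Raise this value to scale the job as the data grows.
            "degreeOfParallelism": degree_of_parallelism,
        },
        "inputs": [{"name": input_dataset}],
        "outputs": [{"name": output_dataset}],
        # Run the activity once per day.
        "scheduler": {"frequency": "Day", "interval": 1},
    }
    return {
        "name": "CombineCsvPipeline",
        "properties": {
            "description": "Runs a U-SQL script that merges CSV files",
            "activities": [activity],
            # Hypothetical active period for the schedule.
            "start": "2016-12-16T00:00:00Z",
            "end": "2017-12-16T00:00:00Z",
        },
    }


pipeline = build_usql_pipeline(
    "scripts/combine.usql",      # hypothetical container/path of the script
    "CsvFolderInput",            # hypothetical input dataset name
    "CombinedCsvOutput",         # hypothetical output dataset name
    degree_of_parallelism=5,
)
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The printed JSON is only a starting point for the pipeline definition; the linked services and datasets it refers to still have to be defined separately in the data factory.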

    • Marked as answer by Kenny_I Friday, December 16, 2016 6:56 AM
    Thursday, December 15, 2016 7:27 PM

All replies

  • According to recent comments (Jan 20, 2017) on other articles relating to this same question (https://docs.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-definition), a U-SQL script cannot be submitted through Data Factory if it requires a C# code-behind. My question is: how do I schedule or trigger a U-SQL job "correctly" if it requires a code-behind?
    Tuesday, February 21, 2017 8:27 PM
  • Yes, you can. If you have a code-behind and run the script from Visual Studio, you can afterwards open the job in the job browser (in the Azure portal or in Visual Studio) and find the submitted script, which includes generated code that represents your code-behind. Example here: https://raw.githubusercontent.com/Azure/AzureDataLake/master/docs/img/Blogs/RegisterAssembly/Fig2-CodebehindScript.jpg What I usually do is download the entire script, including the code-behind assembly code, and put it in the blob container that Data Factory uses. More in this blog post: https://blogs.msdn.microsoft.com/azuredatalake/2016/08/26/how-to-register-u-sql-assemblies-in-your-u-sql-catalog/#s1
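
To make the idea of a script that carries its own code-behind more concrete, here is a minimal sketch that wraps a U-SQL script body with statements that create, reference, and finally drop an assembly supplied as a hex literal. This only approximates the general shape of what Visual Studio generates; the exact generated text may differ, so downloading the real script from the job browser remains the safer route. The function name, the assembly name, the sample bytes, and the script body are all hypothetical.

```python
def inline_assembly_script(assembly_bytes, assembly_name, script_body):
    """Wrap a U-SQL script body so that it carries its assembly inline.

    Assumption: the assembly is embedded as a hex literal in CREATE ASSEMBLY,
    similar in spirit to the code-behind script shown in the job browser.
    """
    # Encode the compiled assembly as a single 0x... binary literal.
    hex_literal = "0x" + assembly_bytes.hex().upper()
    lines = [
        f"CREATE ASSEMBLY [{assembly_name}] FROM {hex_literal};",
        f"REFERENCE ASSEMBLY [{assembly_name}];",
        "",
        script_body.strip(),
        "",
        # Clean up so the temporary assembly does not linger in the catalog.
        f"DROP ASSEMBLY [{assembly_name}];",
    ]
    return "\n".join(lines) + "\n"


# Stand-in bytes for illustration only; a real run would read the compiled DLL,
# for example: open("MyCodeBehind.dll", "rb").read()
fake_dll = b"MZ\x90\x00"
script = inline_assembly_script(
    fake_dll,
    "MyCodeBehind",                      # hypothetical assembly name
    "// original U-SQL statements go here",
)
print(script)
```

The resulting text can then be saved as a single script file and uploaded to the blob container that the Data Factory activity points at.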
    Tuesday, February 21, 2017 8:36 PM
  • Hello Helge,

    As you said, can handling the code-behind script for U-SQL really be as simple as downloading the script, uploading the entire script to a blob container, and referring to that blob container in the activities of an ADF pipeline?
    Couldn't we create an additional Batch account service or pool in ADF to run these assemblies separately, so that we can use our own assemblies?

    Also, could you please explain how to use registered assemblies (in a U-SQL script) in ADF? I couldn't find any reference on how to use registered assemblies in ADF.


    JAYENDRAN

    Sunday, May 21, 2017 5:32 AM