What is the purpose of having two folders in Azure Data-lake Analytics?

    Question

  • Hi All,

    I am a newbie to Azure Data Lake, so please excuse me if I get anything wrong.

    Azure Data Lake Analytics: Why do we have two folders in ADLA (Catalog and System)? What is the purpose of these folders?

    Azure Data Lake Storage: What is the purpose of database-type storage? Is this like a staging environment? I see that we can also do transformations using U-SQL and push code. This can also be done in Azure Data Factory, so what is the use of doing transformations here?


    Monday, February 18, 2019 7:55 PM

Answers

  • Hi Kommu123,

    Welcome to Azure!

    Azure Data Lake Analytics: Why do we have two folders in ADLA (Catalog and System)? What is the purpose of these folders?

    What is the Catalog folder?

    Every Data Lake Analytics account has a catalog associated with it, which is used to store data and code. You can think of it as a collection of objects and data. The catalog is always present and cannot be deleted (in Azure, anyway; you can delete the file on your local machine, but it will be recreated from scratch the next time you open a U-SQL project). The principal aims of the catalog are to enable code sharing and enhance performance. It keeps a record of all of your databases and database elements, and comes with the master database built in, which also cannot be deleted. The catalog stores databases, tables, table-valued functions (TVFs), schemas, assemblies and all other code-related items.
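    To illustrate, catalog objects are created with ordinary U-SQL DDL. This is only a sketch; the database and table names below are hypothetical:

```
// Hypothetical names - objects created this way are persisted in the
// account's catalog and can be shared across jobs.
CREATE DATABASE IF NOT EXISTS SalesDb;
USE DATABASE SalesDb;

// A U-SQL table requires a clustered index and a distribution scheme.
CREATE TABLE IF NOT EXISTS dbo.Orders
(
    OrderId int,
    Amount double,
    INDEX idx_Orders CLUSTERED (OrderId ASC)
    DISTRIBUTED BY HASH (OrderId)
);
```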

    What is the System folder?

    In Azure Data Lake Analytics, you can use multiple user accounts or service principals to run jobs.

    In order for those same users to see detailed job information, they need to be able to read the contents of the job folders. The job folders are located in the /system/ directory.

    After jobs have been submitted, the System folder looks like this:

    • /system
    • /system/jobservice
    • /system/jobservice/jobs
    • /system/jobservice/jobs/Usql
    • /system/jobservice/jobs/Usql/2018
    • /system/jobservice/jobs/Usql/2018/05
    • /system/jobservice/jobs/Usql/2018/05/25
    • /system/jobservice/jobs/Usql/2018/05/25/11
    • /system/jobservice/jobs/Usql/2018/05/25/11/01
    • /system/jobservice/jobs/Usql/2018/05/25/11/01/b074bd7a-1448-d879-9d75-f562b101bd3d

    Each job folder path encodes the job's submission time (year/month/day/hour/minute), followed by the job's GUID.

    Azure Data Lake Storage: What is the purpose of database-type storage? Is this like a staging environment? I see that we can also do transformations using U-SQL and push code. This can also be done in Azure Data Factory, so what is the use of doing transformations here?

    The principal aims of the catalog are to enable code sharing and enhance performance. It keeps a record of all of your databases and database elements, and comes with the master database built in, which also cannot be deleted. The catalog stores databases, tables, table-valued functions (TVFs), schemas, assemblies and all other code-related items.

    In Azure, the catalog comes pre-populated with the master database. A number of assemblies are already installed too (system items such as the Python and R libraries). This is a bit different from what you see locally – you can't see any centralised files managing your Data Lake, for instance. There isn't any need for such a mechanism here, as everything is spread across the Data Lake installation in Azure. All you can see is the database folder. When you create objects and databases, they appear within this folder just as they do locally. The database folder contains one folder (you'll see it with a GUID as the name); this is the folder that represents the master database.

    Yes, you can transform data using U-SQL scripts directly in Azure Data Lake Analytics, and you can also transform data by running U-SQL scripts on Azure Data Lake Analytics from Azure Data Factory.
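    As a sketch, a U-SQL transformation extracts files from the store, shapes the rowset, and writes the result back. The file paths and column names below are hypothetical:

```
// Hypothetical paths and schema - adjust to your own data.
@orders =
    EXTRACT OrderId int,
            Amount double
    FROM "/input/orders.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// Aggregate the amount per order.
@totals =
    SELECT OrderId,
           SUM(Amount) AS Total
    FROM @orders
    GROUP BY OrderId;

OUTPUT @totals
TO "/output/totals.csv"
USING Outputters.Csv(outputHeader: true);
```

    The same script can be wired into an Azure Data Factory pipeline as a U-SQL activity, so the choice is mostly about orchestration: Data Factory schedules and chains activities, while the transformation logic itself still runs on Data Lake Analytics.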

    For more details, refer to the links below:

    Get started with Azure Data Lake Analytics using the Azure portal

    U-SQL programmability guide

    Transform data by running U-SQL scripts on Azure Data Lake Analytics

    Hope this helps.

    Tuesday, February 19, 2019 5:16 AM
    Moderator

All replies

  • Hi Kommu123,

    Just checking in to see if the above answer helped. If it answers your query, please click "Mark as Answer" and up-vote it. If you have any further questions, do let us know.

    Thursday, February 21, 2019 9:34 AM
    Moderator