Does Azure Data Lake store data for analytics only temporarily?

    Question

  • I'm going through the MVA course "Introducing Azure Data Lake", and up to module 2, every video highlights that we pay only for the time a job takes to execute.

    This has me confused: is it meant for permanently storing all of our transaction history data, or is it meant only for temporarily analysing a portion of the transactions, running computations on it with an analytical language (e.g. U-SQL, Hive, Pig) and dropping the temporary analytical source once that is done?

    In other words:

    Does it create a temporary or staging table, analyse the data in it, and drop the table once the statistics have been produced?

    Or

    Is Data Lake meant to store data for future use, like a warehouse, so that analytics can be run on it as business requirements arise, with the cost coming mainly from the processing time of the analysis rather than from storing the historical data?

    Regards Harsimran


    HS



    Friday, April 14, 2017 10:28 AM

Answers

  • Azure Data Lake definitely enables users to store data for the long term and to run multiple iterations or layers of analytics on that data. Keep in mind that storage and analytics (computation) are separated in ADL.

    With ADLS (storage) you can land any size and type of data and it will stay there in multi-replica, reliable storage indefinitely. You will be charged for this data based on size and length of storage.

    With data in ADLS (or even blob store) you can spin up different kinds of analytics tasks. If you use ADLA and the U-SQL language, you will submit a query (job) that will read and write data from ADLS. You will be charged for the computation resources needed for that, but only the amount reserved for you during the query execution. You do not have to pay for idle resources while jobs are not executing.
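
    To make the two separate charges concrete, here is a minimal Python sketch of the billing model described above. It is only an illustration: the rates, the `Job` class and the function names are hypothetical placeholders, not real Azure prices or APIs.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical placeholder rates, NOT real Azure prices.
STORAGE_RATE_PER_GB_MONTH = 0.04    # assumed cost of keeping 1 GB for 1 month
COMPUTE_RATE_PER_UNIT_HOUR = 2.00   # assumed cost of 1 reserved compute unit for 1 hour


@dataclass
class Job:
    """One analytics job (for example, a U-SQL query)."""
    reserved_units: int     # compute units reserved for the job
    duration_hours: float   # how long the job actually ran


def storage_cost(size_gb: float, months: float) -> float:
    """Storage is charged on data size and how long the data is kept."""
    return size_gb * months * STORAGE_RATE_PER_GB_MONTH


def compute_cost(jobs: Iterable[Job]) -> float:
    """Compute is charged only for resources reserved while jobs execute."""
    return sum(
        job.reserved_units * job.duration_hours * COMPUTE_RATE_PER_UNIT_HOUR
        for job in jobs
    )


def total_cost(size_gb: float, months: float, jobs: Iterable[Job]) -> float:
    """The two charges are independent and simply add up."""
    return storage_cost(size_gb, months) + compute_cost(jobs)


if __name__ == "__main__":
    # Data stays in storage all month; two short jobs run against it.
    jobs = [Job(reserved_units=10, duration_hours=0.5),
            Job(reserved_units=5, duration_hours=0.25)]
    print("storage:", storage_cost(1000, 1))
    print("compute:", compute_cost(jobs))
    print("idle month compute:", compute_cost([]))
```

    With an empty job list the compute charge is zero while the storage charge continues, which matches the point above: the data can stay in ADLS indefinitely, and you pay for analytics only while jobs are executing.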

    Saturday, April 15, 2017 2:49 AM

All replies

  • @Omid, I really appreciate you shedding light on my doubt, which came up while I was working through a question about designing for performance in ADL.

    Your sharing your experience on this takes me one step closer to my goal.

    Regards

    Harsimran 

     

    HS

    Saturday, April 15, 2017 2:49 PM