How is data partitioned in Azure Blob Storage?

    Question

  • Azure Blob storage uses WASB, which is an abstraction of HDFS. I would like to know how data splitting occurs in Blob storage, i.e. how is a big file partitioned into smaller chunks? What are the chunk size and the replication factor, and how is the data replicated?
    Tuesday, August 4, 2015 4:14 PM

Answers

  • Hi,

    There are two kinds of blobs – block blobs and page blobs.

    You specify the type when you create the blob. At the time of writing, the maximum size for a block blob is 200 GB, and a block blob of up to 64 MB can be uploaded in a single operation. Larger block blobs are uploaded by programmatically splitting them into blocks, optionally using multiple threads to upload the blocks in parallel. When you commit the blocks, Azure puts them back together and makes them available as a single file.
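
    The split, parallel upload, and commit flow described above can be sketched roughly as follows. This is only an illustration and does not call Azure: FakeBlobStore is a hypothetical in-memory stand-in for the service, and the 4 MiB block size, the worker count, and all function names are assumptions, not values taken from this post.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size (4 MiB); not stated in the post


def plan_blocks(total_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover total_size bytes in order."""
    return [(offset, min(block_size, total_size - offset))
            for offset in range(0, total_size, block_size)]


class FakeBlobStore:
    """Hypothetical in-memory stand-in for the service: stage blocks, then commit."""

    def __init__(self):
        self.staged = {}

    def put_block(self, index, data):
        # A staged block is uploaded but not yet part of the blob.
        self.staged[index] = data

    def commit(self, indexes):
        # Committing assembles the staged blocks in the order given.
        return b"".join(self.staged[i] for i in indexes)


def upload_in_blocks(data, store, block_size=BLOCK_SIZE, workers=4):
    """Split data into blocks, stage them in parallel, then commit in order."""
    blocks = plan_blocks(len(data), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(store.put_block, i, data[off:off + length])
                   for i, (off, length) in enumerate(blocks)]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    return store.commit(range(len(blocks)))
```

    The blocks may finish uploading in any order; only the order of the commit list determines how the final blob is assembled.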

    Page blobs can be up to 1 TB in size and consist of a collection of 512-byte pages. You set the maximum size when creating a page blob and then you can write or update specific pages. The primary use I have seen for these is to back the IaaS Virtual Machines in Azure – the Virtual Hard Drives (VHDs) that represent the data disks and OS disks are stored as page blobs in Azure Storage.
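
    Because a page blob is a collection of 512-byte pages, sizes and write ranges are normally worked out in whole pages. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of any SDK:

```python
PAGE_SIZE = 512  # page blobs consist of 512-byte pages


def round_up_to_page(size):
    """Smallest multiple of PAGE_SIZE that can hold size bytes."""
    return -(-size // PAGE_SIZE) * PAGE_SIZE


def page_range(offset, length):
    """Page-aligned (start, end) byte range, end inclusive, covering a write."""
    if length <= 0:
        raise ValueError("length must be positive")
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = round_up_to_page(offset + length) - 1
    return start, end
```

    For example, a 100-byte write at offset 700 touches only the second page, so page_range(700, 100) returns (512, 1023).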

    How you partition data in storage basically depends on three factors:

    • Volume – How much data will you ultimately store: a couple of gigabytes, hundreds of gigabytes, terabytes, or petabytes?
    • Velocity – What is the rate at which your data will grow? Is it an internal app that isn’t generating a lot of data, or an external app that users will be uploading images and videos into?
    • Variety – What type of data will you store: relational data, images, key-value pairs, or social graphs?

    With those three factors in mind, there are basically three approaches to partitioning:

    •  Vertical partitioning: like splitting up a table by columns. One set of columns goes into one data store, and another set of columns goes into a different data store.
    •  Horizontal partitioning: like splitting up a table by rows. One set of rows goes into one data store, and another set of rows goes into a different data store.
    •  Hybrid partitioning: a combination of vertical and horizontal partitioning.
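
    As a rough illustration of the first two approaches, the sketch below splits a small list of rows by columns and by rows. The function names, the sample column names, and the choice of a CRC32 hash to assign rows to stores are all assumptions made for the example.

```python
import zlib


def vertical_partition(rows, key, column_groups):
    """Split each row by columns; every partition keeps the key column."""
    return [[{key: row[key], **{c: row[c] for c in cols}} for row in rows]
            for cols in column_groups]


def horizontal_partition(rows, key, partitions):
    """Split rows across data stores using a stable hash of the key."""
    stores = [[] for _ in range(partitions)]
    for row in rows:
        index = zlib.crc32(str(row[key]).encode("utf-8")) % partitions
        stores[index].append(row)
    return stores


rows = [
    {"id": 1, "name": "Ann", "photo": "ann.jpg"},
    {"id": 2, "name": "Bob", "photo": "bob.jpg"},
    {"id": 3, "name": "Cy", "photo": "cy.jpg"},
]
by_columns = vertical_partition(rows, "id", [["name"], ["photo"]])
by_rows = horizontal_partition(rows, "id", 2)
```

    A hybrid scheme would apply both functions, for example splitting off the photo column first and then spreading the remaining rows across several stores.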

     

    To upload a large file to Azure Blob storage, split the file into chunks (blocks), upload the blocks, and then commit them.

    Data splitting can be done using the "Windows Azure SDK for PHP", with which a file can be split into up to fifty thousand (50,000) blocks. Each block must be assigned a unique ID (block ID).

    All block IDs must have the same length, and each block ID must be Base64 encoded when it is sent to Windows Azure.
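
    One simple way to satisfy the block ID rules above (unique, equal length, Base64 encoded) is to encode a zero-padded counter. This is a sketch only: the "block-" prefix, the six-digit width, and the function name are assumptions, and the linked article uses PHP rather than Python.

```python
import base64

MAX_BLOCKS = 50000  # block limit per blob quoted in this answer


def make_block_ids(count):
    """Return `count` unique Base64-encoded block IDs of identical length."""
    if not 0 <= count <= MAX_BLOCKS:
        raise ValueError("count must be between 0 and 50000")
    # Zero padding keeps every raw ID at 12 bytes, so every encoded ID
    # has the same length as well.
    return [base64.b64encode(f"block-{i:06d}".encode("ascii")).decode("ascii")
            for i in range(count)]
```

    The same list, in the same order, is what would be sent when committing the blocks.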

    Reference : Uploading Large File By Splitting Into Blocks In Windows Azure Blob Storage Using Windows Azure SDK For PHP

     Hope this helps.

    Regards,

    Shirisha Paderu

    Wednesday, August 5, 2015 2:03 PM
    Moderator
  • Hi,

    The data replication options are:

    LRS (locally redundant storage), ZRS (zone-redundant storage), GRS (geo-redundant storage), and RA-GRS (read-access geo-redundant storage).

    ZRS is only available for block blobs. Once you have created a storage account with ZRS selected, you cannot convert it to any other type of replication, and you cannot convert an account that uses another type of replication to ZRS.

    See "Replication for Durability and High Availability" for more details on storage data replication.

    Regards,

    Shirisha Paderu.


    Wednesday, August 5, 2015 2:19 PM
    Moderator
