When is the best time to transfer data to blob storage?

  • Question

  • Hey all,

    I am currently using Azure Batch to parse a large CSV file and break it down into smaller files. The input file is stored in Azure Blob Storage, and the smaller output files will also need to be stored in blob storage.

    I have a console app that loads chunks of the CSV into memory and, when it hits a certain threshold, writes to an output location.

    Will it be faster to write to blob storage during the parse...ex:


    Or write to the node shared directory (%AZ_BATCH_NODE_SHARED_DIR%) and then transfer the data to the blob storage after the parse is complete?

    I know this will depend a bit on the type of VM I select, but currently time is more important than cost (to an extent, haha), so I am able to use a fairly well equipped VM.
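
    One way to compare the two options is to keep the chunking logic independent of where the chunks go, and then time each destination separately. The following is only a minimal sketch under assumptions: the console app in the question is not shown, so `split_csv`, `local_dir_sink`, the `part-00000.csv` naming, and the row-count threshold are all hypothetical stand-ins written in Python.

```python
import csv
import io
import os
import tempfile
import time
from typing import Callable, Iterable, List

# Hypothetical sink signature: (chunk file name, CSV bytes) -> None
Sink = Callable[[str, bytes], None]


def split_csv(lines: Iterable[str], max_rows: int, sink: Sink) -> int:
    """Buffer parsed rows in memory and flush every `max_rows` rows to `sink`.

    The header row is repeated in every chunk. Returns the number of chunks.
    """
    reader = csv.reader(lines)
    header = next(reader)
    buffer: List[List[str]] = []
    chunks = 0

    def flush() -> None:
        nonlocal chunks
        out = io.StringIO(newline="")
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(buffer)
        sink(f"part-{chunks:05d}.csv", out.getvalue().encode("utf-8"))
        chunks += 1
        buffer.clear()

    for row in reader:
        buffer.append(row)
        if len(buffer) >= max_rows:  # threshold reached: write the chunk out
            flush()
    if buffer:  # trailing partial chunk
        flush()
    return chunks


def local_dir_sink(directory: str) -> Sink:
    """Sink that writes each chunk to a local directory, e.g. the node shared dir."""
    def write(name: str, data: bytes) -> None:
        with open(os.path.join(directory, name), "wb") as fh:
            fh.write(data)
    return write


if __name__ == "__main__":
    sample = ["id,value\n"] + [f"{i},row{i}\n" for i in range(10)]
    with tempfile.TemporaryDirectory() as out_dir:
        start = time.perf_counter()
        count = split_csv(sample, max_rows=4, sink=local_dir_sink(out_dir))
        elapsed = time.perf_counter() - start
        print(f"wrote {count} chunks in {elapsed:.4f}s: {sorted(os.listdir(out_dir))}")
```

    A sink that uploads each chunk straight to blob storage could be passed in place of `local_dir_sink`, so both approaches can be timed against the same input on the chosen VM size.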


    Thursday, January 9, 2020 3:11 PM

All replies

  • Hi,

    While parsing, I think you will have one line of data at a time.

    It would be better to write that to a local file and, once the file is fully written, push the whole file to blob storage in one go.

    Stream writing is helpful for things like logs, where data arrives continuously. Here you have all the data for a file available, so writing it at once will be better than streaming it. This also depends on the size of the smaller output files.
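
    The "write locally, then push whole files" approach could be sketched roughly as below. This is an illustration under assumptions, not the poster's code: `upload_finished_files` and `make_azure_upload` are hypothetical helper names, the worker count is arbitrary, and the Azure part assumes the `azure-storage-blob` v12 Python package (`ContainerClient.from_connection_string` and `ContainerClient.upload_blob`). The upload step is passed in as a function so the parallel logic can be tried without a storage account.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical upload signature: takes a local file path and uploads that file.
Upload = Callable[[str], None]


def upload_finished_files(paths: List[str], upload: Upload, workers: int = 4) -> List[str]:
    """Upload fully written local files in parallel.

    Returns the uploaded file names in input order. Any upload error is re-raised.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so exceptions from workers surface here.
        list(pool.map(upload, paths))
    return [os.path.basename(p) for p in paths]


def make_azure_upload(connection_string: str, container: str) -> Upload:
    """Build an uploader backed by Azure Blob Storage.

    Assumption: the azure-storage-blob v12 package is installed. It is imported
    lazily so the rest of this sketch runs without it.
    """
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_connection_string(connection_string, container)

    def upload(path: str) -> None:
        with open(path, "rb") as fh:
            # Blob name is the local file name; overwrite any existing blob.
            client.upload_blob(name=os.path.basename(path), data=fh, overwrite=True)

    return upload
```

    The list of paths would be the chunk files written under the node shared directory. Whether uploading after the parse or uploading each file as soon as it is closed is faster would need to be measured on the actual VM size and file sizes.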

    Thursday, January 23, 2020 10:52 AM
  • Hello,

    Any update on the issue?

    If the suggested response helped you resolve your issue, please click "Mark as Answer" and "Up-Vote" on the answer that helped, for the benefit of the community.

    Tuesday, February 4, 2020 6:14 AM