Merge files in Azure Blob using PowerShell

  • Question

  • Hi,

    I need to merge multiple files in Azure Blob storage whose names share a keyword but carry different timestamps, and then move them from one folder to another so that a downstream service can consume them.

    I was able to move the files from one folder to another using the script below:

    $FileTypeFilter = "Archive/EuAfMe_CUSTOMER" # This is case sensitive

    $Blobs = Get-AzureStorageBlob -Context $Context -Container $ContainerName -Prefix $FileTypeFilter

    #Write-Host "before Landing for-each loop"

    foreach ($blob in $Blobs)
    {
        # Substring(8) strips the leading "Archive/" from the blob name
        $destFile = "Landing/" + $blob.Name.Substring(8)

        Write-Host $destFile "   " $blob.Name
        Write-Host "file name = " $blob.Name

        # Copy file name and directory
        Start-AzureStorageBlobCopy -SrcContainer $ContainerName -DestContainer $ContainerName -Context $Context -SrcBlob $blob.Name -DestBlob $destFile -Force

        Remove-AzureStorageBlob -Container $ContainerName -Context $Context -Blob $blob.Name
    }

    However, I can't find a way to concatenate them into a single file with PowerShell (within the blob folders) and then move that file to another folder. Is there any way to achieve this specifically with PowerShell?

    Note: All files in the folder are text/CSV files with the same layout.

    I prefer PowerShell because I can easily plug the scripts into a third-party scheduler used in our project.

    Kindly advise.

    Thursday, January 18, 2018 2:01 PM

Answers

  • The Copy Blob operation cannot concatenate, join, or combine blobs. It is intended as a background copy operation and can only make copies; the destination blob is overwritten each time.

    For your use case, the content of all the required blobs has to be retrieved, a new blob constructed locally, and then uploaded to the destination:

    1. Download: Get-AzureStorageBlobContent will bring down the blob content.
       Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-powershell#download-blobs

    2. Concatenate locally.

    3. Upload the merged blob to the container.
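
The download, concatenate, and upload steps above could be sketched roughly as follows. This is only an illustration, assuming the legacy Azure.Storage cmdlets already used in this thread and an existing $Context and $ContainerName; the function name Merge-BlobsByPrefix, its parameters, and the temporary working folder are hypothetical and not part of any module.

```powershell
# Sketch only: download every blob under a prefix, append them into one local
# file, then upload the result as a single blob.
# Merge-BlobsByPrefix and its parameter names are hypothetical.
function Merge-BlobsByPrefix {
    param(
        [Parameter(Mandatory)] $Context,
        [Parameter(Mandatory)] [string] $ContainerName,
        [Parameter(Mandatory)] [string] $Prefix,     # e.g. "Archive/EuAfMe_CUSTOMER"
        [Parameter(Mandatory)] [string] $DestBlob,   # name of the merged blob, e.g. under "Landing/"
        [string] $WorkDir = (Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid()))
    )

    New-Item -ItemType Directory -Path $WorkDir -Force | Out-Null
    $merged = Join-Path $WorkDir 'merged.tmp'

    $blobs = @(Get-AzureStorageBlob -Context $Context -Container $ContainerName -Prefix $Prefix)
    if ($blobs.Count -eq 0) { return }   # nothing to merge

    foreach ($blob in $blobs) {
        # Download to a flat local name so "Archive/..." needs no local subfolder.
        $local = Join-Path $WorkDir (Split-Path $blob.Name -Leaf)
        Get-AzureStorageBlobContent -Context $Context -Container $ContainerName `
            -Blob $blob.Name -Destination $local -Force | Out-Null

        # Append this file's lines to the merged file.
        Get-Content -Path $local | Add-Content -Path $merged
    }

    # Upload the merged file as one blob.
    Set-AzureStorageBlobContent -Context $Context -Container $ContainerName `
        -File $merged -Blob $DestBlob -Force | Out-Null
}
```

A call might look like `Merge-BlobsByPrefix -Context $Context -ContainerName $ContainerName -Prefix "Archive/EuAfMe_CUSTOMER" -DestBlob "Landing/EuAfMe_CUSTOMER.csv"`, where the destination blob name is just an example. The source blobs are left in place here; removing them afterwards with Remove-AzureStorageBlob, as in the question's script, would be a separate step.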

    -----------------------------------------------------------------------------------------

    Do click on "Mark as Answer" on the post that helps you, this can be beneficial to other community members.

    Friday, January 19, 2018 2:47 PM
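  • Here is the solution (it runs against local copies of the files):

    # Keep the header row only from the first file.
    $getFirstLine = $true

    Get-ChildItem "C:\Users\*.csv" | ForEach-Object {
        $filePath = $_.FullName

        $lines = Get-Content $filePath
        $linesToWrite = switch ($getFirstLine) {
            $true  { $lines }                          # first file: header and data
            $false { $lines | Select-Object -Skip 1 }  # later files: drop the header
        }

        $getFirstLine = $false
        Add-Content "C:\Users\MergedFiles\Merge.csv" $linesToWrite
    }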
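
The script above does not control file order. A possible variation that merges oldest first is sketched below. It assumes the timestamp is embedded in the file name in a sortable form, as in product_hier_rdm_20180913010005.txt, so sorting by name gives chronological order. The function name Merge-CsvOldestFirst, its parameters, and the -DropAllHeaders switch are hypothetical.

```powershell
# Sketch only: merge local text/CSV files oldest first, keeping one header
# (or none). Assumes file names sort chronologically by their timestamp suffix.
# Merge-CsvOldestFirst and its parameters are hypothetical names.
function Merge-CsvOldestFirst {
    param(
        [Parameter(Mandatory)] [string] $SourceFolder,
        [Parameter(Mandatory)] [string] $Filter,      # e.g. "product_hier_rdm_*.txt"
        [Parameter(Mandatory)] [string] $OutputFile,  # should not match $Filter inside $SourceFolder
        [switch] $DropAllHeaders                      # remove the header from every file
    )

    # Start from an empty output so reruns do not append to old results.
    if (Test-Path $OutputFile) { Remove-Item $OutputFile }

    $isFirst = $true
    $files = Get-ChildItem -Path $SourceFolder -Filter $Filter | Sort-Object Name
    foreach ($file in $files) {
        $lines = @(Get-Content -Path $file.FullName)

        # Skip the header on every file except the first, or on all files
        # when -DropAllHeaders is set.
        if ($DropAllHeaders -or -not $isFirst) {
            $lines = @($lines | Select-Object -Skip 1)
        }

        if ($lines.Count -gt 0) {
            Add-Content -Path $OutputFile -Value $lines
        }
        $isFirst = $false
    }
}
```

If the names do not sort chronologically, sorting on another property such as LastWriteTime may be an option, although files freshly downloaded from blob storage may not preserve the original modification times.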

    • Marked as answer by Kiran612 Tuesday, February 20, 2018 11:55 AM
    Tuesday, February 20, 2018 11:55 AM

All replies

  • You can use the Get-Content and Add-Content cmdlets to append the contents of files to another file. In this case if it's blob...

    Refer here: https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/add-content?view=powershell-5.1

    Let me know the outcome!


    Thursday, January 18, 2018 5:43 PM
  • @sumanth

    We tried using New-Item (file type) and Get-Content, but these cmdlets expect a local path, not an Azure blob URL (for example https://blob.windows.net/files/files.txt). When we tried this, we got an error saying that there is no drive called https://blob.windows.net/files/files.txt.

    Please help us here.

    Friday, January 19, 2018 6:14 AM
  • It looks like Get-Content only looks for files available locally, but my requirement is to merge the files in the blob. Could you please advise whether I can use "Get-AzureStorageBlobContent"? This is what I tried, followed by the error it produced:

    $FileTypeFilter = ("Archive/product_hier_rdm") # This is case sensitive

    $Blobs = Get-AzureStorageBlob -Context $Context -Container $ContainerName -Prefix $FileTypeFilter  # List of all blobs name

    #Write-Host "before Landing for-each loop"

    foreach ($blob in $Blobs)
    {
        $destFile = "Landing/" + $blob.Name

        Write-Host "file = " $blob.Name

        Add-Content -Path "Test.txt" -Value (Get-Content $blob.Name)
    }

    Get-Content : Cannot find path 'C:\Users\nittak\Archive\product_hier_rdm_20180913010005.txt' because it does not exist.

    At line:39 char:40

    +   Add-Content -Path "Test.txt" -Value (Get-Content $blob.Name)

    +                                        ~~~~~~~~~~~~~~~~~~~~~~

        + CategoryInfo          : ObjectNotFound: (C:\Users\nittak...80913010005.txt:String) [Get-Content], ItemNotFoundException

        + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

     


    • Edited by Kiran612 Friday, January 19, 2018 6:29 AM
    Friday, January 19, 2018 6:25 AM
  • Thanks for your confirmation.
    Monday, January 22, 2018 12:01 PM
  • Hi,

    While performing the suggested merge operation (download, merge, and upload with PowerShell), I would like to do the following:

    1. Merge the oldest file first, then append the contents of each newer file in turn (ascending order: oldest first, second oldest next, and so on).

    2. Either remove the headers completely during concatenation, or keep only the first file's header in the merged file and take just the data rows (everything except the header) from the remaining files.

    Could you please advise on this?

    Thanks,

    Kiran 


    • Edited by Kiran612 Thursday, February 15, 2018 6:32 AM
    Thursday, February 15, 2018 6:31 AM