none
SSIS Foreach Loop to copy, rename files within folders

    Question

  • This is where I stand, with the task at hand. I have some 1,000 files within sub folders.  The layout is such

    C:\Source Dir\Folder 1\a.b.c.pdf
    C:\Source Dir\Folder 1\a.b.c.txt
    C:\Source Dir\Folder 1\a.b.c.doc

    C:\Source Dir\Folder  2\xx.yy.zz.pdf
    C:\Source Dir\Folder  2\xx.yy.zz.txt
    C:\Source Dir\Folder  2\xx.yy.zz.doc

    ..... all the way up to 343 sub folders in the Source Directory


    How can I use the ForeachLoop Container and Filesystem Task get the output as mentioned below


    C:\Dest Dir\Folder 1\abc.pdf


    C:\Dest Dir\Folder 2\xxyyzz.pdf

    The SSIS package takes only the PDF files, copies them to c:\dest dir with their respective sub folder names which come from the source subfolder. Then it renames them as their original filename BUT without the "." that exists in the filename, obviously except the "." for the extension (.pdf)

    Thanks


    Wednesday, July 30, 2008 2:28 AM

Answers

  •  new2all wrote:


    If I understand you correctly, do you mean it will be essential for the package to get the names of the subfolders into a variable via an expression? If you meant that, would it be because of the sub folder names and lengths being different.?


    To an extent.  This is actually more to make sure that you are getting the same file structure.  For example if you have a temp folder and it does not have the requisite folders underneath to copy your files into, how will it be forced to do so?  You will essentially have to take this information and use it to create the corresponding folders to get the file you want to copy to the correct location.


     new2all wrote:


    In that case would the logic be such, The foreachloop would not only need to get the loop over the next filename but also the corresponding sub folder name?



    Yes, but if you are using fully qualified names you will get this inf
    ormation.


     new2all wrote:


    I did make a new user variable named "mySubFolder" to just get the sub folder names

    SUBSTRING( @[User::myRootFolder], 16, 90)

    As per your code snippets and instructions I did the required, It works perfect, but only for the sub folder whose values are mentioned (e.g. for myRootFolder variable I have a value C:\Test\Source\SAS Feb 90\) as you mentioned
    \\myserver.mydomain.com\myfolder\mysubfolder\

    Seeing that it works for only one folder I went ahead and changed the "Foreachloop" --> "Collection" --> "Expression" from "myRootFolder" to "mySubFolder" but no luck, only the "Foreachloop Container" turns green without affecting the "FileSystem Task"

    Would you suggest any other logic? Or correct me where I am going wrong?
     


    First off, you are going to want to take out the hard coded 16 and 90 you have on the substring.  But yes, you have run into the brick wall here.  I think what will need to happen will be that you will want to split your "subfolder" string by delimiter (i.e. "\") so that you can find each level of folder between the root and the file location.  After this you will need to create the corresponding file structure so that when you go to copy the files it will have a place to land.  This may require another for each loop inside of the for each loop... Or you might just do it via script task... not sure which would be easiest....

    I'm also not 100% positive that this is necessary.  Anyone else have any thoughts?
    Thursday, July 31, 2008 8:40 PM

All replies

  • you ought to use the foreach loop container to iterate over all the subfolders in source root folder.  the foreach loop file enumerator automatically iterates over subfolders.

     

    our own rafael salas wrote a pretty good tutorial which should help get you started: http://rafael-salas.blogspot.com/2007/03/ssis-file-system-task-move-and-rename.html

     

    hth

    Wednesday, July 30, 2008 6:17 AM
  • Thanks Duane for the response, but i believe i need some more help. I am using the example provided  by Rafael as a working copy for my project.

    The article by Rafael did surely help me get off the ground. It works well when I copy all the PDF files from their respective sub folders into a single folder e.g c:\temp\source\

    It also renames the files and puts it into the folder c:\test\archive\ (provided I create the folder before hand)

    Now for my task, I copied all the 300+ subfolders into c:\temp\source and checked on "traverse subfolder" in the Foreachloop container, made file types to "*.pdf" and retrieve file name to "name and extension"

    My destination/archive folder is c:\temp\archive\  (where I want all the files to be residing in one place) But it seems the user variable "FullSourcePathName" does not iterate through the subfolders. It points at c:\temp\source\a.b.c.pdf
    instead it shoould be pointing at c:\temp\source\folder 1\a.b.c.pdf

    How can I make the user variable "FullSourcePathName" iterate through the subfolders?

    Currently below are the user variables I have in BIDS with scope of Foreachloop

    SourcePath is C:\Test\Source\ (no expression)

    ArchivePath is C:\temp\Archive\  (no expression)

    MyFileValue is a.b.c.pdf (no expression)

    FullArchivePathFileName is expression based @[User::ArchivePath] + REPLACE( @[User::MyFileValue] ,"."," ")

    FullSourcePathFileName is expression based @[User:Tongue TiedourcePath] + @[User::MyFileValue]


    Thanks


    Thursday, July 31, 2008 5:05 AM
  •  

     

     

    @[UserSurpriseurcePath] should be mapped to the expression for "Directory" on the [Collection] tab of the foreachloop editor, make sure the 'fully qualified' radio button is selected and the "traverse subfolders" checkbox is checked.    On the [Variable Mappings] tab Map "User::MyFileValue" to Index 0.

     

    That will get you User:MyFileValue=(fully qualified path)\filename.fileExtension.  Then it is just a matter of using the variable in the system file task.

    Thursday, July 31, 2008 12:30 PM
  •  davhas35 wrote:

     

     

     

    @[User:Tongue TiedourcePath] should be mapped to the expression for "Directory" on the [Collection] tab of the foreachloop editor, make sure the 'fully qualified' radio button is selected and the "traverse subfolders" checkbox is checked.    On the [Variable Mappings] tab Map "User::MyFileValue" to Index 0.


     

    Thanks for the reply davhas35, I did map User variable Sourepath to the "direcotry" as per your suggestions. I also changed the "Name and extension only" to "fully qualified". On doing so I get the error "The given path's format is not supported"


    Changing "fully qualified" back to "name and extension" I get my original error, which is to say the variable "FullSourcePathFileName" could not find c:\temp\source\a.b.c.pdf (I know that file does not exist there but instead in c:\temp\source\folder 1\a.b.c.pdf)


    What I am having trouble understanding is why "traverse subfolders" checked one is not taking care of pointing to my subfolders?  Is it becuase my subfolder naming conventions are not standardized?  All my subfolders are of different names and lenght, but they pretty much all contain three files (*.pdf, *.doc and *.txt). My interest is to get all the PDF files.



    That will get you User:MyFileValue=(fully qualified path)\filename.fileExtension.  Then it is just a matter of using the variable in the system file task.



    After making the changes I don't get User:MyFileValue=(fully qualified path)\filename.fileExtension  instead I get User:MyFileValue=filename.fileExtensions (is it because I have hard coded the file name and I have no connection managers of any type)



    Thursday, July 31, 2008 1:50 PM
  • As stated, you will have to use the fully qualified path.  However, I think from your question you are going to need another variable expression that chops off the portion of the path that you don't need...

    For example, if you have the path \\myserver.mydomain.com\myfolder\mysubfolder\folder 1\a.b.c.pdf you would want to have an expression that checks on the length of the folder name (You should probably set this as a variable). 

    If you have the following variables
    Code Snippet

     myRootFolder => \\myserver.mydomain.com\myfolder\mysubfolder\ (since this is the folder you will be iterating over)  

     

    myFileName (From For Each Loop) => \\myserver.mydomain.com\myfolder\mysubfolder\Folder 1\a.b.c.pdf


    myDestinationFolder => \\myserver2.mydomain.com\myfolder\


    You could then set

    Code Snippet

    myPartialFileName (evaluate as expression = true), Expression: SUBSTRING( @[User::myFileName] , LEN( @[User::myRootFolder]) + 1, LEN( @[User::myFileName] ) - LEN( @[User::myRootFolder] ))
    And
    myDestinationFileName (evaluate as expression = true), Expression: @[User::myDestinationFolder] + @[User::myPartialFileName]



    Thursday, July 31, 2008 3:09 PM
  •  new2all wrote:
     davhas35 wrote:

     

     

     

    @[UserSurpriseurcePath] should be mapped to the expression for "Directory" on the [Collection] tab of the foreachloop editor, make sure the 'fully qualified' radio button is selected and the "traverse subfolders" checkbox is checked.    On the [Variable Mappings] tab Map "User::MyFileValue" to Index 0.


     

    Thanks for the reply davhas35, I did map User variable Sourepath to the "direcotry" as per your suggestions. I also changed the "Name and extension only" to "fully qualified". On doing so I get the error "The given path's format is not supported"


    Changing "fully qualified" back to "name and extension" I get my original error, which is to say the variable "FullSourcePathFileName" could not find c:\temp\source\a.b.c.pdf (I know that file does not exist there but instead in c:\temp\source\folder 1\a.b.c.pdf)

     

    with name and extension that is what I would expect.

     

    I see what the problem is, I told you the wrong thing.  The name of the destination variable in variable mappings cant be user:Tongue TiedourcePath.  Sorry about that.

     

    I just did a quick package with the following: 1 for each loop container, and 1 script task, and 2 project level variables (myTestDirectory, myTestFile).  I set myTestDirectory ="C:\Temp" and made sure there were subdirectories that contained files.  Then I set the expression for Directory to myTestDirectory, selected fully qualified and traverse subdirectories.  I then mapped myTestFile to index 0. 

     

    in the script task on the script tab I put myTestFile in the 'ReadOnlyVariables' grid line.  then opened the script and put the following line of code above the dts.success line:

     

      

    Dim i As Integer

    For i = 0 To Dts.Variables.Count - 1

    If Dts.Variables(i).Name = "myTestFile" Then MsgBox(Dts.Variables(0).Value)

    Next

     

     

     

    then saved and closed the script and ran it.  it traversed the subfolders giving me the fully qualified path to the file.

     

     

     


    What I am having trouble understanding is why "traverse subfolders" checked one is not taking care of pointing to my subfolders?  Is it becuase my subfolder naming conventions are not standardized?  All my subfolders are of different names and lenght, but they pretty much all contain three files (*.pdf, *.doc and *.txt). My interest is to get all the PDF files.



    That will get you User:MyFileValue=(fully qualified path)\filename.fileExtension.  Then it is just a matter of using the variable in the system file task.



    After making the changes I don't get User:MyFileValue=(fully qualified path)\filename.fileExtension  instead I get User:MyFileValue=filename.fileExtensions (is it because I have hard coded the file name and I have no connection managers of any type)

     

    That shouldnt matter, the traverse subfolders is simply telling the for each loop to look in subfolders and assign the file names to a variable.

     

     

    HTH,

    David

     

    Thursday, July 31, 2008 3:13 PM
  • thanks for the reply Eric


    For example, if you have the path \\myserver.mydomain.com\myfolder\mysubfolder\folder 1\a.b.c.pdf you would want to have an expression that checks on the length of the folder name (You should probably set this as a variable). 


    If I understand you correctly, do you mean it will be essential for the package to get the names of the subfolders into a variable via an expression? If you meant that, would it be because of the sub folder names and lengths being different.?

    In that case would the logic be such, The foreachloop would not only need to get the loop over the next filename but also the corresponding sub folder name?

    I did make a new user variable named "mySubFolder" to just get the sub folder names

    SUBSTRING( @[User::myRootFolder], 16, 90)

    As per your code snippets and instructions I did the required, It works perfect, but only for the sub folder whose values are mentioned (e.g. for myRootFolder variable I have a value C:\Test\Source\SAS Feb 90\) as you mentioned
    \\myserver.mydomain.com\myfolder\mysubfolder\

    Seeing that it works for only one folder I went ahead and changed the "Foreachloop" --> "Collection" --> "Expression" from "myRootFolder" to "mySubFolder" but no luck, only the "Foreachloop Container" turns green without affecting the "FileSystem Task"

    Would you suggest any other logic? Or correct me where I am going wrong?


     
    Thursday, July 31, 2008 7:18 PM
  • thanks david

    I will try this out and let you know 
    Thursday, July 31, 2008 7:19 PM
  •  new2all wrote:


    If I understand you correctly, do you mean it will be essential for the package to get the names of the subfolders into a variable via an expression? If you meant that, would it be because of the sub folder names and lengths being different.?


    To an extent.  This is actually more to make sure that you are getting the same file structure.  For example if you have a temp folder and it does not have the requisite folders underneath to copy your files into, how will it be forced to do so?  You will essentially have to take this information and use it to create the corresponding folders to get the file you want to copy to the correct location.


     new2all wrote:


    In that case would the logic be such, The foreachloop would not only need to get the loop over the next filename but also the corresponding sub folder name?



    Yes, but if you are using fully qualified names you will get this inf
    ormation.


     new2all wrote:


    I did make a new user variable named "mySubFolder" to just get the sub folder names

    SUBSTRING( @[User::myRootFolder], 16, 90)

    As per your code snippets and instructions I did the required, It works perfect, but only for the sub folder whose values are mentioned (e.g. for myRootFolder variable I have a value C:\Test\Source\SAS Feb 90\) as you mentioned
    \\myserver.mydomain.com\myfolder\mysubfolder\

    Seeing that it works for only one folder I went ahead and changed the "Foreachloop" --> "Collection" --> "Expression" from "myRootFolder" to "mySubFolder" but no luck, only the "Foreachloop Container" turns green without affecting the "FileSystem Task"

    Would you suggest any other logic? Or correct me where I am going wrong?
     


    First off, you are going to want to take out the hard coded 16 and 90 you have on the substring.  But yes, you have run into the brick wall here.  I think what will need to happen will be that you will want to split your "subfolder" string by delimiter (i.e. "\") so that you can find each level of folder between the root and the file location.  After this you will need to create the corresponding file structure so that when you go to copy the files it will have a place to land.  This may require another for each loop inside of the for each loop... Or you might just do it via script task... not sure which would be easiest....

    I'm also not 100% positive that this is necessary.  Anyone else have any thoughts?
    Thursday, July 31, 2008 8:40 PM
  • http://www.calsql.com/2009/10/ssis-archive-and-rename-all-files-in.html


    download the package template from the above URL contains all the steps as below


    If one of the task in your SSIS package is to move the source text files to an archive folder after its processed then this can be done through the script task in SSIS.

    Steps involved :
    1. create a variable in SSIS package : right click anywhere in your package and click on variables --> create a variable  as "vFileName" with string as datatype
    2. Create a text file connection with the path to connect the source text file --> define the columns ( the connection path will disappear once its executed as it will use the expressions for connection string in next step)
      • ssis create expressions for connection in ssis :  right click on text file in the connections area --> click on properties  --> click on expressions --> select the user::variable in the left pane --> select "connection string" at the right pane --> click "Ok"
    3. Drag the "foreach loop container" --> right click -->; click on properties --> click on variable --> select "vFileName"
    4. Drag a dataflow or any other transformations to process the text file ( here this package is using a dummy dataflow)
    5. Last task is to archive the source file using a "script task" to a different folder called ARCHIVE and rename the filename with date.
    Tuesday, October 27, 2009 11:04 PM
  • Here is another good article about File Looping and Archive

    FAQ - How to loop through files in a specified folder, load one by one and move to archive folder using SSIS

    http://www.pragmaticworks.com/help/dtsxchange/index.htm#page=FAQ%20-%20How%20to%20loop%20through%20files%20in%20a%20specified%20folder,%20load%20one%20by%20one%20and%20move%20to%20archive%20folder%20using%20SSIS.htm

    Wednesday, November 18, 2009 7:22 AM
  • I have the same requirement and I am able to traverse through the source folder but I 

    want

    C:\Source Dir\Folder 1\subfolder\a.bat

    goes to

    C:\Dest Dir\Folder 1\subfolder\a.bat

    and 

    C:\Source Dir\Folder  2\subfolder\a.bat

    goes to 

    C:\Dest Dir\Folder 2\subfolder\a.bat


    C:\Source Dir\Folder  2\subfolder\c.bat

    and many more . What should be the value of the target variable be and what configuration I have to do for the target variable?

    Thursday, May 16, 2013 10:53 PM