Selecting and concatenating data from several XLS Workbooks RRS feed

  • Question

  • Hi all.  I need some good pointers and apologize first that I will need some spoonfeeding -- it's been 12 years since I programmed Perl or VBScript and never had to actually use VBA beyond the simplest global validation macro!  But my career has perversely turned full circle and I suddenly need to program a data filter!

    I must produce a detailed asset register.  A required Excel template was given to our asseet surveyors but in their wisdom they produced one spreadsheet for each building instead of a single list, and then changed the data layout as they thought appropriate for each building.  I've been given 800+ workbooks that need to be in one worksheet.  Some columns they left empty (which is fine, copying null data is ok here), but sometimes they deleted the unused column.  Some shreadsheets then have new columns added that are not required.

    So, in the source date, not all required columns necessarily exist and must therefore be replaced by null data to maintain structure, and some additional unnecessary columns exist that need to be ignored, the result being that columns are not necessarily in the correct order/location.

    How do I select certain columns from selection of XLS files and import the data into a new single worksheet?  Is the right way about this to use VBA in a new Target worksheet to suck in data, or is it possible/easier to use some other script like SQL to Select and Copy data?

    The algorithm is simple enough in my mind:
    Open next available file in set
    Identify last populated Row in that file
    Identify last populated Row in this worksheet
    -  Select Column by name (Row 1 title) to populated number of Rows
    -  Inject data into current worksheet at end of current dataset
    -  Repeat for other required Columns
    Repeat for other files

    Identifying last populated Row, rather than pasting data to the end of each relevant Column, is necesary to prevent data becoming out of sequence when unused columns are encountered.  I am fully aware that each step that sounds simple is going to be very involved when rendered into VB!

    I found this useful thread* about importing data from external files, which I shall work on to see if I can amend it successfully, but I have no clue about the DOM or Object properties of an Excel Workbook to set about coding from scratch.  I used to be a script-monkey/traditional hacker in a previous career so I'm capable of modifying other peoples code if I have a syntax handbook available, but starting from scratch is a completely different matter after this number of years...

    Anyone have some procedures for these steps in a code library somewhere?  Thank you all for any help you can provide...


    *Of course, I had to recreate my accont after too long dormant, so I can't yet post a link.  This is the address I tried to href:

    • Edited by Dranok Thursday, October 9, 2014 10:03 AM
    Thursday, October 9, 2014 9:59 AM

All replies

  • Re:  Lots of files, open each, extract data, close file

    Your task has been done before, but it is complicated by the possible unique structure of each workbook.
    It will be easier/faster if all of the files (workbooks) are in the same folder on your machine
    with no other files in that folder,
    and with all files having been created in the same general Excel version...  .xls or .xlsx or .xlsm
    Note:  save the original workbooks in a separate folder and do your work on a copy of the folder.

    Also, each file (workbook) can contain multiple worksheets,
    and each worksheet can be in a different position in the workbook,
    and each worksheet can have a custom name on the worksheet tab.
    A particular worksheet is identified by its position... Worksheets(2) for instance or its name...Worksheets("Sludge")

    As for the data on the worksheet...
    Each data column can be found by finding a column heading or using the column position in the worksheet.
    It will greatly simplify your task:
    if the columns containing the data you need have the same column headings (name)
    and that name is always in the same row on the worksheet.
    Also, you need to determine  if the columns headings (names) are identical between workbooks
    or have abbreviations been used in some cases (or even different names: Building Name vs. Location).
    Determining the column(s) by the data they contain could be troublesome.

    I guess what I am saying above is find out what you have before trying to change it.
    You may even want to consider hiring some temporary labor to do the job by cutting and pasting.

    Shown below is some code of mine that might be used as a pattern.
    Jim Cone
    Portland, Oregon USA
    free & commercial excel programs (n/a xl2013)
    'Opens each .xls file in the specified folder and writes the file
    'name to the active sheet if specified data is found in a worksheet.
    Sub FilesToWorksheets_R4()
    'Jim Cone - Portland Oregon - May 2008.
    On Error GoTo ThatHurt
    Dim objFSO    As Object
    Dim objFolder As Object
    Dim objFile   As Object
    Dim strPath   As String
    Dim strName   As String
    Dim N         As Long
    Dim blnTask   As Boolean

    If Val(Application.Version) >= 10 Then
    blnTask = Application.ShowWindowsInTaskbar
    Application.ShowWindowsInTaskbar = False
    End If
    Application.ScreenUpdating = False
    'Specify the folder...
    strPath = "C:\Program Files\Army\Provisions"    '<<<MODIFY

    'Use Microsoft Scripting runtime.
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    If objFSO.FolderExists(strPath) Then
    Set objFolder = objFSO.GetFolder(strPath)
    N = 2
    'Check type of file in the folder and open file.
    For Each objFile In objFolder.Files
    If objFile.Name Like "*.xls" Then
    strName = objFile.Name
    Application.StatusBar = strName
    Workbooks.Open objFile
    With Workbooks(strName).Worksheets(1)
      If .Range("B1") = "Rifles" Or .Range("C1") = "Ammo" Then ' <<<MODIFY
        ThisWorkbook.Worksheets(1).Cells(N, 2) = strName
        N = N + 1
      End If
    End With
    Workbooks(strName).Close savechanges:=False
    End If
    Next 'objFile
    MsgBox "Cannot find folder.  "
    End If

    On Error Resume Next
    Application.ShowWindowsInTaskbar = blnTask
    Application.StatusBar = False
    Application.ScreenUpdating = True
    Set objFSO = Nothing
    Set objFolder = Nothing
    Set objFile = Nothing
    Exit Sub

    MsgBox "Error " & Err.Number & "  " & Err.Description, , "Files To Worksheet"
    Resume CloseOut
    End Sub

    Jim Cone
    Portland, Oregon USA (Dropbox)
    (free & commercial excel add-ins & workbooks)

    • Edited by James Cone Tuesday, November 1, 2016 3:13 AM
    Thursday, October 9, 2014 1:02 PM
  • The code that James posted looks good, although I didn't try it.  Please try his suggestion, or try this.

    This is great too.

    Knowledge is the only thing that I can give you, and still retain, and we are both better off for it.

    Thursday, October 9, 2014 1:21 PM
  • Thank you very much for you reply, which I will now examine in detail (having finally returned to the office!).

    Yes, all files are of the same program version and in one directory.  Column headings are the first row.  Column numbers are not consistent, but I have assumed for the timebeing that all relevant titles are identical.  (Just in case, though, it would be very useful to be able to match column titles by, for instance, Pattern matching, á al Perl etc. scripting, rather than by simple "string equality" match...)  I haven't considered multiple sheets in one workbook as I believe this will be irrelevant in the dataset provided.

    Friday, October 17, 2014 11:35 AM