none
Joining and updating tables in excel RRS feed

  • Question

  • Hi

    My problem is as follows

    I need to combine two large excel files either via either a vba macro or by  ms-query

    File a

    Employee number, name, type of file

    Sample data is as follows

    Employee number name   type of file

    123                          a jones               F1

    456                          b smith              F2

    789                         c evans               F3

    File b

    Employee number, name, F1,F2,F3 etc…

    Sample data is as follows

    Employee number name     F1   F2   F3

    234                           a brown   1     0     1

     

    Objective is to effectively update columns f1,f2,f3 in file b using the values in file a, so the logic in t-sql would be

    Update table b

    Set f1 = 1 from table a join table b

    On table a. employee number = table b .employee number

    Where a.[type of file] =’F1’

    There is a macro currently running however this is taking over 4 hours to complete as its clearly not an optimum method to do this.

    Could you please advise as to the most effective way to do this ?

    Thanks

    Scott   

    Sunday, February 18, 2018 4:37 PM

All replies

  • This is the type of thing databases were designed to do. Investing in use of Excel just makes the inevitable more work. The more you use Excel for the data the less likely you are to convert to a database solution until the problem becomes unmanageable and you are forced to convert. You might have gotten to that point.


    Sam Hobbs
    SimpleSamples.Info

    Sunday, February 18, 2018 9:02 PM
  • Hi Sam,

    I agree entirely. However at the moment, a database solution is not possible and i have to work with what i have available to me.

    Scott

    Sunday, February 18, 2018 10:17 PM
  • Hi scott murdoch12,

    Do you mean such a solution is OK as using two Excel files, instead of using a database? 

    Regards,

    Ashidacchi

    Monday, February 19, 2018 12:16 AM
  • Another possibility, if you can use C# or VB.Net, is to read both of the spreadsheets into DataTables or a collection and then process. If you truly must use VBA then VBA has the Dictionary object that can support fast lookup. Even if you don't load all the data for a spreadsheet I assume that the Dictionary object would be more efficient than your current macro for doing the equivalent of a SQL join. At a minimum you could have the employee number as key with the value being the data for each sheet or at least a row number.


    Sam Hobbs
    SimpleSamples.Info

    Monday, February 19, 2018 12:45 AM
  • You don't have to convert the spreadsheet into a database permanently.

    You might not know that SQLite it is now provided with the current version of Windows 10. You could create a temporary database with the tables, with emphasis on temporary. The database could be deleted when done. The spreadsheet could be loaded into a temporary table using VB.Net or whatever then processed there. You could tell management that the processing is faster using the database and that could help move the application in a better direction. That would be a good use of SQLite. The Entity Framework supports SQLite.



    Sam Hobbs
    SimpleSamples.Info

    Monday, February 19, 2018 12:58 AM
  • Excel 2016 Power Query.
    Fast with query folding.
    http://www.mediafire.com/file/xa56atp32b9gbsz/02_18_18b.xlsx

    Monday, February 19, 2018 12:59 AM
  • Hi Ashidacchi,

    I would much prefer to use a database solution however subesequent suggestions about power query and the dictionary option in VBA are worth exploring for me.

    Scott

    Monday, February 19, 2018 8:39 PM
  • Hi scott,

    Based on your description, it seems you want to retrieve file type based on Employee number, I suggest you try Excel VLoopUp function.

    =IFERROR(IF(VLOOKUP($A2,Sheet2!$A$1:$C$5,3,FALSE)=C$1,1,0),0)

    You could check the attachment for a demo.

    https://1drv.ms/x/s!AsyeWQ2z51P3hGvYbvHePB5VqEhg

    Best Regards,

    Tao Zhou


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, February 22, 2018 7:46 AM
  • Hi Scott,

    Sorry for my late reply. I've just begun to think about your question (with Excel VBA).
    I would like to confirm:
    (1) What is a trigger to update File-B?
       a: when File-A is changed?
       b: when a user does some action in File-B (e.g. clicking a button in File-B)?
    (2) What should I do when Where a.[type of file] =’F1’?

    Regards,

    Ashidacchi

    • Edited by Ashidacchi Thursday, February 22, 2018 8:26 AM
    Thursday, February 22, 2018 8:13 AM
  • Hi Asidacchi,

    1. The trigger is simply when file a is updated weekly.

    2. Place a 1 in the F1 column where the employee numbers from both tables match

    thanks

    Scott

    Saturday, February 24, 2018 10:29 PM
  • Hi Scott,

    Thank you for information.
    I have just restarted thinking about your question.
    Please let me confirm more. It's about volume of File-A and File-B.
    (1) How many records (rows) are there?
    (2) What is file size? 
    both in File-A and File-B (if they are Excel files)?

    Regards,

    Ashidacchi

    Sunday, February 25, 2018 3:50 AM
  • Hi Scott,

    I've just made a sample and shared it via OneDrive.
    Please download "ForScottMurdoch12.zip" and try it.

    # In this zip file, there two files. One is "File_A.xmsm", the other is "File_B.xmsm". 
        

        

    Regards,

    Ashidacchi

    Sunday, February 25, 2018 6:41 AM
  • Hi Ashidacchi,

    The files are quite big.

    File b has around 400,000  rows and there are actually columns F1 to F27 which may require updating.

    File a has around 50,000 rows.

    As far as i understand so far, a scripting dictionary in VBA may be the best solution ?

    thanks

    Scott

    Sunday, February 25, 2018 8:03 PM
  • Hi Scott,

    The following is based on my old knowledge about Relational DB (about 20 years ago, but it would be valid today).

    If DBMS (SQL server?) and Excel is in the same computer (i.e. the same spec), Excel would be faster than SQL db, because Excel loads whole data in memory (SQL does not load whole data of a table in memory).

    In your first post, you write:
    "There is a macro currently running however this is taking over 4 hours to complete as its clearly not an optimum method to do this."
    Which are you using, Excel or SQL?

    If it is Excel macro (VBA), please refer to my sample.
    I would like to check file-A (with macro) and file-B. If you can provide your email address, I will reply it, and you can send them as attached files.
    (Please remember to modify/delete vital/corporate/important data, before sharing)

    If it is SQL procedure, you need to re-design your tables (both A and B, but especially B). It takes much more time to update data (if it is indexed, more time). Of course, it is required to modify procedure. 

    If I change SQL table design....
    [A]
      (1) add a field "last update date" (indexed)
      -- for checking if A is updated in this week. If updated, only update records is used for updating [B]. 
    [B]
      (1) unify F1-F27 into one field (indexed), if each of F1-F27 has the same length of value. -- if this is done, time for updating will be decreased drastically.

    Regards,

    Ashidacchi

    Sunday, February 25, 2018 11:30 PM
  • If DBMS (SQL server?) and Excel is in the same computer (i.e. the same spec), Excel would be faster than SQL db, because Excel loads whole data in memory (SQL does not load whole data of a table in memory).

    Not necessarily. We don't know what optimization is done. I do agree that the entire worksheet must be read at once but it is not guaranteed that all worksheets would be. Also we don't know what optimizations a database might do.

    We do know that a spreadsheet is designed for use by people and a databases is designed for use by software. When we use SQL then the database can do all the processing it needs to do to get all the relevant data and it can optimize many things. If the spreadsheet needs to be processed in the manner it is here then there is much interaction between the macro and Excel.



    Sam Hobbs
    SimpleSamples.Info

    Monday, February 26, 2018 12:54 AM
  • @Simple Sample,
       I am talking with Scott.
       I am wondering if you belong to the same company as Scott. Who are you talking to?
    Regards,

    Ashidacchi

    Monday, February 26, 2018 1:20 AM
  • I was replying to you and we are both trying to help Scott. If you think anything I said is wrong then please say so.


    Sam Hobbs
    SimpleSamples.Info


    Monday, February 26, 2018 6:03 AM
  • @Sam Hobbs,
     
       Thank you for reply.
       I can know "we"(you mentioned) means you and I. 

       Please explain the meaning of "If the spreadsheet needs to be processed in the manner it is here then there is much interaction between the macro and Excel."
       Especially "much interaction between the macro and Excel"
       Do you mean "macro" is not executable module, but is executed step by step by interpreter?

    Regards,

    Ashidacchi

    • Edited by Ashidacchi Monday, February 26, 2018 6:25 AM
    Monday, February 26, 2018 6:25 AM
  • I mean that for processing such as:

    Dim myRow As Integer
    For myRow = 2 To 101
    	Cells(myRow, 1).Value = fnc_RandomEmpNo()
    Next
    	' ---
    For myRow = 101 To 2 Step -1
    	If (Cells(myRow, 1).Value = 0) Then
    		Rows(myRow).Delete
    	End If
    Next
    

    We don't know if that is more efficient or less efficient than doing the equivalent using a database. With a database we typically execute SQL so all the relevant data is prepared prior to subsequent processing. Calling Excel for each cell's contents is likely inefficient.

    If we have something authoritive (I am spelling that word my way) saying what is efficient then we don't need to speculate but otherwise if we try to speculate on the efficiency of something then it is important to consider as much relevant data as possible. 



    Sam Hobbs
    SimpleSamples.Info

    Monday, February 26, 2018 9:49 PM