none
How to take the average of all columns in Excel that contain a certain header string? RRS feed

  • Question

  • Hi all.  New VB programmer here.

    I have written a program that records times, in seconds, into an excel spreadsheet.  At the top of each column of data is a header, that contains the build and run number of the software being tested.  For example, a column of data might be:

    219.00.00.091 x64 Run 1

    8.369333333
    0.285
    1.272666667
    0.971666667
    1.341666667
    1.000666667
    0.827333333
    2.463

    With the top cell being a header.

    What I want to do is take the average of all the runs and insert a new row into the front of the data and delete the cells with the run data.

    While I typically run the test 10 times, so there would typically be 10 columns of data, I would like to write it in a generic manner so that it simply finds all columns that contain the string "(build number) Run" and average their rows.

    I believe I will be using something like this:

    Dim GCell As RangeDim findheader as string = "219.00.00.091 x64 Run"

    Set GCell = ActiveSheet.Cells.Find(findheader)

    But I think this only gives the value of the first cell found?

    I read about find here:

    https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel

    Once I get my range of columns, then I can probably figure out how to take the average from there.

    Thanks,



    Friday, July 13, 2018 7:18 PM

All replies

  • OK, some progress.

    I have discovered that Gcell, being type Range, is not a value itself. 

    You have to use GCell.address to get the address.

    So I have this:

    strHeaderText = "219.00.00.091 x64 Run"
    Dim GCell As Range GCell = objSheet.Cells.Find(strHeaderText) Console.WriteLine(GCell.Address)

    This returns the first cell where this string exists:$B$1

    However, it also exists in C1 and D1.  So I need to figure out how to get all columns where this header string resides.

    The quest continues...


    Friday, July 13, 2018 8:43 PM
  • OK, I think I'm getting closer, but this generates an error:

    Dim FindFirst As String = Nothing
    Dim GCell As Range
    GCell = objSheet.Cells.Find(strHeaderText)
    
    Console.WriteLine(GCell.Address)
    FindFirst = GCell.Address
    
    GCell = objSheet.Cells.FindNext(strHeaderText)
    
    While GCell.Address <> FindFirst
        Console.WriteLine(GCell.Address)
        GCell = objSheet.Cells.FindNext(strHeaderText)
    End While

    Friday, July 13, 2018 8:53 PM
  • First off I don't have time to write a code sample.

    I would suggest if working with .xlsx to use SpreadSheetLight library. Excel doesn't need to be installed.

    Open an Excel file, get the last used row for a column via a method in the class, start reading by skipping the first row. Iterate each row, use GetCellValueAsDouble or GetCellValueAsDecimale. Add each value to a List(Of Double) or List(Of Decimal) then call .Average on the List to get the average value.

    To get started see my code sample on MSDN.

    Edit: Here is an example

    Public Sub JulyQuestion()
        Dim valueList As New List(Of Decimal)
        Using doc As New SLDocument(Path.Combine(AppDomain.
                CurrentDomain.BaseDirectory, "July2018.xlsx"), "Data1")
    
            Dim stats As SLWorksheetStatistics = doc.GetWorksheetStatistics()
    
            For index = 2 To stats.EndRowIndex - 1
                valueList.Add(doc.GetCellValueAsDecimal(index, 1))
            Next
    
        End Using
    
        Console.WriteLine(Math.Round(valueList.Average(), 2, MidpointRounding.AwayFromZero))
    
    End Sub


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites


    Friday, July 13, 2018 9:27 PM
    Moderator
  • Hi Karen,

    I guess I should have started off with a simpler version of my question.  The first thing I think I need to do is identify the headers that contain the columns that I want to average.  To do this, I want to search for all cells that contain a certain string.  Due to the nature of the string, I know that if found, these will always only be header cells.


    Here is what I tried:

    'First, identify all columns that contain a header with the current SE Version number in them.
       Dim FoundFirst As String = Nothing
       Dim FoundNext As String = Nothing
       Dim GCell As Range
       GCell = objSheet.Cells.Find(strHeaderText)
       FoundFirst = GCell.Address
       Console.WriteLine(FoundFirst)
    
    While FoundNext <> FoundFirst
       GCell = objSheet.Cells.FindNext(strHeaderText)
       FoundNext = GCell.Address
       Console.WriteLine(FoundNext)
    End While
    
    'Display the first and last header cell locations.
    Console.WriteLine("The first header cell is: " + FoundFirst)
    Console.WriteLine("The last header cell is: " + FoundNext)

    However, this code fails at the line:

    GCell = objSheet.Cells.FindNext(strHeaderText)

    The first Find succeeds and prints out the expected cell value.  The FindNext fails.

    Here is an example of data I am trying to read:

    Monday, July 16, 2018 1:20 PM
  • OK, I have discovered my error.

    Evidently, when you call FindNext, you are not providing the string again, but you reference the previous range.  So the correct command is:

    GCell = objSheet.Cells.FindNext(GCell)

    Now, I have a logic error in the above code in that I perform the findnext one too many times so the first found cell is also the last-found cell (I have wrapped around in my finds), but I will address that here shortly.

    Monday, July 16, 2018 1:30 PM
  • OK, working code now is:

    Dim FoundFirst As String = Nothing
    Dim FoundNext As String = Nothing
    Dim GCell As Range
    GCell = objSheet.Cells.Find(strHeaderText)
    FoundFirst = GCell.Address
    Console.WriteLine(FoundFirst)
    
    While FoundNext <> FoundFirst
       GCell = objSheet.Cells.FindNext(GCell)
       FoundNext = GCell.Address
       Console.WriteLine(FoundNext)
    End While
    
    GCell = objSheet.Cells.FindPrevious(GCell)
    FoundNext = GCell.Address
    
    'Display the first and last header cell locations.
    Console.WriteLine("The first header cell is: " + FoundFirst)
    Console.WriteLine("The last header cell is: " + FoundNext)

    The next step will be to parse out the columns from the cell values.  Then, since the data is in known rows, I can easily insert a column and average across the known rows and the found header columns.  More to come...


    Monday, July 16, 2018 1:36 PM
  • Hi Karen,

    I guess I should have started off with a simpler version of my question.  The first thing I think I need to do is identify the headers that contain the columns that I want to average.  To do this, I want to search for all cells that contain a certain string.  Due to the nature of the string, I know that if found, these will always only be header cells.


    Here is what I tried:

    'First, identify all columns that contain a header with the current SE Version number in them.
       Dim FoundFirst As String = Nothing
       Dim FoundNext As String = Nothing
       Dim GCell As Range
       GCell = objSheet.Cells.Find(strHeaderText)
       FoundFirst = GCell.Address
       Console.WriteLine(FoundFirst)
    
    While FoundNext <> FoundFirst
       GCell = objSheet.Cells.FindNext(strHeaderText)
       FoundNext = GCell.Address
       Console.WriteLine(FoundNext)
    End While
    
    'Display the first and last header cell locations.
    Console.WriteLine("The first header cell is: " + FoundFirst)
    Console.WriteLine("The last header cell is: " + FoundNext)

    However, this code fails at the line:

    GCell = objSheet.Cells.FindNext(strHeaderText)

    The first Find succeeds and prints out the expected cell value.  The FindNext fails.

    Here is an example of data I am trying to read:

    Using SpreadSheetLight, get column names

    ''' <summary> 
    ''' Get column headers for a worksheet 
    ''' </summary> 
    ''' <param name="FileName">Valid Excel 2007+ file</param> 
    ''' <param name="SheetName">Existing Worksheet in FileName</param> 
    ''' <returns></returns> 
    Public Function ColumnHeaders(ByVal FileName As String, ByVal SheetName As String) As List(Of String) 
        Using sl As New SLDocument(FileName, SheetName) 
            Dim stats As SLWorksheetStatistics = sl.GetWorksheetStatistics 
            Return Enumerable.Range(1, stats.EndColumnIndex).Select(Function(cellIndex) sl.GetCellValueAsString($"{SLConvert.ToColumnName(cellIndex)}1")).ToList 
        End Using 
    End Function 


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Monday, July 16, 2018 2:04 PM
    Moderator
  • I'm afraid I'm even less familiar with "SpreadSheetLight" than I am with VB, so I'd like to steer clear of it for now.

    I can now identify, with numeric identifiers for row and colum, all of the rows and columns I need to take the average of.

    To start with, I tried to take the average using hard-coded cell values:

    Dim StartupAverage As Double = Application.WorksheetFunction.Average("B5:E5")
    Dim ExitAverage As Double = Application.WorksheetFunction.Average("B6:E6")
    Dim PartOrderedAverage As Double = Application.WorksheetFunction.Average("B8:E8")
    Dim PartSyncAverage As Double = Application.WorksheetFunction.Average("B9:E9")
    Dim SMOrderedAverage As Double = Application.WorksheetFunction.Average("B10:E10")
    Dim SMSyncAverage As Double = Application.WorksheetFunction.Average("B11:E11")
    Dim DraftAverage As Double = Application.WorksheetFunction.Average("B12:E12")
    Dim AsmAverage As Double = Application.WorksheetFunction.Average("B13:E13")
    
    
    

    But this must not be the right way to call the Average command as I am getting this error:

    Error    BC30469    Reference to a non-shared member requires an object reference.

    What is the correct syntax to take the average of a range of cells?

    Steve

    Monday, July 16, 2018 3:50 PM
  • I'm afraid I'm even less familiar with "SpreadSheetLight" than I am with VB, so I'd like to steer clear of it for now.

    I can now identify, with numeric identifiers for row and colum, all of the rows and columns I need to take the average of.

    To start with, I tried to take the average using hard-coded cell values:

    Dim StartupAverage As Double = Application.WorksheetFunction.Average("B5:E5")
    Dim ExitAverage As Double = Application.WorksheetFunction.Average("B6:E6")
    Dim PartOrderedAverage As Double = Application.WorksheetFunction.Average("B8:E8")
    Dim PartSyncAverage As Double = Application.WorksheetFunction.Average("B9:E9")
    Dim SMOrderedAverage As Double = Application.WorksheetFunction.Average("B10:E10")
    Dim SMSyncAverage As Double = Application.WorksheetFunction.Average("B11:E11")
    Dim DraftAverage As Double = Application.WorksheetFunction.Average("B12:E12")
    Dim AsmAverage As Double = Application.WorksheetFunction.Average("B13:E13")
    
    

    But this must not be the right way to call the Average command as I am getting this error:

    Error    BC30469    Reference to a non-shared member requires an object reference.

    What is the correct syntax to take the average of a range of cells?

    Steve

    Steve, over the years I've moved away from Excel automation, way too many problems. What I can give you are the proper techniques to work with Excel automation in the following MSDN code sample but does not directly address the issue yet will provide a solid pattern to get there.

    In regards to SpreadSheetLight, I would figure you are not familiar with it yet that is not a reason to not look at it as a) does not require Excel installed, b) has no issues with memory leakage as with automation c) requires less lines of code than excel automation.


    Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.
    VB Forums - moderator
    profile for Karen Payne on Stack Exchange, a network of free, community-driven Q&A sites

    Monday, July 16, 2018 4:14 PM
    Moderator
  • The following should work, just note the assumptions made in the comments and adjust values if necessary.

    First, a class to hold the results:

    Public Class RunAverage
        Public Property Name As String
        Public Property Average As Double
    End Class

    Then a method to calculate the averages:

    Private Function GetAverages(workbookPath As String, workSheetName As String) As List(Of RunAverage)
        Dim result As New List(Of RunAverage)
        Dim excel As New Microsoft.Office.Interop.Excel.Application
        Dim workbook = excel.Workbooks.Open(workbookPath)
        Dim worksheet As Microsoft.Office.Interop.Excel.Worksheet = workbook.Worksheets.Item(workSheetName)
    
        Try
            'specify a regex pattern to match the header value
            Dim headerPattern As String = "\d\d\d[.]\d\d[.]\d\d[.]\d\d\d[ ].*[ ]Run[ ]\d"
            'create list to store columns found
            Dim targetColumns As New List(Of Microsoft.Office.Interop.Excel.Range)
            'get used range of worksheet
            Dim usedRange = worksheet.UsedRange
            'assumes header row is first row of used range, if not, adjust "1" in Rows.Item() below
            For Each cell As Microsoft.Office.Interop.Excel.Range In CType(usedRange.Rows.Item(1), Microsoft.Office.Interop.Excel.Range).Cells
                'get column if header cell matches pattern
                If System.Text.RegularExpressions.Regex.IsMatch(cell.Value2.ToString, headerPattern) Then
                    targetColumns.Add(cell.EntireColumn)
                End If
            Next
            'get used row count
            Dim rowCount = usedRange.Rows.Count
            'loop found columns
            For Each column In targetColumns
                Dim total As Double = 0
                'loop data value rows in column (skip header)
                For i = 2 To rowCount
                    'sum values
                    total += CDbl(CType(column.Cells.Item(i), Microsoft.Office.Interop.Excel.Range).Value2)
                Next
                'get average
                Dim avg As Double = total / (rowCount - 1)
                'add result to list
                result.Add(New RunAverage With {.Name = CType(column.Cells.Item(column.Row), Microsoft.Office.Interop.Excel.Range).Value2.ToString, .Average = avg})
            Next
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        Finally
            workbook.Close()
            excel.Quit()
            Runtime.InteropServices.Marshal.ReleaseComObject(excel)
        End Try
        Return result
    End Function
    


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Monday, July 16, 2018 7:59 PM
    Moderator
  • So, as a final update, I have finished this program.  This was my first attempt at writing an automated test program.  Basically, it launches our software product, and records how long it takes for the product to launch, how long it takes it to create different kinds of new documents, and how long it takes to exit.  We monitor these performance metrics over time to make sure the product is not creeping in time for these tasks.

    One version of the test just repeats the test N times, recording the results, averaging them, and writing them to an Excel spreadsheet (which contains some graphs and other data-monitoring functions).  That was pretty easy.

    The other version of the test needed to run each test after rebooting the machine, so that the program dlls would not be cached in system memory.  To do this I have a batch file that injects itself into the users startup directory, and then I have another "listener" batch file run on a remote system that listens for the system to reboot, then, after it detects the RDP port 3389 back up, it initiates a RDP call to the system, which triggers the startup batch file and the test battery proceeds. 

    Each battery writes the results to another similar Excel spreadsheet, however many N times the user told the test to run.  After all tests are run, a final program (the one I asked about in this thread) looks in the spreadsheet for matching column(s) of data, and then averages them, writes out the data, and deletes all the individual run columns of data.

    So far, my programming is pretty crude.  I don't do much "try, catch" stuff yet (I just use If, then, else), and my programming is pretty linear without many calls to functions or subroutines.  I did make a function for "Press Any Key" though.  :)

    Still, I'm happy that I have finished my first real program in many years!

    Tuesday, July 17, 2018 7:08 PM