none
Having Problem Reading Formulas From Excel RRS feed

  • Question

  • My code is only reading formulas that were handwritten. It seems as if it can't read formulas that were created by drag and drop with a mouse.

    Please advise.

    Imports System.IO Imports System.Data Imports System.Collections.Generic Imports DocumentFormat.OpenXml.Packaging Imports DocumentFormat.OpenXml.Spreadsheet Partial Class VB

    '....deleted code for clarity Private Function GetCellValue(doc As SpreadsheetDocument, cell As Cell) As String Dim value As String = String.Empty Try If cell.CellValue IsNot Nothing Then value = cell.CellValue.InnerText End If If cell.DataType IsNot Nothing AndAlso cell.DataType.Value = CellValues.SharedString Then Return doc.WorkbookPart.SharedStringTablePart.SharedStringTable.ChildElements.GetItem(Integer.Parse(value)).InnerText End If Catch ex As Exception tdErrorMessage.InnerText = ex.Message End Try Return value End Function Private Function GetFormula(doc As SpreadsheetDocument, cell As Cell) As String Dim value As String = String.Empty If cell.CellFormula IsNot Nothing Then value = cell.CellFormula.Text End If Return value End Function End Class



    Steve Greenbaum



    • Edited by steve48 Thursday, September 8, 2016 3:02 PM Clarify-from Excel
    Thursday, September 8, 2016 2:28 PM

Answers

  • I suggest you use Open XML SDK Tool to check its xml nodes.

    You could download it from https://www.microsoft.com/en-us/download/details.aspx?id=30425

    For the cellformula automatically generated, its FormulaType is "shared". And only the source of the CellFormula.Text would show actual vaules. Other generated cellformula, its CellFormula.Text is empty.

    E.g.

    The first cell formula is

       <x:c r="C1">

            <x:f t="shared" ref="C1:C6" si="0">A1+B1</x:f>

            <x:v>3</x:v>

          </x:c>

    The other auto-filled cell's formulas are

          <x:c r="C2">

            <x:f t="shared" si="0" />

            <x:v>5</x:v>

          </x:c>

     

          <x:c r="C3">

            <x:f t="shared" si="0" />

            <x:v>7</x:v>

          </x:c>

     

    Excel reads the shared cellformula based on the ref (Range of Cells) and si (Shared Group Index)

    In the formula, excel changes the cellreference according to:

    If the column number is the same, in formular: row number +1

    If the row number is the same, in formular: column number +1

    E.g.

    If D1 CellFormula.Text is B1+C1, CellFormula.SharedIndex is 0, CellFormula.Reference is D1:E6

    D2 would be B2+C2

    E1 would be C1+D1

     

    If there are different formulas, there would be appended si (Shared Group Index) attribute.

    For more information about attributes, please visit CellFormula class 


    Friday, September 9, 2016 5:55 PM
    Moderator

All replies

  • I suggest you use Open XML SDK Tool to check its xml nodes.

    You could download it from https://www.microsoft.com/en-us/download/details.aspx?id=30425

    For the cellformula automatically generated, its FormulaType is "shared". And only the source of the CellFormula.Text would show actual vaules. Other generated cellformula, its CellFormula.Text is empty.

    E.g.

    The first cell formula is

       <x:c r="C1">

            <x:f t="shared" ref="C1:C6" si="0">A1+B1</x:f>

            <x:v>3</x:v>

          </x:c>

    The other auto-filled cell's formulas are

          <x:c r="C2">

            <x:f t="shared" si="0" />

            <x:v>5</x:v>

          </x:c>

     

          <x:c r="C3">

            <x:f t="shared" si="0" />

            <x:v>7</x:v>

          </x:c>

     

    Excel reads the shared cellformula based on the ref (Range of Cells) and si (Shared Group Index)

    In the formula, excel changes the cellreference according to:

    If the column number is the same, in formular: row number +1

    If the row number is the same, in formular: column number +1

    E.g.

    If D1 CellFormula.Text is B1+C1, CellFormula.SharedIndex is 0, CellFormula.Reference is D1:E6

    D2 would be B2+C2

    E1 would be C1+D1

     

    If there are different formulas, there would be appended si (Shared Group Index) attribute.

    For more information about attributes, please visit CellFormula class 


    Friday, September 9, 2016 5:55 PM
    Moderator
  • Thank you for answering Celeste.

    I have checked my excel file with the SDK tool. and the data in question does have formulas.

    The formula is something like: =EDATE(I,13) for COL J, etc.

    This evaluates to a date.

    I am wondering if the OpenXML 2.5 SDK is ignoring the formula because the value evaluates to a date.

    The code that attempts to read the formula is:

    Private Function GetFormula(doc As SpreadsheetDocument, cell As Cell) As String
            Dim value As String = String.Empty
    
            If cell.CellFormula IsNot Nothing Then
                value = cell.CellFormula.Text
            End If
    
            Return value
        End Function
    Could you suggest an improvement. Thank you.


    Steve Greenbaum

    Saturday, September 17, 2016 12:20 PM
  • One more point:

    I came across this code in my searches.

    https://msdn.microsoft.com/en-us/library/office/hh298534%28v=office.14%29.aspx?f=255&MSPPError=-2147217396

    I added a test sub to call it.

    When I tested it on a cell that had an edate formula (as best as I could tell using the SDK tool and by looking at the spreadsheet, it returned 42430 which evakuates to a date. I looked at the cells DataType and it was Nothing. So the code never gets to the point where it attempts to read the formula.

    It returns 42430

    Any suggestions?

     Private Sub CallGetCellValueNewTest()
            Dim theValue As String
            Dim filePath As String = Server.MapPath("~/Files/") + Path.GetFileName(FileUpload1.PostedFile.FileName)
            theValue = GetCellValueNew(filePath, "Summary", "L13")
        End Sub
    
        Public Function GetCellValueNew(ByVal fileName As String,
        ByVal sheetName As String,
        ByVal addressName As String) As String
    
            Dim value As String = Nothing
    
            ' Open the spreadsheet document for read-only access.
            Using document As SpreadsheetDocument =
              SpreadsheetDocument.Open(fileName, False)
    
                ' Retrieve a reference to the workbook part.
                Dim wbPart As WorkbookPart = document.WorkbookPart
    
                ' Find the sheet with the supplied name, and then use that Sheet object
                ' to retrieve a reference to the appropriate worksheet.
                Dim theSheet As Sheet = wbPart.Workbook.Descendants(Of Sheet)().
                    Where(Function(s) s.Name = sheetName).FirstOrDefault()
    
                ' Throw an exception if there is no sheet.
                If theSheet Is Nothing Then
                    Throw New ArgumentException("sheetName")
                End If
    
                ' Retrieve a reference to the worksheet part.
                Dim wsPart As WorksheetPart =
                    CType(wbPart.GetPartById(theSheet.Id), WorksheetPart)
    
                ' Use its Worksheet property to get a reference to the cell 
                ' whose address matches the address you supplied.
                Dim theCell As Cell = wsPart.Worksheet.Descendants(Of Cell).
                    Where(Function(c) c.CellReference = addressName).FirstOrDefault
    
                ' If the cell does not exist, return an empty string.
                If theCell IsNot Nothing Then
                    value = theCell.InnerText
    
                    ' If the cell represents an numeric value, you are done. 
                    ' For dates, this code returns the serialized value that 
                    ' represents the date. The code handles strings and 
                    ' Booleans individually. For shared strings, the code 
                    ' looks up the corresponding value in the shared string 
                    ' table. For Booleans, the code converts the value into 
                    ' the words TRUE or FALSE.
                    If theCell.DataType IsNot Nothing Then
                        Select Case theCell.DataType.Value
                            Case CellValues.SharedString
    
                                ' For shared strings, look up the value in the 
                                ' shared strings table.
                                Dim stringTable = wbPart.
                                  GetPartsOfType(Of SharedStringTablePart).FirstOrDefault()
    
                                ' If the shared string table is missing, something
                                ' is wrong. Return the index that is in 
                                ' the cell. Otherwise, look up the correct text in 
                                ' the table.
                                If stringTable IsNot Nothing Then
                                    value = stringTable.SharedStringTable.
                                    ElementAt(Integer.Parse(value)).InnerText
                                End If
    
                            Case CellValues.Boolean
                                Select Case value
                                    Case "0"
                                        value = "FALSE"
                                    Case Else
                                        value = "TRUE"
                                End Select
                        End Select
                    End If
                End If
            End Using
            Return value
        End Function


    Steve Greenbaum



    • Edited by steve48 Saturday, September 17, 2016 12:48 PM added link to source of code
    Saturday, September 17, 2016 12:40 PM
  • >>I am wondering if the OpenXML 2.5 SDK is ignoring the formula because the value evaluates to a date.

     

    Of course no. The following in my file. C2-C4 are all auto-filled cell.

        <x:c r="C1">

          <x:f>EDATE(A1,B1)</x:f>

          <x:v>61</x:v>

        </x:c>

        <x:c r="C2">

          <x:f t="shared" ref="C2:C6" si="0">EDATE(A2,B2)</x:f>

          <x:v>123</x:v>

        </x:c>

        <x:c r="C3">

          <x:f t="shared" si="0" />

          <x:v>185</x:v>

        </x:c>

        <x:c r="C4">

          <x:f t="shared" si="0" />

          <x:v>248</x:v>

        </x:c>

     

    >> I looked at the cells DataType and it was Nothing

    It is theCell.CellFormula.FormulaType instead of DataType.

     

    You could refer to the code below to check your if it is shared formula.

    Besides, if you want to output the shared formula, we need to figure out how to change the cellreference in the formula. I think this needs large work and depends on the complexity of the formula. And I would suggest you post a new thread for how to modify the cellformula according to the cellreference.

       Sub Main()
            Dim fileName As String = "C:\Users\celeste\Desktop\With.xlsx"
    
            Using document As SpreadsheetDocument = SpreadsheetDocument.Open(fileName, False)
                Dim wbPart As WorkbookPart = document.WorkbookPart
                Dim wsPart As WorksheetPart = wbPart.WorksheetParts.First
                Dim cell As Cell
                For Each cell In wsPart.Worksheet.Descendants(Of Cell)
                    Console.WriteLine(GetFormula(document, cell))
                Next
            End Using
            Console.ReadKey()
        End Sub  
    
    
    Private Function GetFormula(doc As SpreadsheetDocument, cell As Cell) As String      
            Dim value As String = String.Empty
            If cell.CellFormula IsNot Nothing Then
                If cell.CellFormula.FormulaType Is Nothing Then
                    value = cell.CellReference.Value & " formula is " & cell.CellFormula.Text
                    'cell.cellformula.formulatype IsNot Nothing
                ElseIf cell.CellFormula.FormulaType.InnerText = "shared" Then
                    'value = cell.CellReference.Value & " SharedIndex is " & cell.CellFormula.SharedIndex.InnerText
                    If cell.CellFormula.Reference IsNot Nothing Then
                        value = cell.CellReference.Value & " formula is " & cell.CellFormula.Text &
                        " SharedIndex is " & cell.CellFormula.SharedIndex.InnerText &
                        " Reference is " & cell.CellFormula.Reference.InnerText
                    Else
                    End If
                End If
    
                If cell.CellFormula.SharedIndex IsNot Nothing And cell.CellFormula.Reference IsNot Nothing Then
                    For Each withoutRefCell As Cell In doc.WorkbookPart.WorksheetParts.First.Worksheet.Descendants(Of Cell)
                        If withoutRefCell.CellFormula IsNot Nothing Then
                            If withoutRefCell.CellFormula.SharedIndex IsNot Nothing And withoutRefCell.CellFormula.Reference Is Nothing Then
                                If withoutRefCell.CellFormula.SharedIndex.Value = cell.CellFormula.SharedIndex.Value Then
                                    If getCellColumnNum(withoutRefCell.CellReference) = getCellColumnNum(cell.CellReference) Then
                                        Console.WriteLine(withoutRefCell.CellReference.Value & " is referring to " & cell.CellFormula.Reference.InnerText)
                                        Console.WriteLine("And need to change the columnReference")
                                    End If
                                    If getCellRowNum(withoutRefCell.CellReference) = getCellRowNum(cell.CellReference) Then
                                        Console.WriteLine(withoutRefCell.CellReference.Value & " is referring to " & cell.CellFormula.Reference.InnerText)
                                        Console.WriteLine("Ane need to change the rowReference")
                                    End If
                                    ' value = value & " " & withOutRefCell.CellReference.Value & " is referring to " & cell.CellFormula.Reference.InnerText
                                End If
                            End If
                        End If
                    Next
                End If
            End If
            Return value
        End Function
        Function getCellColumnNum(ByVal cellReference As String) As Integer
            Dim columnReference As String = Regex.Replace(cellReference.ToUpper(), "[\d]", String.Empty)
            Dim columnNumber As Integer = -1
            Dim mulitplier As Integer = 1
            For Each c As Char In columnReference.ToCharArray().Reverse()
                columnNumber += mulitplier * (Convert.ToInt32(c) - 64)
                mulitplier = mulitplier * 26
            Next
            Return columnNumber + 1
        End Function
    
        Function getCellRowNum(ByVal cellReference As String) As Integer
            Dim regex As New Regex("\d+")
            Dim match As Match = regex.Match(cellReference)
            Return UInteger.Parse(match.Value)
        End Function

    Monday, September 19, 2016 7:47 AM
    Moderator