locked
C# Parse Excel Formula and Extract cell address RRS feed

  • Question

  • I have application which read data from excel file including formula using EPPlus. my routine extract cell address from formula.

    few sample formula which my application can still parse using RegEx.

    a) (W2/R2)-1
    b) W23-U23

    But Now a formula occur bit long which my regex could not parse. formula looks like (U2+W2)/2*1000/W25

    This is my code which parse formula and iterate in cell address

    string regex = @"\$?[A-Z]{1,3}\$?[0-9]{1,7}";
    string strDefaultCalculations="(U2+W2)/2*1000/W25";
    
    List<string> lstAddress = Regex.Matches(strDefaultCalculations, regex).Cast<Match>().Select(m => m.Value).ToList();
    
    foreach (var data in lstAddress)
    {
    
    }


    Please some one suggest any changed RegEx as a result it can extract all cell address from this formula (U2+W2)/2*1000/W25. cell address would be U2, W2, W25.

    also changed regex should be parse my others formula too means these formula below which are being parsed properly.

    a) (W2/R2)-1
    b) W23-U23

    Please some one help me. i stuck very badly for my RegEx. thanks


    • Edited by Sudip_inn Wednesday, September 23, 2020 4:48 PM
    Wednesday, September 23, 2020 2:51 PM

Answers

  • Sir when i tried it was not working. anyway i will try it again. i found another code which work fine and much better that my previous one

    /// <summary>
            /// This routine tokenize all cell address from excel formula
            /// Routine taken from https://stackoverflow.com/questions/854827/which-regular-expression-is-able-to-select-excel-column-names-in-a-formula-in-c/859995
            /// </summary>
            /// <param name="formula"></param>
            /// <returns></returns>
            private List<string> GetCellAddress(string formula)
            {
                List<string> strCellAddress = new List<string>();
    
                string rxCellPattern = @"(?<![$])       # match if prefix is absent: $ symbol (prevents matching $A1 type of cells)
                                                # (if all you have is $A$1 type of references, and not $A1 types, this negative look-behind isn't needed)
                                \b              # word boundary (prevents matching Excel functions with a similar pattern to a cell)
                                (?<col>[A-Z]+)  # named capture group, match uppercase letter at least once
                                                # (change to [A-Za-z] if you have lowercase cells)
                                (?<row>\d+)     # named capture group, match a number at least once
                                \b              # word boundary
                                ";
                Regex rxCell = new Regex(rxCellPattern, RegexOptions.IgnorePatternWhitespace);
    
                if (rxCell.IsMatch(formula))
                {
                    //Console.WriteLine("Formula: {0}", formula);
                    foreach (Match cell in rxCell.Matches(formula))
                    {
                        strCellAddress.Add(cell.Value);
                    }
                }
    
                return strCellAddress;
            }

    Just need to call like this way

    List<String> _celladdress = GetCellAddress("(U2+W2)/2*1000/W25");

    working.

    • Marked as answer by Sudip_inn Wednesday, September 23, 2020 6:22 PM
    Wednesday, September 23, 2020 6:20 PM

All replies

  • I have application which read data from excel file including formula using EPPlus. my routine extract cell address from formula.

    few sample formula which my application can still parse using RegEx.

    a) (W2/R2)-1
    b) W23-U23

    But Now a formula occur bit long which my regex could not parse. formula looks like (U2+W2)/2*1000/W25

    This is my code which parse formula and iterate in cell address

    string regex = @"\$?[A-Z]{1,3}\$?[0-9]{1,7}";
    string strDefaultCalculations="(U2+W2)/2*1000/W25";
    
    List<string> lstAddress = Regex.Matches(strDefaultCalculations, regex).Cast<Match>().Select(m => m.Value).ToList();
    
    foreach (var data in lstAddress)
    {
    
    }


    Please some one suggest any changed RegEx as a result it can extract all cell address from this formula (U2+W2)/2*1000/W25. cell address would be U2, W2, W25.

    also changed regex should be parse my others formula too means these formula below which are being parsed properly.

    a) (W2/R2)-1
    b) W23-U23

    Please some one help me. i stuck very badly for my RegEx. thanks



    Your sample code seems to return all the expected addresses: “U2”, “W2” and “W25”.


    Wednesday, September 23, 2020 6:03 PM
  • Sir when i tried it was not working. anyway i will try it again. i found another code which work fine and much better that my previous one

    /// <summary>
            /// This routine tokenize all cell address from excel formula
            /// Routine taken from https://stackoverflow.com/questions/854827/which-regular-expression-is-able-to-select-excel-column-names-in-a-formula-in-c/859995
            /// </summary>
            /// <param name="formula"></param>
            /// <returns></returns>
            private List<string> GetCellAddress(string formula)
            {
                List<string> strCellAddress = new List<string>();
    
                string rxCellPattern = @"(?<![$])       # match if prefix is absent: $ symbol (prevents matching $A1 type of cells)
                                                # (if all you have is $A$1 type of references, and not $A1 types, this negative look-behind isn't needed)
                                \b              # word boundary (prevents matching Excel functions with a similar pattern to a cell)
                                (?<col>[A-Z]+)  # named capture group, match uppercase letter at least once
                                                # (change to [A-Za-z] if you have lowercase cells)
                                (?<row>\d+)     # named capture group, match a number at least once
                                \b              # word boundary
                                ";
                Regex rxCell = new Regex(rxCellPattern, RegexOptions.IgnorePatternWhitespace);
    
                if (rxCell.IsMatch(formula))
                {
                    //Console.WriteLine("Formula: {0}", formula);
                    foreach (Match cell in rxCell.Matches(formula))
                    {
                        strCellAddress.Add(cell.Value);
                    }
                }
    
                return strCellAddress;
            }

    Just need to call like this way

    List<String> _celladdress = GetCellAddress("(U2+W2)/2*1000/W25");

    working.

    • Marked as answer by Sudip_inn Wednesday, September 23, 2020 6:22 PM
    Wednesday, September 23, 2020 6:20 PM
  • Sir when i tried it was not working. anyway i will try it again. i found another code which work fine and much better that my previous one

    /// <summary>
            /// This routine tokenize all cell address from excel formula
            /// Routine taken from https://stackoverflow.com/questions/854827/which-regular-expression-is-able-to-select-excel-column-names-in-a-formula-in-c/859995
            /// </summary>
            /// <param name="formula"></param>
            /// <returns></returns>
            private List<string> GetCellAddress(string formula)
            {
                List<string> strCellAddress = new List<string>();
    
                string rxCellPattern = @"(?<![$])       # match if prefix is absent: $ symbol (prevents matching $A1 type of cells)
                                                # (if all you have is $A$1 type of references, and not $A1 types, this negative look-behind isn't needed)
                                \b              # word boundary (prevents matching Excel functions with a similar pattern to a cell)
                                (?<col>[A-Z]+)  # named capture group, match uppercase letter at least once
                                                # (change to [A-Za-z] if you have lowercase cells)
                                (?<row>\d+)     # named capture group, match a number at least once
                                \b              # word boundary
                                ";
                Regex rxCell = new Regex(rxCellPattern, RegexOptions.IgnorePatternWhitespace);
    
                if (rxCell.IsMatch(formula))
                {
                    //Console.WriteLine("Formula: {0}", formula);
                    foreach (Match cell in rxCell.Matches(formula))
                    {
                        strCellAddress.Add(cell.Value);
                    }
                }
    
                return strCellAddress;
            }

    Just need to call like this way

    List<String> _celladdress = GetCellAddress("(U2+W2)/2*1000/W25");

    working.


    However, the expressions do not seem to work fine with valid formulae like “SHEET1!U2” and “ATAN2(U2,W2)”.



    • Edited by Viorel_MVP Wednesday, September 23, 2020 6:53 PM
    Wednesday, September 23, 2020 6:53 PM
  • so tell me what to change in code specially in RegEx area?

    if you have any idea then please share sir. thanks

    Wednesday, September 30, 2020 7:37 AM