none
Is this code to find and read a table from a spreadsheet correct? RRS feed

  • Question

  • Good morning!  This code took a while to put together - both cause I'm new to the OpenXML API, and because it's really hard to search for "table" and "spreadsheet" on the internet and not get results polluted by generic information on spreadsheets.  The background here is that I'm using SQL Server Integration Services to read Excel spreadsheets, and I got tired of the limitations in the COM API (bitness, type inference, etc...)

    What follows is a redacted form of the code I have (no need to distract with SSIS and my internal structures).  Can you guys tell me if this code is a reliable method of reading a table?  if there are smallish tweaks I could make to make it more reliable, robust, future-proof?  OpenXML API specific things!  (I'm not looking for try/catch type improvements, I've redacted those.)

    Some specific concerns of mine (but please comment on anything else) - is the way I'm using the "columnOffset" reliable?  I tried the column's ID, but that's not reliable if columns are inserted into the table.  Is there a built-in cell reference conversion function?  Instead of my CellReferenceToCoordinates function?

    Thanks in advance!

    SpreadsheetDocument document = SpreadsheetDocument.Open("File.xlsx", false);
    WorkbookPart workbook = document.WorkbookPart;
    SharedStringTablePart sharedStringTablePart = workbook.SharedStringTablePart;
    Table table = null;
    Worksheet worksheet = null;
    string tableIAmLookingFor = "People";
    string columnIAmLookingFor = "Last Name";
    int columnOffset;
    
    // Find the table, and the sheet it's on
    foreach (Sheet sheet in workbook.Workbook.Sheets)
    {
      WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id);
      foreach (TableDefinitionPart tableDefinitionPart in worksheetPart.TableDefinitionParts)
      {
        if (tableDefinitionPart.Table.DisplayName == tableIAmLookingFor)
        {
          worksheet = worksheetPart.Worksheet;
          table = tableDefinitionPart.Table;
          break;
        }
      }
      if (table != null)
      {
        break;
      }
    }
    
    // Read the table columns, find out where the column I'm interested in is in the table
    int columnIndex = 1;
    foreach (TableColumn tableColumn in table.TableColumns)
    {
      foreach (ColumnMapping columnRelationship in this._columnMappings)
      {
        if (tableColumn.Name == columnIAmLookingFor)
        {
          columnOffset = columnIndex;
          break;
        }
      }
      columnIndex++;
    }
    
    // Read sheet data for values in the table
    SheetData sheetData = worksheet.GetFirstChild<SheetData>();
    IEnumerable<Row> rows = sheetData.Elements<Row>();
    foreach (Row row in rows)
    {
      if (row.RowIndex == 1)
      {
        // Skipping header row
      }
      else
      {
        // my app-specific code to handle the start of reading a "row" (redacted)
        foreach (Cell cell in row.Elements<Cell>())
        {
          // Next line refers to my own function that turns "B15" into {2, 5}
          if (this.CellReferenceToCoordinates(cell.CellReference)[0] == columnOffset)
          {
            string cellValue = this.GetCellValue(cell, sharedStringTablePart);
            // my app-specific code to handle cell value reading (redacted)
          }
        }
      }
    }
    
    document.Close();


    Todd McDermid's Blog Talk to me now on

    Monday, March 26, 2012 4:10 PM

Answers

  • Hi Todd,

    Thanks for posting in the MSDN Forum.

    It's based on my experience that your snippet is OK. However I think the following snippet will make more sense:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;
    using System.Text.RegularExpressions;
    
    namespace ConsoleApplication4
    {
        class Program
        {
            [STAThread]
            static void Main(string[] args)
            {
                OpenFileDialog d = new OpenFileDialog();
                d.Filter = "Excel Document|*.xlsx";
                d.ShowDialog();
                string Path = d.FileName;
                Table table = null;
                string TableName = "Table1";
                string ColumnName = "Column2";
                using (SpreadsheetDocument sd = SpreadsheetDocument.Open(Path, false))
                {
                    WorkbookPart wp = sd.WorkbookPart;
                    SharedStringTablePart sstp = wp.SharedStringTablePart;
                    List<SharedStringItem> se = sstp.SharedStringTable
                        .Descendants<SharedStringItem>().ToList();
                    Workbook wb = wp.Workbook;
                    WorksheetPart wsp = null;
                    List<Sheet> sheets = wb.Descendants<Sheet>().ToList();
                    foreach (Sheet sheet in sheets)
                    {
                        wsp = (WorksheetPart)wp.GetPartById(sheet.Id);
                        bool flag = false;
                        if (wsp.TableDefinitionParts != null)
                        {
                            foreach (TableDefinitionPart tdp in 
                                wsp.TableDefinitionParts)
                            {
                                if (tdp.Table.Name.Value.Equals(TableName))
                                {
                                    flag = true;
                                    table = tdp.Table;
                                    break;
                                }
                            }
                        }
                        if (flag)
                            break;
                    }
                    if (table != null)
                    {
                        TableColumn tc = table.Descendants<TableColumn>().Where(S 
                            => S.Name == ColumnName).FirstOrDefault();
                        string begin = table.Reference.Value.Split(':')[0];
                        string end = table.Reference.Value.Split(':')[1];
                        int id = int.Parse(tc.Id.ToString());
                        int he = rb(begin);
                        id += he;
                        id--;
                        int iend = rc(end);
                        SheetData sda = wsp.Worksheet.Descendants<SheetData>()
                            .FirstOrDefault();
                        IEnumerable<Cell> Cells = sda.Descendants<Cell>().Where(C 
                            => C.CellReference.Value.IndexOf(NtC(id)) == 0); 
                        
                        foreach (Cell cell in Cells)
                        {
                            if (rc(cell.CellReference) <= iend && 
                                rc(cell.CellReference) > rc(begin))
                            {
                                if (cell.DataType != null)
                                {
                                    if (cell.DataType == CellValues.SharedString)
                                        Console.WriteLine(
                                            se[int.Parse(cell.CellValue.Text)]
                                            .Text.Text);
                                    else
                                        Console.WriteLine(cell.CellValue.Text);
                                }
                                else
                                        Console.WriteLine(cell.CellValue.Text);
                            }
                        }
                    }
                }
                Console.ReadKey();
            }
    
            static int rc(string Position)
            {
                StringBuilder builder = new StringBuilder();
                char[] buffer = Position.ToCharArray();
                foreach (char c in buffer)
                {
                    if (c >= 48 && c <= 57)
                        builder.Append(c);
                }
                return CtN(builder.ToString());
            }
    
            static int rb(string Position)
            {
                string s = Regex.Replace(Position, "\\d+$", "");
                int res = CtN(s);
                return res;
            }
    
            static string NtC(int num)
            {
                int a = 64;
                int x = num;
                StringBuilder builder = new StringBuilder();
                while (x > 0)
                {
                    int d = x % 26;
                    char c = d == 0 ? 'Z' : (char)(a + d);
                    x = (int)x / 26;
                    if (d == 0)
                        x--;
                    builder.Insert(0, c);
                }
                return builder.ToString();
            }
    
            static int CtN(string Postion)
            {
                string se = Postion;
                char[] ds = se.ToCharArray();
                int res = 0;
                for (int i = ds.Length - 1; i >= 0; i--)
                {
                    int x = (int)ds[i];
                    x -= 64;
                    for (int j = 0; j < i; j++)
                    {
                        x *= 26;
                    }
                    res += x;
                }
                return res;
            }
        }
    }

    I hope it can help you.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Tuesday, March 27, 2012 6:00 AM
    Moderator

All replies

  • Hi Todd,

    Thanks for posting in the MSDN Forum.

    It's based on my experience that your snippet is OK. However I think the following snippet will make more sense:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Windows.Forms;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;
    using System.Text.RegularExpressions;
    
    namespace ConsoleApplication4
    {
        class Program
        {
            [STAThread]
            static void Main(string[] args)
            {
                OpenFileDialog d = new OpenFileDialog();
                d.Filter = "Excel Document|*.xlsx";
                d.ShowDialog();
                string Path = d.FileName;
                Table table = null;
                string TableName = "Table1";
                string ColumnName = "Column2";
                using (SpreadsheetDocument sd = SpreadsheetDocument.Open(Path, false))
                {
                    WorkbookPart wp = sd.WorkbookPart;
                    SharedStringTablePart sstp = wp.SharedStringTablePart;
                    List<SharedStringItem> se = sstp.SharedStringTable
                        .Descendants<SharedStringItem>().ToList();
                    Workbook wb = wp.Workbook;
                    WorksheetPart wsp = null;
                    List<Sheet> sheets = wb.Descendants<Sheet>().ToList();
                    foreach (Sheet sheet in sheets)
                    {
                        wsp = (WorksheetPart)wp.GetPartById(sheet.Id);
                        bool flag = false;
                        if (wsp.TableDefinitionParts != null)
                        {
                            foreach (TableDefinitionPart tdp in 
                                wsp.TableDefinitionParts)
                            {
                                if (tdp.Table.Name.Value.Equals(TableName))
                                {
                                    flag = true;
                                    table = tdp.Table;
                                    break;
                                }
                            }
                        }
                        if (flag)
                            break;
                    }
                    if (table != null)
                    {
                        TableColumn tc = table.Descendants<TableColumn>().Where(S 
                            => S.Name == ColumnName).FirstOrDefault();
                        string begin = table.Reference.Value.Split(':')[0];
                        string end = table.Reference.Value.Split(':')[1];
                        int id = int.Parse(tc.Id.ToString());
                        int he = rb(begin);
                        id += he;
                        id--;
                        int iend = rc(end);
                        SheetData sda = wsp.Worksheet.Descendants<SheetData>()
                            .FirstOrDefault();
                        IEnumerable<Cell> Cells = sda.Descendants<Cell>().Where(C 
                            => C.CellReference.Value.IndexOf(NtC(id)) == 0); 
                        
                        foreach (Cell cell in Cells)
                        {
                            if (rc(cell.CellReference) <= iend && 
                                rc(cell.CellReference) > rc(begin))
                            {
                                if (cell.DataType != null)
                                {
                                    if (cell.DataType == CellValues.SharedString)
                                        Console.WriteLine(
                                            se[int.Parse(cell.CellValue.Text)]
                                            .Text.Text);
                                    else
                                        Console.WriteLine(cell.CellValue.Text);
                                }
                                else
                                        Console.WriteLine(cell.CellValue.Text);
                            }
                        }
                    }
                }
                Console.ReadKey();
            }
    
            static int rc(string Position)
            {
                StringBuilder builder = new StringBuilder();
                char[] buffer = Position.ToCharArray();
                foreach (char c in buffer)
                {
                    if (c >= 48 && c <= 57)
                        builder.Append(c);
                }
                return CtN(builder.ToString());
            }
    
            static int rb(string Position)
            {
                string s = Regex.Replace(Position, "\\d+$", "");
                int res = CtN(s);
                return res;
            }
    
            static string NtC(int num)
            {
                int a = 64;
                int x = num;
                StringBuilder builder = new StringBuilder();
                while (x > 0)
                {
                    int d = x % 26;
                    char c = d == 0 ? 'Z' : (char)(a + d);
                    x = (int)x / 26;
                    if (d == 0)
                        x--;
                    builder.Insert(0, c);
                }
                return builder.ToString();
            }
    
            static int CtN(string Postion)
            {
                string se = Postion;
                char[] ds = se.ToCharArray();
                int res = 0;
                for (int i = ds.Length - 1; i >= 0; i--)
                {
                    int x = (int)ds[i];
                    x -= 64;
                    for (int j = 0; j < i; j++)
                    {
                        x *= 26;
                    }
                    res += x;
                }
                return res;
            }
        }
    }

    I hope it can help you.

    Have a good day,

    Tom


    Tom Xu [MSFT]
    MSDN Community Support | Feedback to us

    Tuesday, March 27, 2012 6:00 AM
    Moderator
  • its not working for me either
    Wednesday, February 18, 2015 5:38 AM