locked
Using LINQ and an XML spreadsheet RRS feed

  • Question

  • I have an Excel workbook with several worksheets in it that has been saved in an XML.

     

    Is there a good "for dummies" tutorial describing how to use LINQ to query the data?

     

    Thanks

     

    Griff

    Tuesday, May 6, 2008 4:23 PM

Answers

  • Here is an example that outputs the data in the worksheets:

    Code Snippet

                XDocument workbook = XDocument.Load(@"..\..\XMLFile1.xml");
                XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";
                XNamespace o = "urn:schemas-microsoft-com:office:office";
                XNamespace x = "urn:schemas-microsoft-com:office:excel";
                foreach (XElement worksheet in workbook.Descendants(ss + "Worksheet"))
                {
                    Console.WriteLine("Processing Worksheet named \"{0}\":", worksheet.Attribute(ss + "Name").Value);
                    foreach (XElement row in worksheet.Descendants(ss + "Row"))
                    {
                        foreach (XElement cell in row.Descendants(ss + "Data"))
                        {
                            Console.Write("{0}\t", cell.Value);
                        }
                        Console.WriteLine();
                    }
                    Console.WriteLine();
                }

     

     

    Output looks as follows for the sample you posted:

    Code Snippet

    Processing Worksheet named "Customer Pricing":
    Account Product Price
    A1      P1      5.32
    A2      P2      12.45

    Processing Worksheet named "Customer Discounts":
    Account Discount
    A1      0.1
    A2      0.09

    Processing Worksheet named "Sheet3":

     

     

    Tuesday, May 6, 2008 5:53 PM

All replies

  • MSDN has a lot on LINQ to XML: http://msdn.microsoft.com/en-us/library/bb387098.aspx

    None of that is geared towards processing Office XML formats but those are just XML with a couple of namespaces defined and used so anything that explains LINQ and how to deal with namespaces should provide a start: http://msdn.microsoft.com/en-us/library/bb387093.aspx

    Tuesday, May 6, 2008 4:29 PM
  • I have the following "trivial" xml example saved from Excel (two worksheets with some dummy data)

     

     

    Code Snippet


    <?xml version="1.0"?>
    <?mso-application progid="Excel.Sheet"?>
    <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
     xmlns:o="urn:schemas-microsoft-com:office:office"
     xmlns:x="urn:schemas-microsoft-com:office:excel"
     xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
     xmlns:html="http://www.w3.org/TR/REC-html40">
     <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
      <Author>********</Author>
      <LastAuthor>********</LastAuthor>
      <Created>2008-05-06T13:44:18Z</Created>
      <LastSaved>2008-05-06T17:12:15Z</LastSaved>
      <Company>********</Company>
      <Version>11.9999</Version>
     </DocumentProperties>
     <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
      <WindowHeight>12270</WindowHeight>
      <WindowWidth>15195</WindowWidth>
      <WindowTopX>480</WindowTopX>
      <WindowTopY>60</WindowTopY>
      <ActiveSheet>1</ActiveSheet>
      <ProtectStructure>False</ProtectStructure>
      <ProtectWindows>False</ProtectWindows>
     </ExcelWorkbook>
     <Styles>
      <Style ss:ID="Default" ss:Name="Normal">
       <Alignment ss:Vertical="Bottom"/>
       <Borders/>
       <Font/>
       <Interior/>
       <NumberFormat/>
       <Protection/>
      </Style>
      <Style ss:ID="s21">
       <Font x:Family="Swiss" ss:Bold="1"/>
      </Style>
      <Style ss:ID="s22">
       <NumberFormat ss:Format="0%"/>
      </Style>
     </Styles>
     <Worksheet ss:Name="Customer Pricing">
      <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="3" x:FullColumns="1"
       x:FullRows="1">
       <Column ss:Width="51"/>
       <Row>
        <Cell ss:StyleID="s21"><Data ss:Type="String">Account</Data></Cell>
        <Cell ss:StyleID="s21"><Data ss:Type="String">Product</Data></Cell>
        <Cell ss:StyleID="s21"><Data ss:Type="String">Price</Data></Cell>
       </Row>
       <Row>
        <Cell><Data ss:Type="String">A1</Data></Cell>
        <Cell><Data ss:Type="String">P1</Data></Cell>
        <Cell><Data ss:Type="Number">5.32</Data></Cell>
       </Row>
       <Row>
        <Cell><Data ss:Type="String">A2</Data></Cell>
        <Cell><Data ss:Type="String">P2</Data></Cell>
        <Cell><Data ss:Type="Number">12.45</Data></Cell>
       </Row>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <Print>
        <ValidPrinterInfo/>
        <PaperSizeIndex>9</PaperSizeIndex>
        <HorizontalResolution>600</HorizontalResolution>
        <VerticalResolution>600</VerticalResolution>
       </Print>
       <Panes>
        <Pane>
         <Number>3</Number>
         <RangeSelection>R1C1:R3C1</RangeSelection>
        </Pane>
       </Panes>
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
     <Worksheet ss:Name="Customer Discounts">
      <Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3" x:FullColumns="1"
       x:FullRows="1">
       <Row>
        <Cell ss:StyleID="s21"><Data ss:Type="String">Account</Data></Cell>
        <Cell ss:StyleID="s21"><Data ss:Type="String">Discount</Data></Cell>
       </Row>
       <Row>
        <Cell><Data ss:Type="String">A1</Data></Cell>
        <Cell ss:StyleID="s22"><Data ss:Type="Number">0.1</Data></Cell>
       </Row>
       <Row>
        <Cell><Data ss:Type="String">A2</Data></Cell>
        <Cell ss:StyleID="s22"><Data ss:Type="Number">0.09</Data></Cell>
       </Row>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <Print>
        <ValidPrinterInfo/>
        <PaperSizeIndex>9</PaperSizeIndex>
        <HorizontalResolution>600</HorizontalResolution>
        <VerticalResolution>600</VerticalResolution>
       </Print>
       <Selected/>
       <Panes>
        <Pane>
         <Number>3</Number>
         <ActiveRow>4</ActiveRow>
         <ActiveCol>3</ActiveCol>
        </Pane>
       </Panes>
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
     <Worksheet ss:Name="Sheet3">
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
       <ProtectObjects>False</ProtectObjects>
       <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
     </Worksheet>
    </Workbook>

     

     

    The following LINQ query attempts to get each workbook

    Code Snippet

     

    XDocument xmlInput = XDocument.Load("c:\\myFolder\\myFile.xml");

    XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";

     

    var worksheets = from element in xmlInput.Descendants(ss + "Worksheet")

    select element;

     

    foreach (XElement worksheet in worksheets )

    {

    ...;

    }

     

    When I inspect each "worksheet" I see loads of new namespaces (P...). 

     

    Code Snippet
    +  worksheet <Worksheet p1:Name="Customer Pricing" xmlns:p1="urn:schemas-microsoft-com:office:spreadsheet" xmlns="urn:schemas-microsoft-com:office:spreadsheet">
      <Table p3:ExpandedColumnCount="3" p3:ExpandedRowCount="3" x:FullColumns="1" x:FullRows="1" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p3="urn:schemas-microsoft-com:office:spreadsheet">
        <Column p6:Width="51" xmlns:p6="urn:schemas-microsoft-com:office:spreadsheet" />
        <Row>
          <Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
            <Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Account</Data>
          </Cell>
          <Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
            <Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Product</Data>
          </Cell>
          <Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
            <Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Price</Data>
          </Cell>
        </Row>
        <Row>
          <Cell>
            <Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">A1</Data>
          </Cell>
          <Cell>
            <Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">P1</Data>
          </Cell>
          <Cell>
            <Data p8:Type="Number" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">5.32</Data>
          </Cell>
        </Row>
        <Row>
          <Cell>
            <Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">A2</Data>
          </Cell>
          <Cell>
            <Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">P2</Data>
          </Cell>
          <Cell>
            <Data p8:Type="Number" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">12.45</Data>
          </Cell>
        </Row>
      </Table>
      <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
        <Print>
          <ValidPrinterInfo />
          <PaperSizeIndex>9</PaperSizeIndex>
          <HorizontalResolution>600</HorizontalResolution>
          <VerticalResolution>600</VerticalResolution>
        </Print>
        <Panes>
          <Pane>
            <Number>3</Number>
            <RangeSelection>R1C1:R3C1</RangeSelection>
          </Pane>
        </Panes>
        <ProtectObjects>False</ProtectObjects>
        <ProtectScenarios>False</ProtectScenarios>
      </WorksheetOptions>
    </Worksheet> System.Xml.Linq.XElement

     

     

    Not sure how best to read the data I actually want from this.

     

    Griff

     

     

     

     

    Tuesday, May 6, 2008 5:33 PM
  • Can you describe the data you are looking for? Then we can help write a query.

    Tuesday, May 6, 2008 5:40 PM
  • Here is an example that outputs the data in the worksheets:

    Code Snippet

                XDocument workbook = XDocument.Load(@"..\..\XMLFile1.xml");
                XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";
                XNamespace o = "urn:schemas-microsoft-com:office:office";
                XNamespace x = "urn:schemas-microsoft-com:office:excel";
                foreach (XElement worksheet in workbook.Descendants(ss + "Worksheet"))
                {
                    Console.WriteLine("Processing Worksheet named \"{0}\":", worksheet.Attribute(ss + "Name").Value);
                    foreach (XElement row in worksheet.Descendants(ss + "Row"))
                    {
                        foreach (XElement cell in row.Descendants(ss + "Data"))
                        {
                            Console.Write("{0}\t", cell.Value);
                        }
                        Console.WriteLine();
                    }
                    Console.WriteLine();
                }

     

     

    Output looks as follows for the sample you posted:

    Code Snippet

    Processing Worksheet named "Customer Pricing":
    Account Product Price
    A1      P1      5.32
    A2      P2      12.45

    Processing Worksheet named "Customer Discounts":
    Account Discount
    A1      0.1
    A2      0.09

    Processing Worksheet named "Sheet3":

     

     

    Tuesday, May 6, 2008 5:53 PM
  • Thanks Martin!  That's saved me a lot of time.

     

    Griff

    Wednesday, May 7, 2008 8:27 AM
  •  You can also query the data directly from the excel file with the open source Linq to Excel provider.

    Here's an example of how easy the Linq to Excel provider makes it to query data from the "Customer Pricing" worksheet based upon the xml example above.

    IExcelRepository<Customer> repo = new ExcelRepository<Customer>();  
    repo.FileName = "C:\\myFolder\\myFile.xls";  
    repo.WorksheetName = "Customer Pricing";  
     
    //Select all customers in the worksheet  
    var allCustomers = from customer in repo.Worksheet  
                       select customer;  
     
    //Select customers that have prices greater than 10  
    var highPriceCustomers = from customer in repo.Worksheet  
                             where customer.Price > 10  
                             select customer;  
     
    //Customer Class  
    public class Customer  
    {  
      public string Account { get; set; }  
      public string Product { get; set; }  
      public decimal Price { get; set; }  
    }  
     


    Be sure to checkout the project's home page for more examples.

    Thursday, January 8, 2009 2:14 PM