Answered by:
Using LINQ and an XML spreadsheet

Question
-
I have an Excel workbook with several worksheets in it that has been saved in an XML.
Is there a good "for dummies" tutorial describing how to use LINQ to query the data?
Thanks
Griff
Tuesday, May 6, 2008 4:23 PM
Answers
-
Here is an example that outputs the data in the worksheets:
Code SnippetXDocument workbook = XDocument.Load(@"..\..\XMLFile1.xml");
XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";
XNamespace o = "urn:schemas-microsoft-com:office:office";
XNamespace x = "urn:schemas-microsoft-com:office:excel";
foreach (XElement worksheet in workbook.Descendants(ss + "Worksheet"))
{
Console.WriteLine("Processing Worksheet named \"{0}\":", worksheet.Attribute(ss + "Name").Value);
foreach (XElement row in worksheet.Descendants(ss + "Row"))
{
foreach (XElement cell in row.Descendants(ss + "Data"))
{
Console.Write("{0}\t", cell.Value);
}
Console.WriteLine();
}
Console.WriteLine();
}Output looks as follows for the sample you posted:
Code SnippetProcessing Worksheet named "Customer Pricing":
Account Product Price
A1 P1 5.32
A2 P2 12.45Processing Worksheet named "Customer Discounts":
Account Discount
A1 0.1
A2 0.09Processing Worksheet named "Sheet3":
Tuesday, May 6, 2008 5:53 PM
All replies
-
MSDN has a lot on LINQ to XML: http://msdn.microsoft.com/en-us/library/bb387098.aspx
None of that is geared towards processing Office XML formats but those are just XML with a couple of namespaces defined and used so anything that explains LINQ and how to deal with namespaces should provide a start: http://msdn.microsoft.com/en-us/library/bb387093.aspx
Tuesday, May 6, 2008 4:29 PM -
I have the following "trivial" xml example saved from Excel (two worksheets with some dummy data)
Code Snippet
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>********</Author>
<LastAuthor>********</LastAuthor>
<Created>2008-05-06T13:44:18Z</Created>
<LastSaved>2008-05-06T17:12:15Z</LastSaved>
<Company>********</Company>
<Version>11.9999</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>12270</WindowHeight>
<WindowWidth>15195</WindowWidth>
<WindowTopX>480</WindowTopX>
<WindowTopY>60</WindowTopY>
<ActiveSheet>1</ActiveSheet>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<Font x:Family="Swiss" ss:Bold="1"/>
</Style>
<Style ss:ID="s22">
<NumberFormat ss:Format="0%"/>
</Style>
</Styles>
<Worksheet ss:Name="Customer Pricing">
<Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="3" x:FullColumns="1"
x:FullRows="1">
<Column ss:Width="51"/>
<Row>
<Cell ss:StyleID="s21"><Data ss:Type="String">Account</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="String">Product</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="String">Price</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">A1</Data></Cell>
<Cell><Data ss:Type="String">P1</Data></Cell>
<Cell><Data ss:Type="Number">5.32</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">A2</Data></Cell>
<Cell><Data ss:Type="String">P2</Data></Cell>
<Cell><Data ss:Type="Number">12.45</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo/>
<PaperSizeIndex>9</PaperSizeIndex>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Panes>
<Pane>
<Number>3</Number>
<RangeSelection>R1C1:R3C1</RangeSelection>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Customer Discounts">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="3" x:FullColumns="1"
x:FullRows="1">
<Row>
<Cell ss:StyleID="s21"><Data ss:Type="String">Account</Data></Cell>
<Cell ss:StyleID="s21"><Data ss:Type="String">Discount</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">A1</Data></Cell>
<Cell ss:StyleID="s22"><Data ss:Type="Number">0.1</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="String">A2</Data></Cell>
<Cell ss:StyleID="s22"><Data ss:Type="Number">0.09</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo/>
<PaperSizeIndex>9</PaperSizeIndex>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Selected/>
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>4</ActiveRow>
<ActiveCol>3</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="Sheet3">
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>The following LINQ query attempts to get each workbook
Code Snippetvar worksheets = from element in xmlInput.Descendants(ss + "Worksheet")
select element;
{
}
When I inspect each "worksheet" I see loads of new namespaces (P...).
Code Snippet+ worksheet <Worksheet p1:Name="Customer Pricing" xmlns:p1="urn:schemas-microsoft-com:office:spreadsheet" xmlns="urn:schemas-microsoft-com:office:spreadsheet">
<Table p3:ExpandedColumnCount="3" p3:ExpandedRowCount="3" x:FullColumns="1" x:FullRows="1" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p3="urn:schemas-microsoft-com:office:spreadsheet">
<Column p6:Width="51" xmlns:p6="urn:schemas-microsoft-com:office:spreadsheet" />
<Row>
<Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
<Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Account</Data>
</Cell>
<Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
<Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Product</Data>
</Cell>
<Cell p7:StyleID="s21" xmlns:p7="urn:schemas-microsoft-com:office:spreadsheet">
<Data p9:Type="String" xmlns:p9="urn:schemas-microsoft-com:office:spreadsheet">Price</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">A1</Data>
</Cell>
<Cell>
<Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">P1</Data>
</Cell>
<Cell>
<Data p8:Type="Number" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">5.32</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">A2</Data>
</Cell>
<Cell>
<Data p8:Type="String" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">P2</Data>
</Cell>
<Cell>
<Data p8:Type="Number" xmlns:p8="urn:schemas-microsoft-com:office:spreadsheet">12.45</Data>
</Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo />
<PaperSizeIndex>9</PaperSizeIndex>
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Panes>
<Pane>
<Number>3</Number>
<RangeSelection>R1C1:R3C1</RangeSelection>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet> System.Xml.Linq.XElement
Not sure how best to read the data I actually want from this.
Griff
Tuesday, May 6, 2008 5:33 PM -
Can you describe the data you are looking for? Then we can help write a query.
Tuesday, May 6, 2008 5:40 PM -
Here is an example that outputs the data in the worksheets:
Code SnippetXDocument workbook = XDocument.Load(@"..\..\XMLFile1.xml");
XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";
XNamespace o = "urn:schemas-microsoft-com:office:office";
XNamespace x = "urn:schemas-microsoft-com:office:excel";
foreach (XElement worksheet in workbook.Descendants(ss + "Worksheet"))
{
Console.WriteLine("Processing Worksheet named \"{0}\":", worksheet.Attribute(ss + "Name").Value);
foreach (XElement row in worksheet.Descendants(ss + "Row"))
{
foreach (XElement cell in row.Descendants(ss + "Data"))
{
Console.Write("{0}\t", cell.Value);
}
Console.WriteLine();
}
Console.WriteLine();
}Output looks as follows for the sample you posted:
Code SnippetProcessing Worksheet named "Customer Pricing":
Account Product Price
A1 P1 5.32
A2 P2 12.45Processing Worksheet named "Customer Discounts":
Account Discount
A1 0.1
A2 0.09Processing Worksheet named "Sheet3":
Tuesday, May 6, 2008 5:53 PM -
Thanks Martin! That's saved me a lot of time.
Griff
Wednesday, May 7, 2008 8:27 AM -
You can also query the data directly from the excel file with the open source Linq to Excel provider.
Here's an example of how easy the Linq to Excel provider makes it to query data from the "Customer Pricing" worksheet based upon the xml example above.IExcelRepository<Customer> repo = new ExcelRepository<Customer>(); repo.FileName = "C:\\myFolder\\myFile.xls"; repo.WorksheetName = "Customer Pricing"; //Select all customers in the worksheet var allCustomers = from customer in repo.Worksheet select customer; //Select customers that have prices greater than 10 var highPriceCustomers = from customer in repo.Worksheet where customer.Price > 10 select customer; //Customer Class public class Customer { public string Account { get; set; } public string Product { get; set; } public decimal Price { get; set; } }
Be sure to checkout the project's home page for more examples.Thursday, January 8, 2009 2:14 PM