locked
How to make a Complex XML to DataSet/DataTable like Excel when you open a XML file RRS feed

  • Question

  • User-943250815 posted

    If I open any XML file on Excel it does all the job and flat XML with all necessary columns and rows.
    I would like to do same but from code. The idea is generate an Excel file like it does by itself.
    I know I can use System.Data.DataSet.ReadXML(<filename>), and there will be many Tables and Relations.
    What I did not realize is how to deal with in a efficient way.
    All sample I found are for simple XML files, my one just as an example have 37 tables and 36 relations (insane I know).

    So how can I achieve a Flat Dataset, so with it I can produce a Excel file with data from multiple XML files

    Friday, February 15, 2019 12:53 AM

All replies

  • User-1174608757 posted

    Hi jzero,

    According to you description,I have made a sample here. I find that the relationship seems ok  if you use the use System.Data.DataSet.ReadXML( ) and  write the xml file correctly. Here is my xml file.

    <NewDataSet>
      <ParentTable>
        <id>0</id>
        <ParentItem>ParentItem 0</ParentItem>
      </ParentTable>
      <ParentTable>
        <id>1</id>
        <ParentItem>ParentItem 1</ParentItem>
      </ParentTable>
      <ParentTable>
        <id>2</id>
        <ParentItem>ParentItem 2</ParentItem>
      </ParentTable>
      <childTable>
        <ChildID>0</ChildID>
        <ChildItem>Item 0</ChildItem>
        <ParentID>0</ParentID>
      </childTable>
      <childTable>
        <ChildID>1</ChildID>
        <ChildItem>Item 1</ChildItem>
        <ParentID>0</ParentID>
      </childTable>
      <childTable>
        <ChildID>2</ChildID>
        <ChildItem>Item 2</ChildItem>
        <ParentID>0</ParentID>
      </childTable>
      <childTable>
        <ChildID>3</ChildID>
        <ChildItem>Item 3</ChildItem>
        <ParentID>0</ParentID>
      </childTable>
      <childTable>
        <ChildID>4</ChildID>
        <ChildItem>Item 4</ChildItem>
        <ParentID>0</ParentID>
      </childTable>
      <childTable>
        <ChildID>5</ChildID>
        <ChildItem>Item 0</ChildItem>
        <ParentID>1</ParentID>
      </childTable>
      <childTable>
        <ChildID>6</ChildID>
        <ChildItem>Item 1</ChildItem>
        <ParentID>1</ParentID>
      </childTable>
      <childTable>
        <ChildID>7</ChildID>
        <ChildItem>Item 2</ChildItem>
        <ParentID>1</ParentID>
      </childTable>
      <childTable>
        <ChildID>8</ChildID>
        <ChildItem>Item 3</ChildItem>
        <ParentID>1</ParentID>
      </childTable>
      <childTable>
        <ChildID>9</ChildID>
        <ChildItem>Item 4</ChildItem>
        <ParentID>1</ParentID>
      </childTable>
      <childTable>
        <ChildID>10</ChildID>
        <ChildItem>Item 0</ChildItem>
        <ParentID>2</ParentID>
      </childTable>
      <childTable>
        <ChildID>11</ChildID>
        <ChildItem>Item 1</ChildItem>
        <ParentID>2</ParentID>
      </childTable>
      <childTable>
        <ChildID>12</ChildID>
        <ChildItem>Item 2</ChildItem>
        <ParentID>2</ParentID>
      </childTable>
      <childTable>
        <ChildID>13</ChildID>
        <ChildItem>Item 3</ChildItem>
        <ParentID>2</ParentID>
      </childTable>
      <childTable>
        <ChildID>14</ChildID>
        <ChildItem>Item 4</ChildItem>
        <ParentID>2</ParentID>
      </childTable>
    </NewDataSet>

    Are you  writing the  xml  as  this format?In fact , if you use System.Data.DataSet.ReadXML( ) to convert xml file to datatable. It will call a function called .ReadXmlSchema(xmlStream).  The schema includes table, relation, and constraint definitions and it  is generally invoked before invoking the ReadXml method which is used to fill the DataTable.

    Here is the link , I hope it could help you.

    https://docs.microsoft.com/en-us/dotnet/api/system.data.datatable.readxmlschema?view=netframework-4.7.2

    Best Regards

    Wei Zhang

    Monday, February 18, 2019 9:11 AM
  • User-943250815 posted

    I´m already using ReadXML, what I'm looking for is how to flat XML file like Excel does, but without using Microsoft.Office.Interop.Excel
    So how to deal with tables and relations in a efficient way.
    Here a small sample, as you can see it is about 9 levels deep

    <main>
      <order>
        <orderid Id="60032124">
          <base>
            <state>23</state> 
            <ordernumber>06003212</ordernumber> 
            <operation>From Warehouse</operation> 
          </base>
          <seller>
            <code>5396</code> 
            <name>John Doe</name> 
            <sellerData>
              <phone>123-4567</phone> 
              <county>Grand Rapids</county> 
            </sellerData>
          </seller>
          <buyer>
            <id>01775</id> 
            <name>John alive</name> 
            <buyerData>
              <phone>123-4567</phone> 
              <county>Marietta</county> 
            </buyerData>
          </buyer>
          <item id="1">
            <data>
              <code>I17MBS13</code> 
              <barcode /> 
              <description>Elbow 3/4</description> 
              <value>100</value> 
            </data>
            <taxes>
              <totaltaxes>0.00</totaltaxes> 
              <stateTax>
                <classAA>
                  <value>10</value> 
                  <percent>10</percent> 
                </classAA>
              </stateTax>
              <federalTax>
                <classZZ>
                  <value>10</value> 
                  <percent>10</percent> 
                </classZZ>
              </federalTax>
    </taxes> </item>
    <item id="2">
    <data>
    <code>I17MBS14</code>
    <barcode />
    <description>Elbow 1/2</description>
    <value>123</value>
    </data>
    <taxes>
    <totaltaxes>0.00</totaltaxes>
    <stateTax>
    <classAA>
    <value>11</value>
    <percent>11</percent>
    </classAA>
    </stateTax>
    <federalTax>
    <classZZ>
    <value>12</value>
    <percent>12</percent>
    </classZZ>
    </federalTax>
    </taxes>
    </item>
    </orderid> </order> </main>

    Tuesday, February 19, 2019 12:20 AM
  • User-1174608757 posted

    Hi jzero,

    So , how excel table do you want to display? How you want to show the relationship between the each element ?

    Best Regards

    Wei Zhang

    Tuesday, February 19, 2019 10:04 AM
  • User-943250815 posted

    Wei Zhang, I mean flat.
    Copy my XML sample and save in a file with xml extension. Then on Excel open saved file, in Open XML dialog box, check "As an XML table"
    Here you can see a sample image and discussion about https://stackoverflow.com/questions/8422979/how-do-i-load-an-xml-doc-into-excel-using-the-open-xml-sdk-and-net?rq=1

    Tuesday, February 19, 2019 12:01 PM
  • User475983607 posted

    Use LINQ to XML and the SelectMany() extension to flatten related result sets.  Once flattened, export to Excel.

    Tuesday, February 19, 2019 12:13 PM
  • User-943250815 posted

    @mgebhard
    Yeap SelectMany, can help, unfortunately it do not drill down all nodes, it ask for parameters, when I give it root or first child it flattens but only at node level.
    Perhaps loop through Descendants to get tags name to use on SelectMany, but seems a hard work.

    Wednesday, February 20, 2019 12:39 AM
  • User-1174608757 posted

    Hi jzero,

    According to your description, could you please share your xml file? I have seen the link, to get the excel table you want  ,you have to set the structure of  data based on reflection between nodes. So , I hope that you could share the whole xml file. 

    Best Regards

    Wei Zhang

    Wednesday, February 20, 2019 11:12 AM
  • User-943250815 posted

    @Wei Zhang
    You can use my posted sample, the only difference is number of items.

    Wednesday, February 20, 2019 11:59 AM
  • User-943250815 posted

    I'm not discarding SelectMany or ReadXML and deal with Tables & Relations, after some more search I found an article talking about XMLTextReader (https://www.codemag.com/article/0201031/XML-to-Database-using-.NET%27s-XmlTextReader)
    After some work I'm very close to simulate what Excel does. I still have to know in advance if there are repeating items.
    I sure there are better and more elegant ways, for now it solves actual issue. As soon as I can I´ll back to SelectMany & ReadXML
    Any suggestions or comments?
    This code was made to be used with sample XML already posted by me

    Imports System.Xml
    Imports System.Data
    Imports ClosedXML.Excel
    
    Public Class clsPathValue
        Public Property path As String
        Public Property value As String
        Public Sub New(ByVal zPath As String, zValue As String)
          path = zPath
          value = zValue
        End Sub
      End Class
    
      Public Class clsItemDetail
        Public Property item As Integer
        Public Property detail As List(Of clsPathValue)
        Public Sub New(ByVal pItem As Integer, ByVal pDetail As List(Of clsPathValue))
          item = pItem
          detail = pDetail
        End Sub
      End Class
    
      Protected Sub XMLToExcel(ByVal PathOfXmlFile As String)
        Dim zRepeatingTagName As String = "item" ''//Use tag name that represents a repeating item
        Dim zRepeatingTagAttr As String = "id"   ''//Use attribute name used in repeating item
        Dim zItemID As Integer = 0               
        Dim zTagName As String = Nothing
        Dim zTagValue As String = Nothing
        Dim zTagLevel As Integer = 0
        Dim zLstLevelAndName As New List(Of Tuple(Of Integer, String))
        Dim zLstFieldAndValue As New List(Of clsPathValue)
        Dim zLstOfRepeatingItens As New List(Of clsItemDetail)
    
        Try
          Dim reader As New XmlTextReader(PathOfXmlFile)
          While reader.Read
            If Not reader.NodeType = XmlNodeType.XmlDeclaration And Not reader.NodeType = XmlNodeType.Whitespace Then
              If reader.NodeType = XmlNodeType.Element And reader.IsStartElement And Not reader.IsEmptyElement Then
                zTagLevel = reader.Depth
                zTagName = reader.Name
                If reader.Name = zRepeatingTagName Then
                  zItemID = reader.GetAttribute(zRepeatingTagAttr)
                End If
                zLstLevelAndName.Add(New Tuple(Of Integer, String)(zTagLevel, zTagName))
              End If
              If reader.NodeType = XmlNodeType.Text Then
                zTagLevel = reader.Depth
                zTagValue = reader.Value
                Dim parents As String = String.Join(".", (From p In zLstLevelAndName Where p.Item1 < zTagLevel Select p.Item2).ToArray)
                zLstFieldAndValue.Add(New clsPathValue(parents, zTagValue))
              End If
              If reader.NodeType = XmlNodeType.EndElement Then
                If reader.Name = "item" Then
                  Dim k = From p In zLstFieldAndValue Where p.path.Contains(".item.") Select p
                  zLstOfRepeatingItens.Add(New clsItemDetail(zItemID, k.ToList))
                  zLstFieldAndValue.RemoveAll(Function(x) x.path.Contains(".item."))
                End If
                zLstLevelAndName.RemoveAt(zLstLevelAndName.Count - 1)
              End If
            End If
          End While
          Dim zDT As New DataTable
          zDT.Columns.Add(New DataColumn("Item", GetType(String)))
          For Each rec In From p In zLstFieldAndValue
            zDT.Columns.Add(New DataColumn(rec.path, GetType(String)))
          Next
          For Each rec In zLstOfRepeatingItens
            Dim nItem As Integer = rec.item
            For Each campo In rec.detail
              Dim nomeCampo As String = campo.path
              If Not zDT.Columns.Contains(nomeCampo) Then
                zDT.Columns.Add(New DataColumn(nomeCampo, GetType(String)))
              End If
            Next
          Next
          For Each rec In zLstOfRepeatingItens
            Dim zDR As DataRow = zDT.NewRow()
            zDR("Item") = rec.item
            For Each rec1 In From p In zLstFieldAndValue
              zDR(rec1.path) = rec1.value
            Next
    
            For Each recItem In rec.detail
              Dim nomeCampo As String = recItem.path
              zDR(nomeCampo) = recItem.value
            Next
            zDT.Rows.Add(zDR)
          Next
    
          Dim QtdCol As Integer = zDT.Columns.Count
          Using wb As New XLWorkbook
            wb.Worksheets.Add("Orders")
            wb.Worksheets(0).SheetView.FreezeRows(1)
            wb.Worksheets(0).Range(1, 1, 1, QtdCol).Style.Font.Bold = True
            wb.Worksheets(0).Range(1, 1, 1, QtdCol).Style.Fill.BackgroundColor = XLColor.FromArgb(149, 179, 215)
            wb.Worksheets(0).Cell(1, 1).InsertTable(zDT.AsEnumerable)
            wb.Worksheets(0).Tables(0).SetShowAutoFilter(True)
            wb.Worksheets(0).Tables(0).Theme = XLTableTheme.None
            wb.Worksheets(0).Columns.AdjustToContents()
            Response.Buffer = True
            Response.Charset = ""
            Response.ContentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
            Response.AddHeader("content-disposition", "attachment;filename=OrdersFromXML.xlsx")
            Using MyMemoryStream As New System.IO.MemoryStream()
              wb.SaveAs(MyMemoryStream)
              MyMemoryStream.WriteTo(Response.OutputStream)
              Response.Flush()
              Response.End()
            End Using
          End Using
    
        Catch ex As Exception
          Dim FailMsg as string = ex.Message
        End Try

    Wednesday, February 20, 2019 7:49 PM