locked
How to find and remove duplicate data from xml file RRS feed

  • Question

  • My xml looks like

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <TickerBrokerDateMap>
    <Broker>
      <TickerBrokerDateFormatMap BrokerTab_Id="3" Broker_Id="2" Ticker_Id="MGP">
        <StandardDate>1Q 2019A</StandardDate>
        <ColumnCoordinate>BZ</ColumnCoordinate>
        <TickerBrokerDateFormatMaps_Id>8</TickerBrokerDateFormatMaps_Id>
        <BrokerDate StandardDate="1Q 2019A" Broker_Id="2" BrokerTab_Id="3">
          <year>1Q19</year>
          <Quater>1Q19</Quater>
        </BrokerDate>
      </TickerBrokerDateFormatMap>
      <TickerBrokerDateFormatMap BrokerTab_Id="3" Broker_Id="2" Ticker_Id="MGP">
        <StandardDate>2Q 2019A</StandardDate>
        <ColumnCoordinate>CA</ColumnCoordinate>
        <TickerBrokerDateFormatMaps_Id>8</TickerBrokerDateFormatMaps_Id>
        <BrokerDate StandardDate="2Q 2019A" Broker_Id="2" BrokerTab_Id="3">
          <year>2Q19</year>
          <Quater>2Q19</Quater>
        </BrokerDate>
      </TickerBrokerDateFormatMap>
      <TickerBrokerDateFormatMap BrokerTab_Id="3" Broker_Id="2" Ticker_Id="MGP">
        <StandardDate>3Q 2019A</StandardDate>
        <ColumnCoordinate>CB</ColumnCoordinate>
        <TickerBrokerDateFormatMaps_Id>8</TickerBrokerDateFormatMaps_Id>
        <BrokerDate StandardDate="3Q 2019A" Broker_Id="2" BrokerTab_Id="3">
          <year>3Q19</year>
          <Quater>3Q19</Quater>
        </BrokerDate>
      </TickerBrokerDateFormatMap>
      <TickerBrokerDateFormatMap BrokerTab_Id="3" Broker_Id="2" Ticker_Id="MGP">
        <StandardDate>4Q 2019A</StandardDate>
        <ColumnCoordinate>CC</ColumnCoordinate>
        <TickerBrokerDateFormatMaps_Id>8</TickerBrokerDateFormatMaps_Id>
        <BrokerDate StandardDate="4Q 2019A" Broker_Id="2" BrokerTab_Id="3">
          <year>4Q19</year>
          <Quater>4Q19</Quater>
        </BrokerDate>
      </TickerBrokerDateFormatMap>
      </Broker>
      </TickerBrokerDateMap>

    now some time there is duplicate data in TickerBrokerDateFormatMap. duplicate will be based on StandardDate, BrokerTab_Id and Broker_Id

    if there are multiples data having same StandardDate, BrokerTab_Id and Broker_Id then that will be deleted from xml file.

    This way i try to find duplicate data first which not perfect i guess. so please see my code and help me to complete my objective

    my code

    XDocument xmlDocTargetFile = XDocument.Load(strADMFilePath);
    
                           var stGroup = xmlDocTargetFile.Descendants("TickerBrokerDateFormatMap").GroupBy(row =>
                             new 
                             { 
                                 s = row.Element("StandardDate").Value, 
                                 b = row.Attribute("Broker_Id").Value, 
                                 bt = row.Attribute("BrokerTab_Id").Value
                             },
                             (key, gr) => new { key, list = gr }
                             );
    
                           foreach (var item in stGroup)
                           {
                               _lstDuplicates.Add(new TickerBrokerDateFormatMap
                               {
                                   StandardDate = item.key.s,
                                   BrokerTab_Id = item.key.bt,
                                   Broker_Id = item.key.b
                               });
                           }
                           dgList.DataSource = _lstDuplicates;

    _lstDuplicates suppose to store all duplicate data based on StandardDate, BrokerTab_Id and Broker_Id

    if data were stored in db then i used to query this way

    select StandardDate,Broker_Id,BrokerTab_Id,count(*)
    from tmp_TickerBrokerDateFormatMap
    group by StandardDate,Broker_Id,BrokerTab_Id
    having count(*)>1
    order by Broker_Id,BrokerTab_Id,StandardDate

    so tell me how could i use LINQ to query my xml file to find duplicate data based on StandardDate, BrokerTab_Id and Broker_Id

    and remove those duplicate data?

    please help me with code sample. thanks

    Monday, May 11, 2020 10:20 AM

Answers

  • i have done this job this way. full code

    strADMFilePath=admfiles[0];
    XDocument xmlDocTargetFile = XDocument.Load(strADMFilePath);
    
    //Query TickerBrokerDateFormatMap data to get duplicate data based on StandardDate,Broker_Id and BrokerTab_Id
    var stGroup = xmlDocTargetFile.Descendants("TickerBrokerDateFormatMap").GroupBy(row =>
    						new
    						{
    							StandardDate = row.Element("StandardDate").Value,
    							Broker_Id = row.Attribute("Broker_Id").Value,
    							BrokerTab_Id = row.Attribute("BrokerTab_Id").Value
    						})
    						.Where(grp => grp.Count() > 1)
    						.Select(grp => grp.Key);
    
    //if duplicate data found then populate in list
    foreach (var item in stGroup)
    {
       _lstDuplicates.Add(new TickerBrokerDateFormatMap
       {
    	   StandardDate = item.StandardDate,
    	   BrokerTab_Id = item.BrokerTab_Id,
    	   Broker_Id = item.Broker_Id
       });
    }
    xmlDocTargetFile = null;
    
    if (_lstDuplicates != null && _lstDuplicates.Count > 0)
    {
       xmlDocTargetFile = XDocument.Load(strADMFilePath);
    
       //removing duplicate data
       xmlDocTargetFile.Descendants("TickerBrokerDateFormatMap").GroupBy(row =>
       new
       {
    	   StandardDate = row.Element("StandardDate").Value,
    	   Broker_Id = row.Attribute("Broker_Id").Value,
    	   BrokerTab_Id = row.Attribute("BrokerTab_Id").Value
       })
       .SelectMany(m => m.Skip(1)).Remove();
    
       xmlDocTargetFile.Save(strADMFilePath);
       MessageBox.Show("Repair done");
    }

    i read code from these url to get idea.

    https://blogs.msmvps.com/martin-honnen/2009/11/27/grouping-with-linq-to-xml/
    https://www.codeproject.com/Questions/767243/How-to-remove-duplicate-data-from-xml-using-Csharp
    https://stackoverflow.com/questions/25812456/efficiently-removing-duplicate-xml-elements-in-c-sharp
    https://forums.asp.net/t/1406394.aspx?XML+Group+by+with+Linq
    https://stackoverflow.com/questions/2078736/linq-with-group-by-having-count
    https://stackoverflow.com/questions/17318811/how-do-delete-data-using-xdocument

    Thanks


    • Marked as answer by Sudip_inn Monday, May 11, 2020 11:30 AM
    • Edited by Sudip_inn Monday, May 11, 2020 11:33 AM
    Monday, May 11, 2020 11:29 AM