none
I need help with complex LINQ to XML query RRS feed

  • Question

  • I don’t know if the query I am trying to do is even possible but if one of you LINQ to SQL/XML guru’s can figure this out I will be so thankful and salute you as a LINQ God. My end goal is to identify all of the XML Models that are duplicates and show the CECID for all the duplicates except one. So lets say I have an Xdocument that looks like this:

    <ApplianceModels xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ApplianceType="IceMakers">

      <Model>

        <ReferenceNumber>201877149</ReferenceNumber>

        <Action>C</Action>

        <Brand>4564</Brand>

        <ModelNumber>1234212</ModelNumber>

        <EquipmentType>A</EquipmentType>

        <CoolingType>W</CoolingType>

        <IceType>C</IceType>

        <IceMakerProcessType>B</IceMakerProcessType>

        <TestLabCode>ARN3190</TestLabCode>

        <ManufacturerCode>ARN2396</ManufacturerCode>

        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>

        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>

        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>

        <IceHardnessAdjustmentFactor xsi:nil="true" />

        <RegulatoryStatus>I</RegulatoryStatus>

        <CECID>d579ae7a-f3f7-4627-a3f1-f17b23aa28e3</CECID>

      </Model>

      <Model>

        <ReferenceNumber>201877143</ReferenceNumber>

        <Action>C</Action>

        <Brand>4564</Brand>

        <ModelNumber>12342</ModelNumber>

        <EquipmentType>A</EquipmentType>

        <CoolingType>W</CoolingType>

        <IceType>C</IceType>

        <IceMakerProcessType>B</IceMakerProcessType>

        <TestLabCode>ARN3190</TestLabCode>

        <ManufacturerCode>ARN2396</ManufacturerCode>

        <HarvestRateLbs24Hr>56</HarvestRateLbs24Hr>

        <EnergyCons_kWhPer100Lbs>4.00</EnergyCons_kWhPer100Lbs>

        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>

        <IceHardnessAdjustmentFactor xsi:nil="true" />

        <RegulatoryStatus>I</RegulatoryStatus>

        <CECID>94c6d6e6-5b6a-4f45-a7ff-70a64e50e4e6</CECID>

      </Model>

      <Model>

        <ReferenceNumber>201877152</ReferenceNumber>

        <Action>C</Action>

        <Brand>4564</Brand>

        <ModelNumber>1231114234</ModelNumber>

        <EquipmentType>A</EquipmentType>

        <CoolingType>W</CoolingType>

        <IceType>C</IceType>

        <IceMakerProcessType>C</IceMakerProcessType>

        <TestLabCode>ARN3190</TestLabCode>

        <ManufacturerCode>ARN2396</ManufacturerCode>

        <HarvestRateLbs24Hr>81</HarvestRateLbs24Hr>

        <EnergyCons_kWhPer100Lbs>1.10</EnergyCons_kWhPer100Lbs>

        <WaterCons_galPer100Lbs>12</WaterCons_galPer100Lbs>

        <IceHardnessAdjustmentFactor>4.45</IceHardnessAdjustmentFactor>

        <RegulatoryStatus>I</RegulatoryStatus>

        <CECID>d97a603c-1836-43a3-b564-ab8d1bdec65f</CECID>

      </Model> 

    </ApplianceModels>

    Then in SQL Server I have a table called tApplianceTypeColumns that looks like this for a given appliance type:

    ApplianceTypeID

    ApplianceColumnUnique

    ApplianceColumnName

    10

    0

    ReferenceNumber

    10

    1

    Brand

    10

    1

    ModelNumber

    10

    0

    EquipmentType

    10

    0

    CoolingType

    10

    0

    IceType

    10

    0

    IceMakerProcessType

    10

    0

    HarvestRateLbs24Hr

    10

    0

    EnergyCons_kWhPer100Lbs

    10

    0

    WaterCons_galPer100lbs

    10

    1

    RegulatoryStatus

    So here is what I started with but I am far from being close:

    var DupeItems = from m in doc.Descendants("Model").Elements()

                    join at in entities.tApplianceTypeColumns on m.Name equals at.ApplianceColumnName

                    group m by m.Element(at.ApplianceColumnName).Value into d

                    where at.ApplianceTypeID == ApplianceTypeID

    So really I want to be able to group by Brand, Model Number, and RegulatoryStatus which are the columns in the tApplianceTypeColumns table that have the ApplianceColumnUnique bit column set to true. The number of true bits could vary depending on the ApplianceTypeID I am looking up in that table.

    Additionally, I also need to include two elements in the grouping that are not in the tApplianceTypeColumns table always and those elements are Action then ManufacturerCode followed by all the other unique elements from the tApplianceTypeColumns in no specific order.

    The ApplianceTypeID is a known parameter that will be passed to the query. So for any set of duplicates I need to display the CECID for the 2nd and subsequent duplicates so that I can take those CECID’s and do lookups in other tables to change their status. But this first step is tough. I don’t care which of the duplicates does not get displayed. I just need to display all others except 1. I hope I have explained this well enough.


    • Edited by NEW2BI Tuesday, May 14, 2019 7:57 PM Formatting
    Tuesday, May 14, 2019 4:54 PM

Answers

  • I was able to get a solution to this on StackOverflow.com. Here is my final code for this:

    private List<string> XMLDuplicatesToEliminate(XDocument doc, Guid ApplianceTypeID) { var entities = new DbContextFactory().MAEDBSEntities; var applianceModels = doc.Descendants("Model"); var applianceTypeColumns = (from at in entities.tApplianceTypeColumns where at.ApplianceTypeID == ApplianceTypeID && at.ApplianceColumnUnique == true select new { at.ApplianceColumnName }).ToList(); var uniqueColumns = Enumerable.Concat( "Action,ManufacturerCode".Split(','), applianceTypeColumns .Select(at => at.ApplianceColumnName) ); List<string> DuplicatesToEliminate = new List<string>(); var duplicates = (applianceModels .GroupBy( model => uniqueColumns.Select(columnName => model.Element(columnName)?.Value).ToArray(), new LambdaComparer<string[]>((a, b) => a.SequenceEqual(b), x => x.Aggregate(13, (hash, y) => hash * 7 + y?.GetHashCode() ?? 0)) ) .Where(x => x.Count() > 1) .Select(g => new { g.Key, Duplicates = g.Select(x => x.Element("CECID")?.Value) }) ).ToList(); foreach (var duperow in duplicates) { string firstdupe = duperow.Duplicates.First(); IEnumerable<string> allbutone = duperow.Duplicates.Where(x => x != firstdupe); foreach (string dupeitem in allbutone) { DuplicatesToEliminate.Add(dupeitem); } } return DuplicatesToEliminate; }

    public class LambdaComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> equals;
        private readonly Func<T, int> getHashCode;
    
        public LambdaComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
        {
            this.equals = equals;
            this.getHashCode = getHashCode;
        }
    
        public bool Equals(T x, T y) => equals(x, y);
        public int GetHashCode(T obj) => getHashCode(obj);
    }




    • Marked as answer by NEW2BI Thursday, May 16, 2019 1:19 AM
    • Edited by NEW2BI Thursday, May 16, 2019 1:21 AM Eliminated line
    Wednesday, May 15, 2019 5:16 PM

All replies

  • Hi NEW2BI,

    Sorry, I could not fully understand the relationship between the table and the xml. Could you explain it in detail?

    As to get the "CECID" in xml via LINQ, you can refer to the following code, and use "Distinct" to remove duplicates.

        XDocument doc = XDocument.Load(@"D:\ApplianceModels.xml");
        var query = from p in doc.Element("ApplianceModels").Elements("Model")
                    select (string)p.Element("CECID");
        foreach(var item in query.Distinct())
        {
            Console.WriteLine(item);
        }
        Console.ReadKey();

    Regards,

    Kyle


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, May 15, 2019 6:39 AM
    Moderator
  • I was able to get a solution to this on StackOverflow.com. Here is my final code for this:

    private List<string> XMLDuplicatesToEliminate(XDocument doc, Guid ApplianceTypeID) { var entities = new DbContextFactory().MAEDBSEntities; var applianceModels = doc.Descendants("Model"); var applianceTypeColumns = (from at in entities.tApplianceTypeColumns where at.ApplianceTypeID == ApplianceTypeID && at.ApplianceColumnUnique == true select new { at.ApplianceColumnName }).ToList(); var uniqueColumns = Enumerable.Concat( "Action,ManufacturerCode".Split(','), applianceTypeColumns .Select(at => at.ApplianceColumnName) ); List<string> DuplicatesToEliminate = new List<string>(); var duplicates = (applianceModels .GroupBy( model => uniqueColumns.Select(columnName => model.Element(columnName)?.Value).ToArray(), new LambdaComparer<string[]>((a, b) => a.SequenceEqual(b), x => x.Aggregate(13, (hash, y) => hash * 7 + y?.GetHashCode() ?? 0)) ) .Where(x => x.Count() > 1) .Select(g => new { g.Key, Duplicates = g.Select(x => x.Element("CECID")?.Value) }) ).ToList(); foreach (var duperow in duplicates) { string firstdupe = duperow.Duplicates.First(); IEnumerable<string> allbutone = duperow.Duplicates.Where(x => x != firstdupe); foreach (string dupeitem in allbutone) { DuplicatesToEliminate.Add(dupeitem); } } return DuplicatesToEliminate; }

    public class LambdaComparer<T> : IEqualityComparer<T>
    {
        private readonly Func<T, T, bool> equals;
        private readonly Func<T, int> getHashCode;
    
        public LambdaComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
        {
            this.equals = equals;
            this.getHashCode = getHashCode;
        }
    
        public bool Equals(T x, T y) => equals(x, y);
        public int GetHashCode(T obj) => getHashCode(obj);
    }




    • Marked as answer by NEW2BI Thursday, May 16, 2019 1:19 AM
    • Edited by NEW2BI Thursday, May 16, 2019 1:21 AM Eliminated line
    Wednesday, May 15, 2019 5:16 PM
  • Hi NEW2BI,

    I am glad you have got your solution. We appreciated you shared us your solution. And we also hope you can mark it as an answer. By marking a post as Answered, you help others find the answer faster.

    Regards,

    Kyle


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, May 16, 2019 1:16 AM
    Moderator