C# Updating multiple xml files using Parallel.ForEach

  • Question

  • Right now I am updating multiple large files in a foreach loop, and it is taking a long time. I am curious to know whether I can use Parallel.ForEach to update multiple large xml files simultaneously, but I do not want to use lock().

    I am new to Parallel.ForEach, so I am worried about race conditions and improper updates to the xml files; data should not overlap between files. Here is a sample of my code. Please take a look and tell me whether it will work correctly in production.

        List<string> listoffiles = new List<string>();
        listoffiles.Add(@"d:\test1.xml");
        listoffiles.Add(@"d:\test2.xml");
        listoffiles.Add(@"d:\test3.xml");

        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
        Parallel.ForEach(listoffiles, options, (filepath) =>
        {
            XDocument xmlDoc = XDocument.Load(filepath);

            // 'data' (the section/line-item values read from the database) is defined outside this snippet
            // query 10QK xml data based on section & lineitem
            var items = (from item in xmlDoc.Descendants("TickerBrokerStandardDateLineitemValue")
                         where item.Element("TabName").Value.Trim() == data.Section
                         && item.Element("StandardLineItem").Value.Trim() == data.LineItem
                         select item).ToList();

            foreach (var item in items)
            {
                // insert the SectionID element if it is missing, otherwise update it with the db value
                if (item.Element("SectionID") == null)
                {
                    item.Add(new XElement("SectionID", data.SectionID));
                }
                else
                {
                    item.Element("SectionID").SetValue(data.SectionID);
                }

                // insert the LineItemID element if it is missing, otherwise update it with the db value
                if (item.Element("LineItemID") == null)
                {
                    item.Add(new XElement("LineItemID", data.LineItemID));
                }
                else
                {
                    item.Element("LineItemID").SetValue(data.LineItemID);
                }

                if (data.Approved == 1)
                {
                    // if approved, update XFundCode with the db value
                    if (item.Element("XFundCode") != null)
                    {
                        item.Element("XFundCode").SetValue(data.ApprovedXFundCode);
                    }
                }
                else if (data.Approved == 0)
                {
                    // if unapproved, clear XFundCode
                    if (item.Element("XFundCode") != null)
                    {
                        item.Element("XFundCode").SetValue(string.Empty);
                    }
                }
            }

            xmlDoc.Save(filepath);
        });

    Please guide me toward the best approach for updating multiple large xml files in a short time. Thanks.

    Sunday, December 8, 2019 6:07 PM

Answers

  • Hi Sudip_inn,

    Thank you for posting here.

    If I understand correctly, you want to update multiple xml files by using Parallel.ForEach.

    I did some testing and did not encounter the situation you are worried about. I believe the designers of the framework have already taken this scenario into account.

    Note that XDocument loads the entire xml file into memory, so if your xml files are very large, you may need the streaming approach described in "How to: Perform Streaming Transform of Large XML Documents".
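
    In case it is useful, here is a minimal sketch of that streaming pattern. It is only an illustration under assumptions: the element name comes from your query, while the root element name ("Root"), the output path, and the per-item edit are placeholders you would fill in. Note that a streaming transform must write to a different file than the one it reads.

        using System.Collections.Generic;
        using System.Linq;
        using System.Xml;
        using System.Xml.Linq;

        // Yields one matching element at a time instead of loading the whole document.
        // Assumes these elements are not nested inside one another.
        static IEnumerable<XElement> StreamItems(string path)
        {
            using (XmlReader reader = XmlReader.Create(path))
            {
                while (reader.ReadToFollowing("TickerBrokerStandardDateLineitemValue"))
                {
                    // Materializes only the current element subtree.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
            }
        }

        static void StreamingTransform(string inputPath, string outputPath)
        {
            // XStreamingElement evaluates its content lazily while saving,
            // so memory use stays flat regardless of the input file size.
            var root = new XStreamingElement("Root",
                from item in StreamItems(inputPath)
                select item); // apply your per-item edits here before selecting

            root.Save(outputPath);
        }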

    Hope this could be helpful.

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


    Monday, December 9, 2019 6:26 AM
  • Your Parallel.ForEach should be OK. No concurrency problems, given that each thread is going to process a different file and there are no variables or other data accessed by multiple threads.

    However, whether this will actually help process the files more quickly is dubious. You should do some benchmarking to find out whether the bottleneck is the CPU (data processing in memory) or the disk (reading and writing the files). If the bottleneck is the CPU and you have several cores, then yes, processing several files in parallel will reduce the total time required. But if the bottleneck is the disk, the parallel version is likely to be slower, because the disk will thrash as it jumps from one file to another.

    So it is important to know what is happening in your program and where your bottleneck is. If you just parallelize blindly, without understanding what is happening, you run the risk of making the program slower instead of faster.
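
    For example, a quick Stopwatch comparison of the two variants (just a sketch; ProcessFile is a hypothetical method wrapping the per-file work from your code, and you should run this several times, since the operating system's file cache makes later runs faster):

        using System;
        using System.Collections.Generic;
        using System.Diagnostics;
        using System.Threading.Tasks;

        static void CompareSequentialVsParallel(List<string> files, Action<string> processFile)
        {
            var sw = Stopwatch.StartNew();
            foreach (var file in files)
                processFile(file);                 // one file at a time
            sw.Stop();
            Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            Parallel.ForEach(files, processFile);  // files processed concurrently
            sw.Stop();
            Console.WriteLine($"Parallel:   {sw.ElapsedMilliseconds} ms");
        }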


    Monday, December 9, 2019 9:34 PM
    Moderator

All replies

  • How can I update an xml file using XmlReader.Create()?

    Anyway, will my above code and approach work fine when using Parallel.ForEach?

    Monday, December 9, 2019 6:37 PM
  • Are you calling this once for each record in a data table?  If you're truly worried about performance, then you need to think about this in a different way.  For each record, you open each XML file, read and translate that XML file into data structures, modify the data structures, then translate the data structures back out to text file.

    It would be much more efficient to do this the other way around.  For each file, open the file, then run through your list of data records and make the modifications.  Running through your data table multiple times is going to be WAY faster than translating all those text files time after time after time.
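
    For example, something along these lines (just a sketch; "records" is a placeholder for whatever collection your data rows come from, with the same properties your code already uses):

        // Outer loop over files: each file is parsed and saved exactly once.
        foreach (var filepath in listoffiles)
        {
            XDocument xmlDoc = XDocument.Load(filepath);

            // Inner loop over the data rows: all edits happen in memory.
            foreach (var data in records)
            {
                var items = xmlDoc.Descendants("TickerBrokerStandardDateLineitemValue")
                                  .Where(item => item.Element("TabName").Value.Trim() == data.Section
                                              && item.Element("StandardLineItem").Value.Trim() == data.LineItem);

                foreach (var item in items)
                {
                    // Same insert-or-update logic as your code; SectionID shown as an example.
                    if (item.Element("SectionID") == null)
                        item.Add(new XElement("SectionID", data.SectionID));
                    else
                        item.Element("SectionID").SetValue(data.SectionID);
                }
            }

            xmlDoc.Save(filepath);  // one save per file instead of one per record
        }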


    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Monday, December 9, 2019 8:31 PM
  • @Tim, what you said is not entirely clear to me; I don't understand what to do or how to write the code that way. So my request, sir: could you please post a small code sample doing the same kind of job, so I can get an idea of how to develop a routine to update multiple xml files simultaneously? Thanks.
    Tuesday, December 10, 2019 8:04 AM