Most efficient way of removing blank lines in an XML file?

  • Question

  • I have a very basic duplicate-node removal program; its code is given below:

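    XDocument xdoc = XDocument.Load(@"D:\test\12345.XML", LoadOptions.PreserveWhitespace);
    xdoc.Descendants("item") // "item" is a placeholder: the element name is not shown in the post
        .GroupBy(g => g.Value.ToLower())
        .Where(g => g.Count() > 1)
        .SelectMany(g => g.Skip(1)) // keep the first element of each group, remove the rest
        .Remove();
    xdoc.Save(@"D:\test\12345.XML", SaveOptions.DisableFormatting); // assumed save call (see the SaveOptions reply below)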

    I want to remove all blank lines in the file (if any) after this process.

    I tried a few ways of doing it (below), but I'm not sure which one is faster and more efficient.

    1)

    var subjectString = File.ReadAllText(@"D:\test\12345.XML");
    var resultString = Regex.Replace(subjectString, @"[\r\n]*^\s*$[\r\n]*", "", RegexOptions.Multiline);
    File.WriteAllText(@"D:\test\12345.XML", resultString); // write the cleaned text back

    2)

    File.WriteAllLines(@"D:\test\12345.XML", File.ReadAllLines(@"D:\test\12345.XML")
    .Where(l => !string.IsNullOrWhiteSpace(l)));

    3)

    var tempFileName = Path.GetTempFileName();
    using (var streamReader = new StreamReader(@"D:\test\12345.XML"))
    using (var streamWriter = new StreamWriter(tempFileName))
    {
        string line;
        while ((line = streamReader.ReadLine()) != null)
            if (!string.IsNullOrWhiteSpace(line))
                streamWriter.WriteLine(line); // copy only non-blank lines
    }
    File.Copy(tempFileName, @"D:\test\12345.XML", true);
    File.Delete(tempFileName); // remove the temporary copy
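    A point worth checking before timing approach 1: with `RegexOptions.Multiline`, `^` matches after a `\n`, `$` matches before a `\n`, and `\s` also matches `\r` and `\n`. The leading `[\r\n]*` therefore lets the pattern also swallow the line break that ends the preceding non-blank line, so the lines on either side of a blank line get joined. The sketch below (a hypothetical `RegexCheck` class, not code from this thread) compares the posted pattern with the same pattern minus its leading `[\r\n]*`:

```csharp
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // Pattern from approach 1, exactly as posted.
    public const string Original = @"[\r\n]*^\s*$[\r\n]*";

    // Same pattern without the leading [\r\n]*, so only the blank line
    // itself (plus its own line break) is consumed.
    public const string Trimmed = @"^\s*$[\r\n]*";

    // Removes every match of `pattern`, with ^ and $ working per line.
    public static string RemoveBlankLines(string text, string pattern) =>
        Regex.Replace(text, pattern, "", RegexOptions.Multiline);
}
```

    For the input `"<a>\r\n\r\n  <b/>\r\n</a>"`, the posted pattern returns `"<a>  <b/>\r\n</a>"` (the `<a>` and `<b/>` lines are merged), while the trimmed pattern returns `"<a>\r\n  <b/>\r\n</a>"`. Between elements the merge is usually harmless, but inside text content it would glue words together (`"foo\r\n\r\nbar"` becomes `"foobar"`).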

    Can anyone help me with this?

    Also, if anyone knows another way of achieving this, please show me.

    NOTE: I want to run this process on multiple files, so the faster the program, the better.
    • Edited by Bumba_007 Sunday, February 4, 2018 10:39 AM
    Sunday, February 4, 2018 10:35 AM

All replies

  • If the files are large, number 3 is probably the best way, since it doesn't require loading the whole file into memory. But I suggest an improvement: instead of creating the temporary file in the temp folder, create it in the same folder as the original file. Then, instead of copying the new file over the original, just delete the original and rename the new file to the original name. This saves one full copy of the file's contents on disk.
    • Proposed as answer by Fei Hu Wednesday, February 7, 2018 8:50 AM
    Sunday, February 4, 2018 10:52 AM
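    The suggestion above can be sketched like this, keeping the line-by-line streaming of method 3. `BlankLineStripper` and the `.tmp` suffix are hypothetical names chosen for illustration; the temp file is created next to the original so that the final rename stays on the same volume and does not copy any data.

```csharp
using System.IO;

public static class BlankLineStripper
{
    // Streams `path` line by line into a sibling temp file, skipping blank or
    // whitespace-only lines, then puts the temp file in place of the original.
    public static void RemoveBlankLines(string path)
    {
        string fullPath = Path.GetFullPath(path);
        string dir = Path.GetDirectoryName(fullPath);
        // Hypothetical naming scheme: "<original name>.tmp" in the same folder.
        string tempPath = Path.Combine(dir, Path.GetFileName(fullPath) + ".tmp");

        using (var reader = new StreamReader(fullPath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!string.IsNullOrWhiteSpace(line))
                    writer.WriteLine(line);
            }
        }

        File.Delete(fullPath);         // delete the original...
        File.Move(tempPath, fullPath); // ...and rename the new file to its name
    }
}
```

    Usage: `BlankLineStripper.RemoveBlankLines(@"D:\test\12345.XML");`. `File.Replace(tempPath, fullPath, null)` is an alternative to the delete-and-move pair that swaps the files in one call. Note that `StreamWriter` writes UTF-8 without a BOM by default, so files in another encoding would need an explicit encoding argument.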
  • I was just wondering: isn't there a method in XDocument or XmlDocument to remove blank lines after the operation?

    Why do you have to load the document again in a different method/stream to do this in the first place?

    A valid XML file should not have any blank lines, correct?

    • Edited by Don Bradman Sunday, February 4, 2018 12:16 PM
    Sunday, February 4, 2018 12:15 PM
  • Hmm, without testing: shouldn't the save option be SaveOptions.None? DisableFormatting keeps the existing, but unnecessary, whitespace.
    Sunday, February 4, 2018 1:09 PM
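    A sketch of what this reply suggests, assuming the blank lines come from the whitespace text nodes that `LoadOptions.PreserveWhitespace` keeps next to each removed element: load without `PreserveWhitespace` and save with `SaveOptions.None`, so `XDocument` writes its own indentation. `XmlDeduper`, `RemoveDuplicates`, and the `elementName` parameter are hypothetical; the question does not show which element is being deduplicated.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlDeduper
{
    // Removes every `elementName` element whose text duplicates (case-insensitively)
    // an earlier one, then re-saves the file in place.
    public static void RemoveDuplicates(string path, string elementName)
    {
        // No LoadOptions.PreserveWhitespace: the indentation and newlines between
        // elements are not loaded as text nodes, so removing an element cannot
        // leave a blank line behind.
        XDocument xdoc = XDocument.Load(path);

        xdoc.Descendants(elementName)
            .GroupBy(e => e.Value.ToLower())
            .Where(g => g.Count() > 1)
            .SelectMany(g => g.Skip(1)) // keep the first of each group
            .ToList()                   // materialize before modifying the tree
            .ForEach(e => e.Remove());

        // SaveOptions.None indents the output instead of writing whitespace
        // exactly as loaded (which is what DisableFormatting does).
        xdoc.Save(path, SaveOptions.None);
    }
}
```

    This avoids a second pass over the file entirely, at the cost of letting `XDocument` choose the indentation.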
  • Hello Bumba_007,

    >>I want to do this process on multiple files, so the faster the program is the better.

    File operations ultimately go through the operating system's file APIs, so the OS, the file size, the disk hardware, and so on all affect how fast each procedure runs. I can't give you a definitive answer, but you can measure the speed on a given machine: run each of the methods above and time it.

     using System.Diagnostics;

     Stopwatch sw = Stopwatch.StartNew(); // start timing
     //[your test code]
     sw.Stop();
     TimeSpan ts2 = sw.Elapsed;
     Console.WriteLine("Time spent: {0}ms.", ts2.TotalMilliseconds);

    Hope this helps.

    Best regards,

    Neil Hu

    MSDN Community Support

    Monday, February 5, 2018 6:11 AM
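    To apply this timing advice across the multiple files mentioned in the question, the Stopwatch can be wrapped in a small helper. The sketch below assumes each approach is wrapped as an `Action<string>` that takes a file path; `Timing.TimeOverFiles` is a hypothetical name. Each file is copied first so that every approach processes the same unmodified input, and the copying is kept outside the timed region.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class Timing
{
    // Runs `action` once per file on a fresh temporary copy of that file and
    // returns the total time spent inside `action` (copying is not timed).
    public static TimeSpan TimeOverFiles(string[] files, Action<string> action)
    {
        TimeSpan total = TimeSpan.Zero;
        foreach (string file in files)
        {
            string copy = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + Path.GetExtension(file));
            File.Copy(file, copy);
            try
            {
                Stopwatch sw = Stopwatch.StartNew();
                action(copy);
                sw.Stop();
                total += sw.Elapsed;
            }
            finally
            {
                File.Delete(copy); // leave the original files untouched
            }
        }
        return total;
    }
}
```

    For example, `Timing.TimeOverFiles(Directory.GetFiles(@"D:\test", "*.xml"), RemoveBlankLinesRegex)`, where `RemoveBlankLinesRegex` stands for approach 1 wrapped in a method. The first run also pays for JIT compilation and a cold disk cache, so it is worth running each approach once before the measured run.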