Console Application Performance: How to close handles after creating files

  • Question

  • This is in connection to this thread:  https://social.msdn.microsoft.com/Forums/vstudio/en-US/1968f346-7374-4e75-9357-f318ea082d39/how-to-improve-console-application-performance?forum=vbgeneral

    I noticed from the Resource Monitor in the "Associated Handles" part that the app is creating "kernel objects\maximum commit condition" and "kernel objects\low memory condition" although no exception is thrown.

    I'm wondering if the File.WriteAllLines command is creating these and whether the files stay in memory after this call.  If they're staying in memory, how do I dispose them (or close the handles if that's what it is)?  I'm wondering if this is causing the max commit/low memory conditions.

    Appreciate any help.


    Marilyn Gambone

    Tuesday, May 1, 2018 12:05 PM

All replies

  • Are you still using code like this?

                Dim sw As New System.IO.StreamWriter(MYFILENAME)
    
                If dataType = "PRCP" Then sw.WriteLine("Date,Recorderd Precip,Flag1,Flag2,Interpolated Value")
                If dataType = "TMAX" Then sw.WriteLine("Date,Recorderd Tmax,Flag1,Flag2,Interpolated Value")
                If dataType = "TMIN" Then sw.WriteLine("Date,Recorderd Tmin,Flag1,Flag2,Interpolated Value")
                If dataType = "AWND" Then sw.WriteLine("Date,Recorderd AWIND,Flag1,Flag2,Interpolated Value")
    
                For a = 1 To obscount + 1
                    sline = c_stations(a, 0) & "," & c_stations(a, 1) & "," & c_stations(a, 2) & "," & c_stations(a, 3) & "," & interp(a) & "," & nearest(a)
                    sw.WriteLine(sline)
                Next a

    You never close the stream (sw.Close()), so that would be leaking resources as the program executes.
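
    A minimal sketch of one way to make sure the writer is always closed, assuming the rows have already been formatted as strings. The module and procedure names (StreamWriterSketch, WriteRows) are hypothetical and not from the original program. A Using block disposes the StreamWriter, which flushes it and releases the file handle, even when an exception is thrown part-way through the loop.

```vb
Imports System.Collections.Generic
Imports System.IO

Module StreamWriterSketch

    ' Hypothetical helper: writes a header followed by one line per row.
    ' The Using block calls Dispose on the writer when the block exits,
    ' which flushes buffered text and closes the underlying file handle,
    ' even if an exception is thrown inside the loop.
    Public Sub WriteRows(path As String, header As String, rows As IEnumerable(Of String))
        Using sw As New StreamWriter(path)
            sw.WriteLine(header)
            For Each row As String In rows
                sw.WriteLine(row)
            Next
        End Using
    End Sub

End Module
```

    With this shape there is no separate sw.Close() call to forget; the handle is released as soon as End Using is reached.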


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Tuesday, May 1, 2018 12:30 PM
    Moderator
  • Hi,

    No.  I've corrected that method as follows:

    Sub CalculateInterpolatedValues(fstations(,) As Single, source_stations(,) As String, c_stations(,) As String, stationName As String, dataType As String)
            Dim interp(90000) As Single
            Dim int_count(90000) As Single
            Dim nearest(90000) As String
            Dim b As Integer = 0
            Dim obscount As Integer = 24912
            Dim total_weight As Single = 0
            Dim myWeight As Single = 0
            Dim mySum As Single = 0
            Dim include_count As Single = 0
            Dim slineArr As New List(Of String)

            Try
                'Write header to array
                If dataType = "PRCP" Then slineArr.Add(interpHeader(0))
                If dataType = "TMAX" Then slineArr.Add(interpHeader(1))
                If dataType = "TMIN" Then slineArr.Add(interpHeader(2))
                If dataType = "AWND" Then slineArr.Add(interpHeader(3))

                For a = 1 To obscount - 1
                    b = 1
                    include_count = 0
                    total_weight = 0
                    mySum = 0

                    Do While include_count < 5
                        If (fstations(b, a) > -99 And source_stations(b, 0) <> stationName) Then ' use anystation except itself

                            If include_count = 0 Then ' assign nearest
                                nearest(a) = String.Format("{0},{1}", source_stations(b, 0), source_stations(b, 1))
                            End If

                            myWeight = 1 / source_stations(b, 1)
                            total_weight = total_weight + myWeight
                            mySum = mySum + (myWeight * fstations(b, a)) ' running total

                            include_count = include_count + 1
                        End If

                        b = b + 1
                        If b = 100 Then Exit Do 'exit loop less than 5 in 100 closest
                    Loop

                    ' enough collected for an estimate
                    If include_count > 1 Then
                        interp(a) = mySum / total_weight
                        int_count(a) = include_count
                    Else
                        interp(a) = -99
                        int_count(a) = include_count
                    End If

                    Dim sline As String = String.Empty

                    sline = String.Format("{0},{1},{2},{3},{4},{5}", c_stations(a, 0), c_stations(a, 1), c_stations(a, 2), c_stations(a, 3), interp(a), nearest(a))
                    slineArr.Add(sline)
                Next a

                Dim MYFILENAME As String = String.Format("{0}{1}_{2}_Intp.csv", interpPath, stationName, dataType)
                File.WriteAllLines(MYFILENAME, slineArr.ToArray)

            Catch ex As Exception
                Console.WriteLine(String.Format("Error calculating interpolated values: {0}", ex.ToString()))
                logList.Add(String.Format("Error calculating interpolated values: {0}", ex.ToString()))
                CreateLogfile("interp", logList)
                Exit Try
            End Try
        End Sub

    I'm not using StreamWriter for this; it was hanging up the app.


    Marilyn Gambone




    • Edited by deskcheck1 Tuesday, May 1, 2018 3:00 PM
    Tuesday, May 1, 2018 2:29 PM
  • OK, then I doubt that the issue is related to IO.File.WriteAllLines (at least not directly).

    At this point you should probably run a performance profile and do a memory capture to see exactly what is consuming the most memory.

    https://docs.microsoft.com/en-us/visualstudio/profiling/beginners-guide-to-performance-profiling


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    • Marked as answer by deskcheck1 Tuesday, May 1, 2018 3:32 PM
    Tuesday, May 1, 2018 3:08 PM
    Moderator
  • Make a copy and start removing code until the problem stops. Then the last thing you removed points to where it is.

    It's not just normal garbage collection and dumping?

    How long is it taking to run now? Last I noticed, you said a run took 6 hrs, I think. What is the current time? Just curious.  :)

    • Marked as answer by deskcheck1 Tuesday, May 1, 2018 3:35 PM
    Tuesday, May 1, 2018 3:18 PM
  • Hi, Tommy,

    I'm re-running the above code (which was based on several suggestions from this Forum, thanks).  I'm re-running again on two machines just to see the difference.  I'm now noticing it spending more time writing files than reading them.  Previously, Read(B/sec) was showing high while the Write(B/sec) was zero.  Now it seems to be behaving better.  My older version was looping twice (the For a= 1 to obscount loop).  I re-coded it as above.

    Fingers crossed :)

    BTW, the 6 hrs was fast but ALL the data was wrong.  Just as Reed said, I had made a mistake in initializing the arrays.  So it was junk, anyway.  I don't know yet how long this one will take, but the initial data are correct.


    Marilyn Gambone


    • Edited by deskcheck1 Tuesday, May 1, 2018 3:39 PM
    Tuesday, May 1, 2018 3:35 PM
  • Whenever you get a chance let us know if the speed increased.

    Anyhow, I did a test with a loop and IO.File.WriteAllLines that output an 89.6 MB file. Code and result follow. No noticeable change in memory occurred, so I didn't take a screenshot of it.

    Option Strict On
    
    Public Class Form1
    
        Dim SW As New Stopwatch
    
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Me.Location = New Point(CInt((Screen.PrimaryScreen.WorkingArea.Width / 2) - (Me.Width / 2)), CInt((Screen.PrimaryScreen.WorkingArea.Height / 2) - (Me.Height / 2)))
        End Sub
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Label2.Text = "Waiting"
            Label2.Refresh()
            Label4.Text = "Waiting"
            Label4.Refresh()
            Dim s As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^*()_+-={}[]\|;':,',./<>?"
            Dim StrList As New List(Of String)
            SW.Restart()
            Label2.Text = SW.Elapsed.ToString
            Label2.Refresh()
            For i = 1 To 1000000
                StrList.Add(s)
            Next
            IO.File.WriteAllLines("C:\Users\John\Desktop\MegaText.Txt", StrList.ToArray)
            Label4.Text = SW.Elapsed.ToString
            SW.Stop()
        End Sub
    
    End Class


    La vida loca

    • Marked as answer by deskcheck1 Wednesday, May 2, 2018 12:38 PM
    Tuesday, May 1, 2018 7:25 PM
  • Hi, Monkeyboy,

    Based on your previous response, I found it faster to use File.WriteAllLines instead of StreamWriter.  That solved the issue of the app just hanging up and throwing errors.

    After installing SysInternals tools, I found my dev machine is running faster than the 40-core machine:  3.5GHz beats 2.26GHz by much: 30,000+ files written overnight vs. 3,000 files for the slower CPU.  My dev machine needs more RAM while the 40-core machine needs more processing power. 

    I'll just let this run through, but I think it would still take days.  At 30,000 files overnight, and based on my previous run (with the wrong data), I should have 216,000 files written in all.  So that's over a week of running on my dev machine.

    I don't understand my 40-core machine.  It runs Windows Server 2016.  Even at 2.26GHz, I expected it to zip right through, but it's so slow.

    The Process Explorer (from SysInternals) shows Page Fault varies from 700KB-36MB.  The System Commit is at 126GB while Physical Memory is 80GB.  The Read Bytes is at 2.3MB while Write Bytes is at 0; which means to me it's spending a lot of time reading instead of writing which is the whole point of this app!


    Marilyn Gambone



    • Edited by deskcheck1 Wednesday, May 2, 2018 12:39 PM
    Wednesday, May 2, 2018 12:36 PM
  • Marilyn,

    What version of Visual Studio are you using (from Help -> About)?

    The built-in performance profiler in 2017 can show you exactly where the time and resources are being spent (right down to the line of code).  But you do need to ensure you are on the latest version of 2017.

    At this point (if it is important enough to your organization) you can contact me directly (email is in human-readable format in my profile) and we can arrange a one-on-one remote review of your code.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Wednesday, May 2, 2018 12:54 PM
    Moderator
  • At the end of the loop you create the filename and write all the lines.  The process of writing all the lines causes there to be two copies of the data (because of the .ToArray call).

    Try removing the .ToArray and see if that helps.  Also, after writing the files clear the list.
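
    A rough sketch of that suggestion, assuming .NET Framework 4.0 or later, where File.WriteAllLines has an overload that accepts IEnumerable(Of String). The module and procedure names (WriteListSketch, FlushLines) are hypothetical. The list is passed directly, so no second copy is made, and it is cleared only after the write has returned.

```vb
Imports System.Collections.Generic
Imports System.IO

Module WriteListSketch

    ' Hypothetical helper: writes the buffered lines, then empties the buffer.
    ' List(Of String) implements IEnumerable(Of String), so it binds to the
    ' IEnumerable overload of File.WriteAllLines and no ToArray copy is needed.
    Public Sub FlushLines(path As String, lines As List(Of String))
        File.WriteAllLines(path, lines)
        ' WriteAllLines is synchronous: the file is written and closed before
        ' this line runs, so clearing here cannot lose data.
        lines.Clear()
    End Sub

End Module
```

    The order matters: clearing the list before the call to WriteAllLines would produce an empty file. If the list is a local variable created fresh on every call, as in the method posted above, the Clear call is optional.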


    "Those who use Application.DoEvents() have no idea what it does and those who know what it does never use it."

    - from former MSDN User JohnWein

    SerialPort Info

    Multics - An OS ahead of its time.


    • Edited by dbasnett Wednesday, May 2, 2018 1:07 PM
    Wednesday, May 2, 2018 1:07 PM
  • Mari,

    You have not said if you did much with string optimizing.

    When you do your run you should set up timers to know at the end how much time you spend reading files, writing files, and calculating (or maybe the performance analyzer gives it).
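
    One possible way to set up such timers, sketched with System.Diagnostics.Stopwatch. The module and member names (PhaseTimingSketch, Timed, Report) are hypothetical. A Stopwatch keeps accumulating across Start/Stop pairs, so one instance per phase gives a running total for the whole run.

```vb
Imports System
Imports System.Diagnostics

Module PhaseTimingSketch

    ' One stopwatch per phase; elapsed time accumulates across Start/Stop pairs.
    Public ReadOnly ReadTimer As New Stopwatch()
    Public ReadOnly CalcTimer As New Stopwatch()
    Public ReadOnly WriteTimer As New Stopwatch()

    ' Runs the given work with the given stopwatch running.
    ' The Finally block stops the timer even if the work throws.
    Public Sub Timed(timer As Stopwatch, work As Action)
        timer.Start()
        Try
            work()
        Finally
            timer.Stop()
        End Try
    End Sub

    ' Formats the accumulated totals, e.g. for printing at the end of the run.
    Public Function Report() As String
        Return String.Format("read={0} calc={1} write={2}",
                             ReadTimer.Elapsed, CalcTimer.Elapsed, WriteTimer.Elapsed)
    End Function

End Module
```

    A call such as PhaseTimingSketch.Timed(PhaseTimingSketch.WriteTimer, Sub() IO.File.WriteAllLines(MYFILENAME, slineArr)) would wrap the write step, and Console.WriteLine(PhaseTimingSketch.Report()) at the end shows how the total splits across the three phases.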

    Then you should try to remove all string functions if your calculating time is significant compared to your read/write times. You should try to use only double variable types (instead of single or string). Even convert your source_stations string array before the run.

    Lines like this:


             sline = String.Format("{0},{1},{2},{3},{4},{5}", c_stations(a, 0), c_stations(a, 1),
                                 c_stations(a, 2), c_stations(a, 3), interp(a), nearest(a))

             slineArr.Add(sline)


    can be done without strings. You can make a simple one form project to test it and see if it helps. Like I did in the thread below.

    You should think about keeping your station data as double rather than strings. Even make a class with both.

    Look at the improvements made by Armin's UInteger method in this thread where the speed of converting from a string to decimal array was improved by over 100 times.

    https://social.msdn.microsoft.com/Forums/vstudio/en-US/7e3e9c72-682d-4b4e-9170-bacd341673c7/best-way-to-convert-a-string-of-integers-to-an-array?forum=vbgeneral

    Once you have the read and write times optimized then look at removing and optimizing the calculation strings.

    • Marked as answer by deskcheck1 Wednesday, May 2, 2018 4:07 PM
    Wednesday, May 2, 2018 2:00 PM
  • I'm using VS 2015; it also has a Performance Profiler.  Forgot about that.  Will need to check that out.  Will try and set up a remote review of my code.

    Thanks.


    Marilyn Gambone

    Wednesday, May 2, 2018 4:03 PM
  • Hi Tommy,

    Thanks.  I've been wanting to get the types from string to double.  The original data was using Single.  I guess because it's using arrays, where you can only store one data type, that's why they used strings.

    Whew!  I'm beat.  I have so much to do.  Thanks for bringing this up.  Will work on this, too and see where that gets me.


    Marilyn Gambone


    • Edited by deskcheck1 Wednesday, May 2, 2018 4:07 PM
    Wednesday, May 2, 2018 4:07 PM
  • I'm using VS 2015; it also has a Performance Profiler.  Forgot about that.  Will need to check that out.  Will try and set up a remote review of my code.

    Thanks.


    Marilyn Gambone

    That should work, but VS2017 is a little better.  I would recommend upgrading.

    I use CodeMentor.io for 1-on-1 sessions (so there's nothing to set up) if you decide it is worth going that route.  Please email me if you wish to pursue that option.


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

    Wednesday, May 2, 2018 4:13 PM
    Moderator
  • Hi, Reed,

    I ran the Performance Profiler and was surprised that it's spending 28.85% of CPU in the "LoadStationData" method, mainly because I'm using an If File.Exists(contFile) statement.  Next is the "CreateStations" method, which reads data line by line from a file.

    Also a lot of contentions from some threads which are CLR.dll specific.  I noticed that these use a lot of resources:

       File.Exists statements
       Converting from strings to Dates
       Converting from strings to numeric
       Using StreamReaders


    Marilyn Gambone


    • Edited by deskcheck1 Wednesday, May 2, 2018 4:53 PM
    Wednesday, May 2, 2018 4:52 PM
  • Did you try my suggestion to see if it would help with the memory issues?

    "Those who use Application.DoEvents() have no idea what it does and those who know what it does never use it."

    - from former MSDN User JohnWein

    SerialPort Info

    Multics - An OS ahead of its time.

    Wednesday, May 2, 2018 5:05 PM
  • Well you should be able to factor out the File.Exists by simply ensuring that you only feed the program a valid list of files.

    The conversions should be avoided unless absolutely necessary, and when needed, ensure you use the most efficient conversion method (you want methods that make no assumptions).

    Since the beef of the program is reading files there's nothing you can really do about the StreamReaders (other than ensure you are using them optimally).

    Also, do double check dbasnett's comment; you could be unnecessarily creating some huge arrays.  To be honest, I don't believe that WriteAllLines versus just writing a line at a time to an open stream is saving you anything - I suspect any perceived savings are due to problems with how the lines were being written in the first place.
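
    A hedged sketch of the first two points, not taken from the original program. It assumes the input files sit in a single folder, and it reads "no assumptions" as parsing with an explicit number style and the invariant culture; both are assumptions, and the names (LookupAndParseSketch, LoadFileNames, ParseOrDefault) are hypothetical. The folder is listed once, so later existence checks become in-memory lookups instead of repeated File.Exists calls.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.IO

Module LookupAndParseSketch

    ' Lists a folder once and keeps the file names in a case-insensitive set,
    ' so later membership checks do not touch the disk.
    Public Function LoadFileNames(folder As String) As HashSet(Of String)
        Dim names As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        For Each fullPath As String In Directory.EnumerateFiles(folder)
            names.Add(Path.GetFileName(fullPath))
        Next
        Return names
    End Function

    ' Parses a number with an explicit style and culture, so nothing is
    ' inferred from the machine's regional settings.
    ' Returns the fallback value when the text is not a valid number.
    Public Function ParseOrDefault(text As String, fallback As Double) As Double
        Dim value As Double
        If Double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, value) Then
            Return value
        End If
        Return fallback
    End Function

End Module
```

    Inside the processing loop, names.Contains(fileName) would then stand in for File.Exists, and ParseOrDefault(text, -99) would reuse the -99 missing-value marker that the posted code already uses. The set is only valid while no files are added to or removed from the folder during the run.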


    Reed Kimble - "When you do things right, people won't be sure you've done anything at all"



    Wednesday, May 2, 2018 5:36 PM
    Moderator
  • The Process Explorer (from SysInternals) shows Page Fault varies from 700KB-36MB.  The System Commit is at 126GB while Physical Memory is 80GB.  The Read Bytes is at 2.3MB while Write Bytes is at 0; which means to me it's spending a lot of time reading instead of writing which is the whole point of this app!


    Marilyn Gambone



    Well, if it is using a StreamReader in a loop to read files line by line, then perhaps there is another option, such as reading the entire file once and then stepping through each line in code. I can't think of a method offhand, nor have I seen the code that reads the files. Maybe you could post that code so somebody could come up with something.

    Actually, there is a File.ReadAllLines(String) method that reads an entire file into a String array, which you can convert to a List(Of String) and then process the lines of CSV by index. Maybe that would speed things up. I wouldn't know how to implement it in your app, although it would probably be faster than looping a StreamReader doing constant disk IO.

    I added new code to the previous app I displayed to use IO.File.ReadAllLines to read the 89.6 MB file and loop through all the lines in it though no processing was performed on the 1,000,000 lines of text.

        Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
            Label2.Text = "Waiting"
            Label2.Refresh()
            Label4.Text = "Waiting"
            Label4.Refresh()
            Dim StrList As New List(Of String)
            SW.Restart()
            Label2.Text = SW.Elapsed.ToString
            Label2.Refresh()
            StrList = IO.File.ReadAllLines("C:\Users\John\Desktop\MegaText.Txt").ToList
            For i = 0 To StrList.Count - 1
                Dim s As String = StrList(i)
            Next
            Label4.Text = SW.Elapsed.ToString
            SW.Stop()
        End Sub


    La vida loca

    Wednesday, May 2, 2018 6:21 PM
  • Hi, Reed,

    I think I've tweaked the bejesus out of this app.  It came to a point where the more I messed around with the data types, the more it slowed down.

    I think I figured out the problem with my 40-core machine.  For some reason, although Windows Server 2016 says .NET Framework 4.6 is pre-installed, it only installed 2 features.  I changed the Feature settings to install everything that came with v4.6.  Now all its engines seem to be running so much faster.

    I'll just let this run through and if it takes a week, so be it.  My boss says it usually takes days before it finishes.

    Thanks for all the help.  This has been quite a journey.


    Marilyn Gambone

    Thursday, May 3, 2018 4:23 PM
  • Hi,

    I tried this. After writing the files and clearing the list, it wasn't writing anything. So I removed the code that clears the list.


    Marilyn Gambone

    Friday, May 4, 2018 1:19 PM