C# Program takes all of the memory

  • Question

  • Hi there, 

    The program below processes a very big HTML file (4 GB); my laptop has 16 GB of memory.

    When I run the program, it does the job, but it gradually takes more and more memory (according to Task Manager).

    Why would that be, since the file is only 4 GB in size?

    What am I doing wrong, please?

    What would you do differently?

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;
    using System.IO;
    using System.Data.SqlClient;
    
    
    
    
    namespace HTML2CSV
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }
    
            private void button1_Click(object sender, EventArgs e)
            {
                  // Open File =====================================================
                string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
                string userPath = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile) + "\\Desktop";
                
                OpenFileDialog ofd = new OpenFileDialog();
                ofd.InitialDirectory = userPath;
                ofd.Filter = "html files (*.html)|*.html|All files (*.*)|*.*";
                ofd.FilterIndex = 2;
                ofd.RestoreDirectory = true;
                if (ofd.ShowDialog() != DialogResult.OK) return; // user cancelled: nothing to process
                string fName = ofd.FileName;
                label1.Text = fName;
                label1.Refresh();
                string theHTMLFile;
                theHTMLFile = fName;
                int collectedCount = 0;
                string theQuery;
                string con_string = "Server=127.0.0.1;Database=SHARES;Integrated Security=true;";
                // Open File ==========================================================
    
                // Get numbers ========================================================
    
                var numbers = new List<int>();
                var lines = new List<string>();
                var lineCount = 0;
                string num;
                int counter = 0;
                int numbersCount = 0;
                int onePercent;
                
    
                using (var reader = File.OpenText(fName))
                {
                    while ((num = reader.ReadLine()) != null)
                    {
                        counter++;
                        numbersCount++;
                        if (num.IndexOf("</tr>") != -1)
                        {
                            //MessageBox.Show("counter: " + counter.ToString() + "num: " + num);
                            numbers.Add(counter);
                        }
                    }
                }
    
                // Get numbers ===============================================================
    
                int rowNumber = 1;
                int progressBarPercentage;
    
                for (int x = 1; x < numbers.Count(); x++) // Take another line number 352, 362, 372 etc.
                {
                    rowNumber = numbers[x];
                    label2.Text = x.ToString();
                    label2.Refresh();
                    onePercent = ((numbers.Count() / 100));
    
                    // =================================================================================
    
                    string lineOfHTML = "";
                    var srhtml = new StreamReader(theHTMLFile);
    
                    for (int y = 1; y <= rowNumber; y++) // read lines one by one and process eight rows when its found at line 352 - 9
                    {
                        lineOfHTML = srhtml.ReadLine();
                        lineCount++;
                        if (lineCount == (rowNumber - 9))
                        {
                            while (collectedCount <= 7)
                            {
                                lines.Add(srhtml.ReadLine().Replace("<td valign=\"top\" nowrap=\"nowrap\">", "").Replace("</td>", "").Replace(",", " & "));
                                lineCount++;
                                collectedCount++;
                            }
                            collectedCount = 0;
    
                            theQuery = "INSERT INTO [SHARES].[dbo].[share" + comboBox1.SelectedItem + "] (LineNumber ,[Path] ,[Account] ,[Type] ,Directory_Owner ,Permission_Simple ,Apply_To ,Inherited ,Permissions_Advanced) VALUES (" + rowNumber.ToString() + ", '" +
                                                           lines[0].Replace("'", "") + "', '" + lines[1].Replace("'", "") + "', '" + lines[2].Replace("'", "") + "', '" + lines[3].Replace("'", "") + "', '" +
                                                           lines[4].Replace("'", "") + "', '" + lines[5].Replace("'", "") + "', '" + lines[6].Replace("'", "") + "', '" + lines[7].Replace("'", "") + "')";
            
                            SqlConnection conn = new SqlConnection(con_string);                        
                            conn.Open();
                                SqlCommand cmd = new SqlCommand(theQuery, conn);
                                cmd.ExecuteNonQuery();
                            conn.Close();
                            
                            lineCount = 0;
                            lines.Clear();
                            break;
                        }
                    }
                    // =================================================================================
                    // ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar 
    
                    if (x >= onePercent)
                    {
                        //progressBarPercentage = (x / numbersCount) * 100.0;
                        progressBarPercentage = (100 * x) / numbers.Count;
    
                        label3.Visible = true;
                        label3.Text = progressBarPercentage.ToString() + "%";
                        label3.Refresh();
    
                        //progressBar1.Value = progressBarPercentage;
                        progressBar1.Value = (100 * x) / numbers.Count;
                        progressBar1.Refresh();
                    }
                    // ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar  ProgressBar 
                }
    
                progressBarPercentage = 100;
                label3.Text = progressBarPercentage.ToString() + "%";
                label3.Refresh();
    
                //progressBar1.Value = progressBarPercentage;
                progressBar1.Value = progressBarPercentage;
                progressBar1.Refresh();
    
                MessageBox.Show("Done !");
    
            }
    
            private void Form1_Load(object sender, EventArgs e)
            {
                label3.Visible = false;
                string[] s = {"E","F","G","H","I"};
                comboBox1.DataSource = s;
            } //button
        }
    }
    

    Saturday, November 19, 2016 5:20 PM

All replies

  • You are probably seeing the effect of appending data to the numbers list, and then of postponing the release of the many srhtml readers (so it is better to wrap them in using). The SQL objects can also be disposed, with Dispose() or using.
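
    A minimal sketch of the `using` suggestion, assuming the layout the question's code relies on (eight cell lines directly before each `</tr>` line). It still re-reads the file from the top for every row, like the original inner loop, but the reader is disposed as soon as the row has been read. `RowReader` and `ReadCellLines` are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
public static class RowReader
{
    // Returns the 'cellCount' lines that directly precede the 1-based line
    // 'rowNumber' (the </tr> line), re-reading the file as the question's
    // inner loop does.
    public static string[] ReadCellLines(string path, int rowNumber, int cellCount = 8)
    {
        if (rowNumber <= cellCount)
            throw new ArgumentOutOfRangeException(nameof(rowNumber));

        var cells = new string[cellCount];

        // 'using' disposes the reader when the block exits, closing the file
        // handle immediately instead of leaving it for the finalizer.
        using (var reader = new StreamReader(path))
        {
            // Skip lines 1 .. rowNumber - cellCount - 1.
            for (int i = 1; i < rowNumber - cellCount; i++)
                reader.ReadLine();

            // Read lines rowNumber - cellCount .. rowNumber - 1.
            for (int i = 0; i < cellCount; i++)
                cells[i] = reader.ReadLine();
        }
        return cells;
    }
}
```

    In `button1_Click`, the inner `for (int y = 1; y <= rowNumber; y++)` loop could become `lines.AddRange(RowReader.ReadCellLines(theHTMLFile, rowNumber));` followed by the same per-cell clean-up. This fixes the handle leak but not the cost of re-reading the file once per row, which the next paragraph of this reply addresses.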

    By the way, instead of re-reading the unneeded lines from the beginning of the file for every row, you can keep a collection (List<string> or Queue<string>) holding the last ten required lines read from the file. Then process it when you find "</tr>", and remove the lines that are no longer needed before reading the next ones.
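
    A sketch of that single-pass idea, under the same layout assumption as the question's code (the eight cell lines sit directly before each `</tr>` line). The reply mentions keeping ten lines; this version keeps only the eight that end up in the INSERT. `HtmlRowScanner` and `ReadRows` are hypothetical names, and the cell clean-up is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original program.
public static class HtmlRowScanner
{
    // Assumed layout (from the question's code): the eight <td> lines that hold
    // the column values sit directly before each line containing </tr>.
    const int CellsPerRow = 8;

    // Reads the HTML once, keeping only the last CellsPerRow lines in memory,
    // and yields the 1-based line number of each </tr> with its cleaned cells.
    // Like the original loop (x starts at 1), the first </tr> (header row) is skipped.
    public static IEnumerable<(int LineNumber, string[] Cells)> ReadRows(TextReader reader)
    {
        var window = new Queue<string>(CellsPerRow + 1);
        bool seenFirstRow = false;
        int lineNumber = 0;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.IndexOf("</tr>", StringComparison.Ordinal) != -1)
            {
                if (seenFirstRow && window.Count == CellsPerRow)
                    yield return (lineNumber, window.Select(Clean).ToArray());
                seenFirstRow = true;
            }

            // Slide the window: add the current line, drop the oldest one.
            window.Enqueue(line);
            if (window.Count > CellsPerRow)
                window.Dequeue();
        }
    }

    // Same clean-up the question applies to each cell line.
    static string Clean(string cell) =>
        cell.Replace("<td valign=\"top\" nowrap=\"nowrap\">", "")
            .Replace("</td>", "")
            .Replace(",", " & ");
}
```

    Used as `using (var reader = File.OpenText(fName)) { foreach (var row in HtmlRowScanner.ReadRows(reader)) { /* insert row.Cells */ } }`, it replaces both the "Get numbers" pass and the nested re-reading loop, so the `numbers` list is no longer needed and memory stays bounded by the window. Rows that end within the first few lines of the file may be handled slightly differently from the original loop.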

    It may also simply mean that your memory is being used efficiently, i.e. you did not pay for those gigabytes in vain.





    Saturday, November 19, 2016 7:05 PM
  • Hi nagileon,

    Thank you for posting here.

    I tried to test your code, but I do not have your big HTML file, so I cannot be sure what causes the program to take all of the memory. In my test, I used the following to monitor how much memory the program takes.

    When I ran the program, it did not take all the memory, but maybe my file was too small. Please try it and let us know whether the problem is caused by the program or by the big HTML file.

    I used a GIF to show what happened when I ran the program.

    We are waiting for your update and will try our best to give you a solution.

    Best Regards,

    Wendy


    MSDN Community Support

    Monday, November 21, 2016 9:22 AM
    Moderator
  • Hi nagileon,

    From reviewing your code, my understanding is that the program uses part of your 16 GB of memory as temporary memory for the following operations:

    > At the beginning of your code you open a 4 GB file, which uses some temporary memory
    > Then you run a loop and build a list
    > You run another, nested loop that fills a list and inserts data
    > You update the progress bar percentage
    > Visual Studio itself will take some temporary memory for processing

    I would also suggest moving the connection out of the loop block, perhaps to another place such as the Form1_Load event. At the moment your connection is opened and closed once per row until all insertions have finished. A better approach is to open the connection once and close it when all insertions are done.
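
    A sketch of opening the connection once, written against the provider-neutral `System.Data.Common` base classes so it does not assume a particular SqlClient package; with `SqlConnection` you would pass the connection in from a `using` block. It also reuses one parameterized command, which is a swapped-in technique rather than part of this reply: it replaces the string concatenation and the `.Replace("'", "")` calls, so apostrophes are stored as-is instead of being stripped. `ShareInserter`, `BuildInsertSql` and `InsertAll` are hypothetical names; the column list and the E to I table suffixes are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

// Hypothetical helper, not part of the original program.
public static class ShareInserter
{
    // Column names copied from the question's INSERT statement.
    static readonly string[] Columns =
    {
        "Path", "Account", "Type", "Directory_Owner",
        "Permission_Simple", "Apply_To", "Inherited", "Permissions_Advanced"
    };

    // Table names cannot be parameters, so only accept the suffixes the form's combo box offers.
    static readonly string[] AllowedSuffixes = { "E", "F", "G", "H", "I" };

    public static string BuildInsertSql(string suffix)
    {
        if (Array.IndexOf(AllowedSuffixes, suffix) < 0)
            throw new ArgumentException("Unexpected table suffix: " + suffix, nameof(suffix));

        string columnList = "LineNumber, " + string.Join(", ", Columns.Select(c => "[" + c + "]"));
        string valueList = "@LineNumber, " + string.Join(", ", Columns.Select((c, i) => "@p" + i));
        return $"INSERT INTO [SHARES].[dbo].[share{suffix}] ({columnList}) VALUES ({valueList})";
    }

    // Opens the connection once (if needed), reuses one parameterized command for
    // every row, and disposes the command at the end. The caller owns the connection.
    public static void InsertAll(DbConnection conn, string suffix,
                                 IEnumerable<(int LineNumber, string[] Cells)> rows)
    {
        if (conn.State != ConnectionState.Open)
            conn.Open();

        using (DbCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = BuildInsertSql(suffix);

            DbParameter lineParam = cmd.CreateParameter();
            lineParam.ParameterName = "@LineNumber";
            lineParam.DbType = DbType.Int32;
            cmd.Parameters.Add(lineParam);

            var cellParams = new DbParameter[Columns.Length];
            for (int i = 0; i < cellParams.Length; i++)
            {
                cellParams[i] = cmd.CreateParameter();
                cellParams[i].ParameterName = "@p" + i;
                cellParams[i].DbType = DbType.String;
                cmd.Parameters.Add(cellParams[i]);
            }

            foreach (var row in rows)
            {
                // Only the parameter values change between rows.
                lineParam.Value = row.LineNumber;
                for (int i = 0; i < cellParams.Length; i++)
                    cellParams[i].Value = (object)row.Cells[i] ?? DBNull.Value;
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```

    For example: `using (var conn = new SqlConnection(con_string)) { ShareInserter.InsertAll(conn, (string)comboBox1.SelectedItem, rows); }`, where `rows` holds (line number, eight cells) pairs. The sketch opens the connection per import run rather than in `Form1_Load`, so it is not held open while the form sits idle.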


    Thanks,
    Sabah Shariq



    Monday, November 21, 2016 10:00 AM
    Moderator
  • It also seems like you're leaking that StreamReader you're creating each time through the for loop.

    var srhtml = new StreamReader(theHTMLFile);

    Normally this wouldn't be a big deal, but because the reader is never disposed, the file handle and buffers it holds are not released either. You're doing this in a for loop, so you create a new reader on every iteration. Eventually all of this will get cleaned up, but it could take a while.

    Monday, November 21, 2016 3:53 PM
    Moderator
  • First of all, do not debug performance using Task Manager (unless you display Private Bytes and other relevant counters):
    How much memory does my .NET application use? Explaining the Task Manager figure versus Private Bytes in performance monitor.
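
    For example, a program can log the figures itself instead of relying on Task Manager. This sketch reads the working set, the private bytes (`Process.PrivateMemorySize64`) and the managed heap size reported by the GC; `MemoryReport` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original program.
public static class MemoryReport
{
    // Snapshot of a few numbers that are more informative than a single Task Manager column.
    public static (long WorkingSet, long PrivateBytes, long ManagedHeap) Take()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return (self.WorkingSet64,         // physical memory currently allocated to the process
                    self.PrivateMemorySize64,  // memory that cannot be shared with other processes ("Private Bytes")
                    GC.GetTotalMemory(false)); // bytes the GC currently thinks are allocated; no collection forced
        }
    }
}
```

    Calling `MemoryReport.Take()` every few thousand rows (say, from the progress-bar block) and writing the values to a label or `Debug.WriteLine` shows whether the managed heap itself keeps growing or only the process totals do.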

    Secondly, the garbage collector avoids running unless it has to, so a steadily increasing memory profile is absolutely normal. As long as you do not run into an OutOfMemoryException, that is nothing special. Of course, at this size you might run into garbage collections that are noticeable to a human down the road.
    Garbage Collection - Pros and Limits
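
    A small experiment illustrating the point, assuming nothing beyond the base class library: it creates short-lived strings, then compares `GC.GetTotalMemory(false)` with `GC.GetTotalMemory(true)`, which lets the GC collect before measuring. `GcDemo` is a hypothetical name, and forcing collections like this is only for the experiment.

```csharp
using System;

// Hypothetical demo class, not part of the original program.
public static class GcDemo
{
    static string sink; // keeps the JIT from discarding the allocations below

    // Allocates short-lived garbage and reports how much the GC still counts as
    // allocated before and after letting it collect.
    public static (long BeforeCollect, long AfterCollect, int Gen0Collections) Run()
    {
        int gen0Start = GC.CollectionCount(0);

        for (int i = 0; i < 100_000; i++)
            sink = new string('x', 100); // each string becomes garbage on the next iteration

        long before = GC.GetTotalMemory(false); // no collection forced
        long after = GC.GetTotalMemory(true);   // waits for a full collection first

        return (before, after, GC.CollectionCount(0) - gen0Start);
    }
}
```

    `BeforeCollect` can be noticeably higher than `AfterCollect`: memory that is already garbage keeps being counted until the GC decides to run.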

    Thirdly, if you have to process rather big files, it is often advisable to use the "Enumerate" version of a function if one is present, e.g. Directory.EnumerateFiles (rather than GetFiles) and File.ReadLines (rather than File.ReadAllLines). Enumerators can usually reduce the memory load as well as the time spent loading data from the disk. Your while loop, however, is already very similar to an enumerator.
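
    As an illustration, here is the question's "Get numbers" pass written with `File.ReadLines`, which streams the file line by line, whereas `File.ReadAllLines` would load all 4 GB into a `string[]`. Memory-wise it behaves like the existing while loop, which is the reply's last point; `RowEndFinder` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original program.
public static class RowEndFinder
{
    // Lazily streams the file with File.ReadLines (one line in memory at a time)
    // and returns the 1-based numbers of the lines that contain "</tr>",
    // like the question's "Get numbers" loop.
    public static List<int> FindRowEndLines(string path) =>
        File.ReadLines(path)
            .Select((line, index) => (Line: line, Number: index + 1))
            .Where(t => t.Line.Contains("</tr>"))
            .Select(t => t.Number)
            .ToList();
}
```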



    Monday, November 21, 2016 4:47 PM