none
Where the data stored when i am fetching 1 million record but showing in UI 200 at a time using Reactive Extension

    Question

  • Hi,

    I am fetching 1 million records from database. I set buffer size as 2000 to show that records in UI using Reactive Extension.

    My sql query is select * from movielist Note : movielist table contain 1 million records.

    My question is

    1) All the records are fetching at once go from data base? I mean only one transaction to the Db is happening or multiple??

    2) Where all the  data (1 million data) resides during processing this huge no of records? Storing in memory? If memory is there any responsibility to the Reactive Extension to handle this memory management?

    Thursday, June 19, 2014 8:25 AM

Answers

  • Thanks for posting the code.

    For your first question, I think you can answer that in isolation any concern of Rx. Your GetCompanies() method offers no ability to page the data, so YES, all the data will come back in one transaction until the consumer stops enumerating the content. If another call to the method was performed, it would execute the SQL query again and start from the start of the dataset again.

    For your 2nd question the data will fisrt be an in-memory ADO.NET record which you then map to a Company. You "allocate" this block of memory by using the new key word (just normal old .NET so far). Next that Company instance is enumerated to the consumer which will then be OnNext'ed to the observer. In this case the intermediate observer is the Buffer operator. Here it will just be cached into a Queue or List of some description until the sequences either completes or the buffer fills to 2000. At this point all the value in the buffer will be pushed to the final subscriber (as loadedData parameter in your subscribe lambda). You then add each of the Company instances to the 'companies' collection. 

    So hopefully you can see there is no magic happening here at all. I assume that pulling out 1million rows from a database and rendering them on the (WinForm) UI will take quite some time, so you should remember that you will be holding your connection to the DB open this whole time.


    Lee Campbell http://LeeCampbell.blogspot.com

    • Proposed as answer by LeeCampbell Friday, June 20, 2014 1:31 PM
    • Marked as answer by Raj Dey Tuesday, June 24, 2014 8:09 AM
    Friday, June 20, 2014 1:29 PM

All replies

  • This sounds like you are writing some custom Rx to do this, so only you can answer these questions. 

    Unless you can post some code, I would assume you have done the most naive implementation which would potentially be just a ToObservable() operator on an IEnumerable<T> instance and then used the Buffer(2000) operator. This means there would be one connection and call to the DB.

    In this case there will be virtually no smarts at all. Everything will be pulled via the IEnumerable<T> as fast as it can and every time the 2000 records fill the buffer, it will be pushed on the observable sequence to your OnNext handler.

    While records are being buffered, they will be stored in memory. With regards to memory mangement, this is now a Garbage Collector issue. 


    Lee Campbell http://LeeCampbell.blogspot.com

    Thursday, June 19, 2014 12:25 PM
  • internal class CompanyDal
    {public IEnumerable<Company> GetCompanies(string title)
        {
            using (var connection = new SQLiteConnection(connectionString))
            {
                using (var cmd = connection.CreateCommand())
                {
                    cmd.CommandText =
                        "select company, category, state, " + 
                        "phone from maindata where company like @name";
                    cmd.Parameters.AddWithValue("@name", String.Format("%{0}%", title));

                    connection.Open();

                    var reader = cmd.ExecuteReader();

                    using (reader)
                    {
                        while (reader.Read())
                        {
                            var company = new Company
                            {
                                Title = reader.GetString(0),
                                Category = reader.GetString(1),
                                State = reader.GetString(2),
                                Phone = reader.GetString(3)
                            };

                            yield return company;
                        }
                    }
                }
            }
        }

    }

    private void SearchButtonClick(object sender, EventArgs e)
    {
        companies.Clear();
        companyBindingSource.ResetBindings(false);

        stopButton.Enabled = true;
        searchButton.Enabled = false;
        progressLabel.Text = string.Format("Searching for {0} ...", searchBox.Text);

        companyDal.GetCompanies(searchBox.Text).ToObservable(Scheduler.ThreadPool).
            Buffer(BufferSize).TakeUntil(stopClicked).
            ObserveOn(SynchronizationContext.Current).
            Finally(() =>
            {
                searchButton.Enabled = true;
                stopButton.Enabled = false;
            }).
            Subscribe(loadedData =>
            {
                var index = Math.Max(companyGridView.FirstDisplayedScrollingRowIndex, 0);

                companies.AddRange(loadedData);
                companyBindingSource.ResetBindings(false);
                progressLabel.Text = string.Format("Loaded {0} rows", companies.Count);

                companyGridView.FirstDisplayedScrollingRowIndex = index;
            },
      exception => { progressLabel.Text = exception.Message; },
      () => { progressLabel.Text = "Finished loading data"; });
    }
    Thursday, June 19, 2014 12:51 PM
  • Thanks for posting the code.

    For your first question, I think you can answer that in isolation any concern of Rx. Your GetCompanies() method offers no ability to page the data, so YES, all the data will come back in one transaction until the consumer stops enumerating the content. If another call to the method was performed, it would execute the SQL query again and start from the start of the dataset again.

    For your 2nd question the data will fisrt be an in-memory ADO.NET record which you then map to a Company. You "allocate" this block of memory by using the new key word (just normal old .NET so far). Next that Company instance is enumerated to the consumer which will then be OnNext'ed to the observer. In this case the intermediate observer is the Buffer operator. Here it will just be cached into a Queue or List of some description until the sequences either completes or the buffer fills to 2000. At this point all the value in the buffer will be pushed to the final subscriber (as loadedData parameter in your subscribe lambda). You then add each of the Company instances to the 'companies' collection. 

    So hopefully you can see there is no magic happening here at all. I assume that pulling out 1million rows from a database and rendering them on the (WinForm) UI will take quite some time, so you should remember that you will be holding your connection to the DB open this whole time.


    Lee Campbell http://LeeCampbell.blogspot.com

    • Proposed as answer by LeeCampbell Friday, June 20, 2014 1:31 PM
    • Marked as answer by Raj Dey Tuesday, June 24, 2014 8:09 AM
    Friday, June 20, 2014 1:29 PM