Data virtualization in WPF and beyond


    How do you show a 100,000-item list in WPF? Anyone who has tried to handle that volume of information in a WPF client knows that it takes careful development to make it work well.

    Getting the data from where it is (a remote service, a database) to where it needs to be (your client) is one part of the problem. Getting WPF controls to display it efficiently is another part. This is especially true for controls deriving from ItemsControl like ListView and the newly released DataGrid, since these controls are likely to be served large data sets.

    One can question the usefulness of displaying hundreds of thousands of rows in a ListView. There is, however, always one good reason: the customer requests it. And the customer is king, even if the reasoning behind the request is slightly flawed. So, faced with this challenge, what can we do as WPF developers to make both the coding and user experience as painless as possible?

    As of .NET 3.5 SP1, this is what you can do to improve performance in ItemsControl and its derivatives:

    - Make the number of UI elements created proportional to what is visible on screen by setting VirtualizingStackPanel.IsVirtualizing="True".

    - Have the framework recycle item containers instead of (re)creating them each time, by setting VirtualizingStackPanel.VirtualizationMode="Recycling".

    - Defer scrolling while the scrollbar thumb is being dragged by setting ScrollViewer.IsDeferredScrollingEnabled="True". By itself this improves only perceived performance, because the content is not updated until the user releases the thumb. However, we will see that it also improves actual performance in the scenarios described below.
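
    As a rough illustration, the three settings above can be combined on a single control. This is only a sketch: the ListView and the Items property it binds to are placeholders I made up for the example, not something taken from the article.

```xml
<!-- Sketch only: "Items" is a hypothetical collection property on the DataContext. -->
<ListView ItemsSource="{Binding Items}"
          VirtualizingStackPanel.IsVirtualizing="True"
          VirtualizingStackPanel.VirtualizationMode="Recycling"
          ScrollViewer.IsDeferredScrollingEnabled="True" />
```

    All three are attached properties, so the same attributes should carry over to other ItemsControl derivatives such as DataGrid.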

    All of these settings take care of the user interface side of the equation. Sadly, nothing in WPF takes care of the data side. Data virtualization is on the roadmap for a future release of WPF, but will not be available in the upcoming .NET 4.0, according to Samantha MSFT (http://www.codeplex.com/wpf/Thread/View.aspx?ThreadId=40531).

    All is not lost, however. I will show you various ways to have your favorite ItemsControl scroll through hundreds of thousands, even millions, of items with little effort. Of course, every solution has a price tag, but for most situations it will be acceptable. I promise!

    My “solutions” for data virtualization in WPF rely on two key insights and two usage assumptions. The two key insights are:

    1. For an instance of any type T, it is possible to automatically construct an equivalent lightweight object that WPF’s binding engine cannot distinguish from T in most scenarios that bind to properties of T.

    2. ItemsControl’s access patterns for its item source are highly predictable, and at any time it needs only a fraction of the entire data set. The size of that fraction is proportional to the number of visible rows, not to the total number of rows in the data set.

    Two approaches are derived from these two key insights: the item virtualization approach, where individual objects are loaded on demand, and the collection virtualization approach, where the entire data set is virtualized. These two approaches virtually (pun intended) split this article into two parts.

    The usage assumptions are:

    1. In the presence of a large number of items, users will not look at every one of them at the same time.

    2. Scenarios involving a large number of items are predominantly read-only. If there’s any editing to be done, it will not take place in the ItemsControl holding the large data set.

    If usage assumption 1 is valid, we only need to load what the user needs to see. This assumption is already exploited on the UI side by VirtualizingStackPanel’s IsVirtualizing and VirtualizationMode properties, but it is valid for the data side of the equation as well. Therefore, we can concentrate on techniques that load small amounts of data efficiently.

    If usage assumption 2 is valid, we can ignore scenarios where users start editing large data sets in-place. In-place editing with all the bells and whistles (cancellable, transaction safe) has its own set of problems and solutions that is outside the scope of this article.

    You can read the article at http://home.scarlet.be/thehive/DataVirtualization.pdf

    Feedback is welcome.


    Tuesday, May 19, 2009 5:49 PM
