locked
Paging, $top, $skip and total count RRS feed

  • Question

  • Hi,

     

    First of all, congrats to the team with the CTP of Astoria, uhhh ADO.NET services. This is a great and elegant piece of technology.

     

    Today we installed and played with the WebDataServices CTP and we noticed the following thing:

     

    The URL parameters $skip and $top will help you to 'page' large resultsets. In the UI you would like to present some info like: Showing page 2 of 23. This requires you to know the total amount of 'matches' in your query. I haven't been able to locate that info in the response.  Is this something you're planning?

     

    Thanks!

    Wednesday, December 12, 2007 11:21 PM

Answers

  • Thanks for the comments Smile

     

    Turns out that computing the number of rows in a table is an expensive operation (proportional to the size of the table), at least in the case of traditional databases. Doing this by default could negatively impact the underlying data source (again, at least in the case of databases).

     

    Two alternatives:

    -Use "next" and "previous", not as nice as page numbers but often good enougn

    -Create a service operation that computes the total number of rows for a given criteria (assuming these are known a priori). Then do 2 calls, one to find the count and one (or more) to find the actual pages of data. If the data has low-enough volatility, you can cache the page count and only re-obtain that data from time to time.

     

    We're currently not planning on adding a "total count" element to the output. If this blocks most people from writing applications we could consider it, but I would really prefer not to require that information from the underlying data source...feedback is welcome!

     

    Pablo Castro

    Technical Lead

    Microsoft Corporation

    http://blogs.msdn.com/pablo

     

    Thursday, December 13, 2007 12:22 AM
    Moderator

All replies

  • Thanks for the comments Smile

     

    Turns out that computing the number of rows in a table is an expensive operation (proportional to the size of the table), at least in the case of traditional databases. Doing this by default could negatively impact the underlying data source (again, at least in the case of databases).

     

    Two alternatives:

    -Use "next" and "previous", not as nice as page numbers but often good enougn

    -Create a service operation that computes the total number of rows for a given criteria (assuming these are known a priori). Then do 2 calls, one to find the count and one (or more) to find the actual pages of data. If the data has low-enough volatility, you can cache the page count and only re-obtain that data from time to time.

     

    We're currently not planning on adding a "total count" element to the output. If this blocks most people from writing applications we could consider it, but I would really prefer not to require that information from the underlying data source...feedback is welcome!

     

    Pablo Castro

    Technical Lead

    Microsoft Corporation

    http://blogs.msdn.com/pablo

     

    Thursday, December 13, 2007 12:22 AM
    Moderator
  • Hi,

     

    Thanks for your response. I see your point about computational overhead.

     

    One way to deal with this is to make the total count calculation optional (you have to explicitly ask for it. For example you could have urls like:

     

    .../customers(city eq "Amsterdam")?$option=count&$top=30

     

    where the $count parameter indicates you would like to know the total number of matching rows for your query. While this incurs some overhead, you actually may need to do this only once in a paging scenario.

     

    This would also support another UI scenario, where an application UI allows a user to search a large dataset (image a large real estate listing) by stepwise refining a query. Suppose the initial query (e.g. city eq "Amsterdam") delivers 1000's of results this would swamp the client. In these case you might want to have the option to ask the dataservice the totalcount for the query, without actually getting rows. If the totalcount is greater than a certain max the application could indicate that count and ask the user to make the query more specific.

     

    So here you might like to do:

    .../customers(city eq "Amsterdam")?$option=count&$top=0

     

     

    With respect to your suggestion to create a custom service: one of the exciting opportunities for Astoria  approach would be where business application are turned into data providers and allow (controlled) re-use by other applications (data consumers). A consuming application are likely owned/maintained by another group/organization which is not in control of the data providing application. Requiring custom services on the originating platform might be a show-stoppper. The same holds where dataservices are offered in an hosted environment (DaaS? :-) )

     

     

     

    Hope this make sense.

    Thursday, December 13, 2007 7:57 AM
  • This is definitely a must have feature.  Making it an option would be totally acceptable and would be the best of both worlds.
    Monday, December 24, 2007 11:30 PM
  • This is absolutely killing me.  I thought it would be easy to just make a 2nd call for now until you guys add support for this to get the record count, but now that I started looking into doing that, I have no clue how it's even possible.  I need to do this in a generic method based on the $filter criteria.  So now I guess I'm going to have to write my own parser for that to be able to get a simple record count.  This seems like such a crazy oversight in what otherwise seems like an ingenious project.  Sigh.
    Thursday, December 27, 2007 9:31 PM
  • Getting a count for a query seems like a basic thing that needs to be supported.  There are many situations where count is needed. 

    Which brings up another point.  There should be a way to plug into the query parsing of the data service.  We should be able to override something that has the current IQuerable expression.  Then you could parse your own param and add to the expression tree.  This would allow the data service to be extended for situations like this.

    ~ Paul
    Thursday, December 27, 2007 9:47 PM
  • Hi,

    I'd also like to request that total count be added in a future release. Please Smile

    In the mean time though, has anybody created a service operation that utilizes $filter? Is it possible?

    Thanks,

    Michael
    Thursday, April 3, 2008 5:40 AM
  • without these functions Data Services are all but useless. Everything we write has the potential to be "computationally expensive" and we make calls trading efficiency for functionality. So far the most inefficient suggestion I have seen here is to retrieve all of the records and count them on the client side. All the flexibility offered by this framework would be negated by having to write custom Count methods for all the possible queries.

     

    I cannot see Data Services being used in any project i am working on until it can support these basic methods we use in everyday programming. Which is a pity because other than this oversight it seems to be a good product. 

    Thursday, September 18, 2008 4:26 AM
  •  

    "So far the most inefficient suggestion I have seen here is to retrieve all of the records and count them on the client side."

    You could move the computation to the server-side , but you are right , the server-side implementation only works for top level resources and not dynamic queries.

    http://blogs.msdn.com/phaniraj/archive/2008/08/18/entities-how-many-ways-do-i-count-thee.aspx
    Thursday, September 18, 2008 5:57 AM
    Moderator
  • I had been looking at Data Services for an upcoming project and assumed record count was supported - after all, why provide paging if you can't get the number of pages returned? It's horrible that they've provided all these great features and left this ubiquitous requirement out.  I literally can not use Data Services if I can't efficiently retrieve the number of records being returned.  This is a deal-breaker.

    What use is paging if you have no idea how many pages there are?  I can't conceive a single situation where I would want to page and not know the number of pages that exist.  That's just silly.

    We absolutely need an effective method of returning the number of results a query produces.  This needs to be possible whether returning with paging, no paging, just the top 10, with filtering, without filtering, etc.

    To think that this has been a need since December of 2007 is appalling.  We might as well start looking for another way to accomplish our goals.  It's evident this feature is extremely low priority.


    Edit:  The suggestions made in the third post of this thread are acceptable to me.
    Thursday, September 18, 2008 4:10 PM
  • I totally agree with Nick-Williams. It's like driving a car blindly. Bummer.
    Tuesday, September 23, 2008 2:15 PM
  • Hey Nick.  Not sure you are thinking about all sides of this issue. Unless the db is readonly, there is *no way to know the exact count up front.  A count would only be in instant in time.  On a fast changing RW database, your count is probably way off by the time you make your second query page request. If your first query returned a count, it could be off by hundreds before your next query.  This could mean your telling the user the count is 100 when it could be 1000 by now or visa-versa - you could read a count of 1000, and 500 records could have been deleted before your next page query.  So they are trying to tell you not to assume a count - which is the proper way to handle this issue (always has been).

     

    Using extention methods, you could do something like below. Note that, in theory, this could run forever; which helps show the problem with fixing a count upfront.  Naturally, if querying a static/readonly db, you can read the count up front and then page.

     

    var q = from e in Entities.GetDB().Entries
            where e.IsPrivate==true
            select e;

    int count = 0;
    Console.WriteLine("Return records using paging.");
    foreach (Entries e in q.TakeAllUsingPaging(pageSize))
    {
        ObjectDumper.Write(e);
        count++;
    }
    Console.WriteLine("Records {0}", count);

    count = 0;
    Console.WriteLine("Return page size chunks");
    foreach (IEnumerable<Entries> list in q.TakeAllPages(pageSize))
    {
        Console.WriteLine("Chunk...");
        ObjectDumper.Write(list);
        count += list.Count();
    }
    Console.WriteLine("Records {0}", count);


    public static class MyExtentions
    {

            // Returns pages of records.
            public static IEnumerable<IEnumerable<T>> TakeAllPages<T>(this IQueryable<T> query, int pageCount)
            {
                int count = 0;
                while (true)
                {
                    var q = query.Skip(count).Take(pageCount).AsEnumerable();
                    count += q.Count();
                    yield return q;
                    if (q.Count() < pageCount)
                        yield break;
                }
            }

            // Returns IEnumerable using paged queries.
            public static IEnumerable<T> TakeAllUsingPaging<T>(this IQueryable<T> query, int pageCount)
            {
                int count = 0;
                while (true)
                {
                    var q = query.Skip(count).Take(pageCount).AsEnumerable();
                    foreach (T t in q)
                    {
                        count++;
                        yield return t;
                    }
                    if (q.Count() < pageCount)
                        yield break;
                }
            }
    }  

    Wednesday, October 15, 2008 1:33 AM
  • Do you think I'm expecting them to predict the future?  Of course the count would be good for only the instant it was retrieved!  The "issue" is we can't - in no way, shape, or form - have any idea as to how many records are being retrieved without retrieving the entire set and counting.  That is, clear as day, a problem.

    If you have 100 products in a store, and you wanted to let a user page through those 100 products in tens.  You couldn't do anything but provide a forward and back button.  And even then you wouldn't know when to disable "forward" because you have no idea whether that 10 you retrieve was the last ten or not.  This situation, or use case, is prolific (look to your lower right) - and with Data Services it is impossible.

    Furthermore, I find it insulting that you think we're "doing it the wrong way" when we ask to be provided a way to determine how many records are being returned.  We're not doing it the wrong way, you just don't realize how often this is needed.  Look around you, everything you interact with that has copious amounts of data uses paging.  This forum, all forums, every search engine under the sun, ecommerce stores, document repositories, etc, etc, etc.  All of which operate by knowing how many records are being returned at a given point in time.  The fact that things are changing is as simple to solve as it is obvious, given the ability to retrieve the count.  Are all of these people doing it wrong too?  Surely someone should let Google know that paging is the wrong way to do it.

    Even if I did want to assume a count - I couldn't!  Because there is no way of retrieving the count!

    Wednesday, October 15, 2008 1:45 AM
  • First, relax a bit. This is not the end of world.  Second, you *can get the count if you really need it with something like below.  Not the best solution, agreed, but something that is workable.

     

    [WebGet]

    public int GetUsersCount(string predicate)

    {

    return this.CurrentDataSource.Users.Where(predicate).Count();  //eSql predicate

    }

     

    As you know, this issue is a general issue, not so much an Astoria issue.  How do you handle paging today when you get your 100 user count on query 1 and then someone adds or removes 50 users before you query 2?  Are you just fixing up the pages and adjusting the count again?  In that case, the count still did not help.  If you are working with static data (which it seems you may), then it is safe to get a count upfront.  But otherwise a Count makes no sense as the db is dynamic (as you know) - so the count does not help - in fact it can confuse the user or bomb your app (if recordset is dynamic).  I am interested in how you are doing it today and if your data is static or dynamic.

    Wednesday, October 15, 2008 2:41 AM
  • Could it be possible that we know what we want, and why? This thread is not about how to properly deal with paging dynamic data, andthis thread is not about how I deal with paging in my applications. This thread is about retrieving the number of results from a given Data Services URI. 

    Your latest suggestion is fine and dandy unless we want to use the filtering and ordering and all the other great things Data Services provides access to through the URI.

    We need a method to get the record count - whether it's a simple URI or we're applying filters, etc.  We should be able to do something like add $RecordCount to our URI and be able to retrieve the value from the data returned.  It's that simple.

    Data Services is great; it's a huge step forward that saves a ton of time if you're developing something to be consumed by many.  However, the lack of a Record Count feature is an enormous oversight.
    Wednesday, October 15, 2008 3:03 AM
  • 1) It is not there today and you already voiced your need many times.  That will not make it pop into your inbox tomorrow.  So you have to go a different route for now.

     

    2) I gave you an option you can use with a DS URI *and a filter if needed. If you don't want to use it, that is fine. But don't say you can't do it, because you can. You can come up with other variations of it as well.

     

    3) Now I don't understand.  You say "We should be able to do something like add $RecordCount to our URI".  If you are returning the whole set, you already have the record count when the records return.  If instead you want "$top(10)$RecordCount", we are back to a query that makes no sense in the context of a dynamic set as your not returning the snapshot set at that point in time.  I asked how you solve that issue today without DS and you did not say. If you said, maybe we could find a similar way using DS.

     

    4) Another option is two queries (one for Count and one for top(10)), and use ExecuteBatch(q1, q2).  #3 above (if available) would also need to be two result-sets (something like how you would use a sqlreader today to get MRS).

    Wednesday, October 15, 2008 3:44 AM
  • 1)  Are you honestly trying to villify me as needy, when you're the one soliciting my responses?  I put in my suggestion a month ago and continued on without Data Services.  I'm on the same route I've been on: ADO.NET's DataSet .

    2)  Your solution is not acceptable.  I would have thought the lack of support shown for Phani's suggestion made this evident.  Iterating through things client side or server side is not a good solution.  This is just not practical when dealing with large numbers of results, and it's something that should be taken care of behind the scenes.

    3)  $top(10)$RecordCount is absolutely what we want - it would solve any situation where one might need a record count.  We should be able to use this at any time, no matter the URI (any data-returning URI).  It should give the number of records that would be returned if the $top keyword was not present in the URI.  It's that simple.  A URI like "AdventureWorks.svc/ProductCategories(5)/Products/$top(5)$RecordCount" and " AdventureWorks.svc/Products?$filter=Color=Red&Sorderby=Price$top(10)$skip(500)$RecordCount" should be possible.

    4)  I'm really not interested in hacks.  I'll wait until the Data Services team provides this functionality to start using it.

    Is one record from an often-changing database less than or greater than one? (It's one). How about one record from a stale database? (It's also one).  I haven't answered your question because the answer is absolutely irrelevant - one record is one record no matter where it comes from.  If you feel otherwise, then pretend it's entirely static data and review your math.

    If we had the option of adding such a keyword, it would be capable of solving everyone's problems.  If we had totally static data, we could issue a single $top(10)$RecordCount, followed by a $top(10)$skip(10), skip(20), etc.  If we had data that was changing, we could suffix the URI with $RecordCount every few pages (relative to the number of records) and update the paging interface to accurately reflect things.  Now, as for returning the value.. I would imagine a single node in the XML (or whatever else) named "Records" would be sufficient.
    Wednesday, October 15, 2008 4:16 AM
  •  Nick-Williams wrote:

    2)  Your solution is not acceptable.  I would have thought the lack of support shown for Phani's suggestion made this evident.  Iterating through things client side or server side is not a good solution.  This is just not practical when dealing with large numbers of results, and it's something that should be taken care of behind the scenes.

    Not acceptable compared to what?  Compared to not doing anything and going home?  The GetUsersCount() on the server side is same as:

    SELECT COUNT(*) AS [value]
    FROM [Users] AS [t0]

    That is slow compared to what?  This query would be required anyway, even by their internal implementation of $count.

    3)  $top(10)$RecordCount is absolutely what we want - it would solve any situation where one might need a record count.  We should be able to use this at any time, no matter the URI (any data-returning URI).  It should give the number of records that would be returned if the $top keyword was not present in the URI.  It's that simple.  A URI like "AdventureWorks.svc/ProductCategories(5)/Products/$top(5)$RecordCount" and " AdventureWorks.svc/Products?$filter=Color=Red&Sorderby=Price$top(10)$skip(500)$RecordCount" should be possible.

    I don't disagree. I want to see $count too.  But we (both) need something today.  I thought the goal was to find something that worked today, not wait a year.


    4)  I'm really not interested in hacks.  I'll wait until the Data Services team provides this functionality to start using it.
    Is one record from an often-changing database less than or greater than one? (It's one). How about one record from a stale database? (It's also one).  I haven't answered your question because the answer is absolutely irrelevant - one record is one record no matter where it comes from.  If you feel otherwise, then pretend it's entirely static data and review your math.

    hmm.  I don't understand that comment.  BTW, none of that code was hacks. Unless your saying any solution other then $count is a hack.  Your saying using service operations is bad?  IMO, that is why they exist, to extend your own api and provide functions not available otherwise.  Programmers, program. They don't wait around for the perfect api, because it never exists (save the .Net framework)



    If we had the option of adding such a keyword, it would be capable of solving everyone's problems.  If we had totally static data, we could issue a single $top(10)$RecordCount, followed by a $top(10)$skip(10), skip(20), etc.  If we had data that was changing, we could suffix the URI with $RecordCount every few pages (relative to the number of records) and update the paging interface to accurately reflect things.  Now, as for returning the value.. I would imagine a single node in the XML (or whatever else) named "Records" would be sufficient.

     

    Now your talking.  That is called a multiple result set or MRS as you know.  You can get these with ExecuteBatch. So you could query a count in one query, and query a set in another and make 1 request/reply over the wire. Having Out params would also be useful if they allowed it, as you could do something like this to return a mrs:

    [WebGet]

    public IQueryable<Users> GetUsers(out int countAll, out bool isSomething)

    {

         var q = // Query a set of users.

         countAll = //this set count before any further filter supplied.

         isSomething = false;

         return q;

    }

    Thursday, October 16, 2008 12:28 AM
  • As you correctly point out you are looking for the number of rows in the query (not just a table) so clearly you have to count the records every time. Having a static database, contrary to some of the comments, will not help you.

    Consider a simple database with one table and one text column. There are as many "counts" as there are queries; all records starting with X but not containing the sequence IOU.  You can estimate the number of records using COUNT (accuracy 100% for static data) or count what you get back from your query (100% accurate) and cache that for paging.
    Tuesday, April 21, 2009 8:14 PM