none
Database Design for “Excel like” editing RRS feed

  • Question

  • I’m working on a design for a product that allows users to edit data in the grid. The user interaction is very similar to that of excel. They have column headers, they can add, remove columns, re-arrange them etc. They can also add, insert, and delete rows.   My biggest design problem at the moment is trying to design the DAL such that they users can insert rows into the middle of the data set similar to excel. My understanding is that you cannot insert rows into the middle of a database table. Question is how to design around this limitation?

     

    I’m leaning towards having a row position column in each of the tables and when I fetch rows for the user always include an order rowposition clause so that the user’s view is always sorted. We are hoping to support tables with up to a million rows of data. Is this going to scale incredibly poorly because of the order by on all the queries?

     

    For inserting new rows I was thinking I’d increment the row position of all the rows with a row position greater than the new place for the inserted row.   Something like:

     

    InsertRows(int position, int count)

    {

        //update table set rowposition=rowposition+count where rowposition >= position

      For(int index=position; index<position+count;index++)

    {

                //Add row

                 //set rowposition of new row to index

    }

    }

     

    Would I be better off using a view? Or maybe even better copying all the data to a new table with the rows in the correct place, and skipping the view or order by clause on select statements?

     

    Something like:

     

    //create new table

    //insert into newtable select * from existingtable where rowposition <position

    //insert new rows

    //insert into newtable   select * from existingtable where rowposition   >= position

     

    I’m just wonder if anyone has solved  a similar problem before trying to explore my options.

     

    Thanks

    Friday, April 3, 2009 3:08 PM

All replies

  • As I understand it, the order by clause in SQL is not the fastest thing in the world. With up to a million rows, that would hurt. But, will your users really be editing that many rows at a time? I rather doubt it, as I don't know many people (or apps) that coould handle that much data.

    How about setting up a paging system of some sort? Say you have 10,000 rows in the table. And a "row position" column. Your user wants to get the 15th page. You know that your "pages" will hold 200 rows. Simple math says to get the rows with the row position between 3000 and 3200. Then you're only ordering on 200 rows, plus you don't have to worry about doing complex inserts.

    Then, when the user gets the next page of data, you go from 3200 - 3400.

    In this case, you're still doing the same amount of work as ordering by up front, maybe more, but you're doing it in smaller chunks, which makes the app seem faster.

    Not sure if that would work.


    Chris Snyder, Stumbling through code one day at a time If your question was answered, please mark it.
    Friday, April 3, 2009 3:24 PM
  • Oh absolutely about the caching. I'll be using a "virtual grid" type scenario where as they scroll it will try to render the rows needed. If they aren't in the local cache(s) it will fetch the next X number of rows, using what you said before where row position > x and < y. In that case I guess the DB will only be ordering the smaller number of rows and there for be faster? When I said all selects will use order by I didn't clarify that I would be selecting only chunks of data at a time. Other queries using the tables that need to enumerate through all the rows would be doing the same kind of caching where I'm only returning a subset of the total number of rows each time.

    Given that does it seem like the design is going to be ok? For the record probably 80% of our customers will be using data sets with <100k rows. And some of the largest datasets are more likely to be read only anyway, where the editing will be more systematic through transformations that happen in other parts of the application and not via a grid directly.
    Friday, April 3, 2009 3:35 PM
  • Well, 100k rows is still a lot. If you know that the largest datasets are going to be readonly, then you could do some kind of parallel table. You might want to consider (and I don't know if this would be faster or not) placing the row position column in a different table. That way you are separating out the real "data" from the UI positional stuff.

    If you absolutely have to have positional ordering (which it sounds like you do) then I can't think of anyway other than using a positional column. Maybe somebody else that's smarter than me can come up with something better.

    Either way, I would test the holy ____ out of this just to see what's going to happen. But <I> would try to limit the amount of complexity in the inserts and updates as much as possible, since my experience shows that you do more of these than you do deletes.


    Chris Snyder, Stumbling through code one day at a time If your question was answered, please mark it.
    Friday, April 3, 2009 3:40 PM
  • I would suggest two things for your app. 

    Manageable Data Storage
    You need to consider using table partitioning in this case to keep your tables of data more manageable.  You will need to be a bit tricky in the design of your partitions as you will not want to partition based on the row position but rather on the set of data that is displayed in your grids.  Each grid would live in one of your partitions.  Account for X amount of rows per grid that you want to display and do not hold more than N number of grids in each table where X * N equals the total number or rows you want to maintain in each partition table.  This equation is a generality in that obviously some of your grids may be larger than the average that you account for and therefore some of your partitions will be larger than you had planned.  But at least they will be close to a maintainable size rather than lumping all rows for all grids into one uncontrollable table that grows exponentially! 

    Fast Data Display
    The next issue that I see you having is the fetching of all this data (even after partitioning) from such large data sets and sorting it among other things.  For this I suggest that you look at a product called Lucene.NET.  This product will allow you to maintain your data in the form of an index which is much faster for retrieving, searching, etc.  As long as your DB app is only edited by your application and the data inserted via your app.  This way you can insert to the database (for reporting, data management, creating new indexes, etc.) and insert to the Lucene index for your app to display, sort, etc. the data in a very fast manner.  Keeping the data in the database will allow you to have the master copy so that if you ever need to recreate your Lucene indexes you can.  When you think of Lucene and indexing think of Google and it's speed.  Lucene is a very fast format for searching and displaying data.
    Andrew Siemer www.andrewsiemer.com blog.andrewsiemer.com www.socialnetworkingin.net
    • Proposed as answer by Andrew Siemer Thursday, April 16, 2009 1:19 AM
    Monday, April 6, 2009 7:16 PM