locked
Best way to save data temporarily before fully saving. RRS feed

  • Question

  • So we have a system that allows people to log in and order magazines.  Management gave us a new initiative to save "in progress" signups when people come back, for example:

    1.  Someone signs into their account.
    2.  They pick 2 magazines they want to subscribe to.
    3.  They leave the website without purchasing.

    Now, we want to save those selections to the datbase so that the next time they sign in, we can show them the magazines that they were in the process of purchasing.  Previously, that sessions worth of data would be lost if they came back.

    Some people on our team have suggested that we create exact copies of all of the related subscription tables to save there and once they PURCHASE the magazines we copy them over to the regular subscription tables and delete them from the copied subscription table. - This seems like a LOT of work to me and not very efficient. Others have said that we keep these "pending" records in the regular subscription table with a status indicator like "In Progress" or something ( as opposed to a "Purchased" status).

     We think that there will be an average of a million records in the copied subscription tables.  With so many inserts and deletes I dont think indexes would help much, but also feel that with potentially a million records, table scans would be SO SLOW.

    Looking for thoughts and ideas with performance in mind!

    Thanks,
    Matt


    Monday, January 16, 2012 2:52 PM

Answers

  • Hi Matt,

    I'd implement the solution with the status indicator column. I'd also add another nullable column InProgressDate smalldatetime and populate it with the time when record has been created. And have some job I run on some interval (weekly?) that deletes all pending (In Progress) record older than X days. For that specific task (deletion) I'd create the index

    -- SQL 2008 - use filtered index
    create nonclustered index IDX_Subscriptions_InProgressData
    on dbo.Subscriptions(InProgressDate)
    where Status = ? /* IN PROGRESS */
    
    -- SQL 2005 - use regular index
    create nonclustered index IDX_Subscriptions_InProgressData
    on dbo.Subscriptions(Status,InProgressDate)
    
    
    


    Million records is not the huge number by any means. But that number does not really matter. You need to create the indexes to optimize your queries - that's what indexes are for :)

    Overhead of index maintenance during data modifications for sure has it's own price and needs to be taken into consideration although in most part of the cases reasonably indexed table should be fine. 

     


    Thank you!

    My blog: http://aboutsqlserver.com

    Monday, January 16, 2012 5:55 PM

All replies

  • Hi Matt,

    I'd implement the solution with the status indicator column. I'd also add another nullable column InProgressDate smalldatetime and populate it with the time when record has been created. And have some job I run on some interval (weekly?) that deletes all pending (In Progress) record older than X days. For that specific task (deletion) I'd create the index

    -- SQL 2008 - use filtered index
    create nonclustered index IDX_Subscriptions_InProgressData
    on dbo.Subscriptions(InProgressDate)
    where Status = ? /* IN PROGRESS */
    
    -- SQL 2005 - use regular index
    create nonclustered index IDX_Subscriptions_InProgressData
    on dbo.Subscriptions(Status,InProgressDate)
    
    
    


    Million records is not the huge number by any means. But that number does not really matter. You need to create the indexes to optimize your queries - that's what indexes are for :)

    Overhead of index maintenance during data modifications for sure has it's own price and needs to be taken into consideration although in most part of the cases reasonably indexed table should be fine. 

     


    Thank you!

    My blog: http://aboutsqlserver.com

    Monday, January 16, 2012 5:55 PM
  • I agree with Dmitri.

    Duplicating a set of tables just because of the state a record is in would be a lot of unneeded work. A simple Status column that's an integer with a look-up table referenced by a FK would be perfect.

    As for performance on a 1 million record table, if the indexes are done right, it should not be an issue. I manage a few tables that have over 8 million records and the query performance is excellent because the indexes were designed correctly.

    The trick is managing the indexes and rebuilding them on a regular basis so they remain useful.

    Cheers,

    Lefka

    Monday, January 16, 2012 8:39 PM
  • Using a second TABLE likely introduces redundant data into the system. Further, it definitely breaks normalization rules. While this does not matter if its "just data", it would not be utilizing the database for what it was designed for, and would likely lead to conflicting data.

    The "In Progress" is an excellent idea to resolve this. Though, i would rename it to reflect the status of the order not the status of the process. That is, instead of "In Progress", i would use NULL, as there is no actual status, just an association.

    Subscription
    --------------
    Client (FK) (PK)
    Periodical (FK) (PK)
    Status (NULL, Ordered, Canceled)

    The INDEX would be the PK, Client/Periodical, which would get used, and checking Status after that would likely be inconsequential.

    Tuesday, January 17, 2012 2:54 PM
    Answerer