High speed and availability query system

  • Question

  • Hi folks!

     

    I have some experience designing and building traditional normalized systems (business systems that maintain entities and their CRUD operations), and now I have a new challenge ahead. My team is working on an e-commerce application which will need high availability and query speed. We're talking about querying over 3 million rows in a database with lots of relations, and more than 100 thousand users per day.

     

    We've built a backend system to manage all the products that will be available on the website, using a normalized database, and now we're going to start website development. I did some research and found that I have to denormalize this product database with some ETL process in order to be able to make fast searches.

     

    Other requirements are: ordering by columns, grouping by them, and lots of filters.

     

    The problem is that I have no idea how to do it, so I need help from the more experienced folks in here.

     

    Thank you for all your help!

     

    PS: Sorry for my poor English.

    Wednesday, April 16, 2008 4:23 PM

Answers

  •  

    Hi

     

    Like others have said, you can do a few things; off the top of my head:

     

    Database level

    - split read and write databases (SQL Server 2005 has a number of features for this)
    - create denormalized databases for reporting
    - create indexed views for the most common queries (see the sketch below)
    - create a separate DB for ETL purposes, then update the write DB

     

    Application level

    - cache query results, and cache ASP.NET pages
    - don't use ViewState
    - keep session/application state in SQL Server
    - use connection pooling for the DB

     

    Physical level

    - cluster the web and DB servers
    - use non-sticky load balancing
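
     

    As a sketch of the indexed-view idea, assuming two hypothetical tables, dbo.Product and dbo.Category (indexed views need SCHEMABINDING and two-part names):

    -- A minimal indexed view flattening the most common product lookup.
    -- All table and column names are hypothetical placeholders.
    CREATE VIEW dbo.ProductSearch
    WITH SCHEMABINDING
    AS
    SELECT p.ProductId,
           p.Name,
           p.Price,
           c.Name AS CategoryName
    FROM   dbo.Product AS p
    INNER JOIN dbo.Category AS c
            ON c.CategoryId = p.CategoryId;
    GO

    -- The unique clustered index is what actually materializes the view.
    CREATE UNIQUE CLUSTERED INDEX IX_ProductSearch
        ON dbo.ProductSearch (ProductId);
    GO

    Bear in mind that outside Enterprise Edition the optimizer only uses the view's index when you query it with the WITH (NOEXPAND) hint.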

     

    Hope this helps you out

    Thursday, April 17, 2008 9:04 AM
  • There's quite a nice article in the latest MSDN magazine covering some of this, if you can get it. I don't think it's online yet.

     

    Thursday, April 17, 2008 10:19 AM

All replies

  • How stable is the data? E.g. can you have an Amazon-style model where lots of the stable read data can be farmed out?

     

     

    Wednesday, April 16, 2008 5:06 PM
  • Hi,

    Once my data is inserted, it will be available for a long time. I'll have a high volume of inserts and some modifications (attributes added, pictures, product relations).

    Thank you!
    Wednesday, April 16, 2008 6:22 PM
  • The basic principle is to physically separate read and write databases. That way writes aren't blocked by reads and vice versa. Plus you can have more dedicated read databases and have them closer to the client. Of course you don't get that for free: you'll need to use some mechanism to sync the data, and you have to live with a potential no-man's-land window where the inserted data isn't yet available. On the plus side, such a lag can work for you architecturally, since you can employ a queuing system. Again, that depends on the clients' expectations, e.g. are the inserts in a transaction, and can they compensate if an insert/update eventually fails? The model has a number of assumptions, so it may not help in your specific instance.
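
    To illustrate one such sync mechanism, here's a minimal watermark-based T-SQL sketch; the SyncWatermark table, the ModifiedDate column, and the READSRV linked server are all hypothetical assumptions, and in practice SQL Server 2005's built-in replication would usually be the better tool:

    -- Copy rows changed since the last run from the write DB to a read DB.
    -- All object names here are hypothetical; inserts only, for brevity
    -- (updates would also need an upsert on the read side).
    DECLARE @LastSync datetime, @Now datetime;
    SET @Now = GETDATE();

    SELECT @LastSync = LastSyncTime
    FROM   dbo.SyncWatermark
    WHERE  TableName = 'Product';

    INSERT INTO READSRV.ReadDb.dbo.Product (ProductId, Name, Price, ModifiedDate)
    SELECT ProductId, Name, Price, ModifiedDate
    FROM   dbo.Product
    WHERE  ModifiedDate > @LastSync
      AND  ModifiedDate <= @Now;

    -- Advance the watermark only after the copy succeeds, so a failed run
    -- is simply retried; the gap between runs is the no-man's-land window.
    UPDATE dbo.SyncWatermark
    SET    LastSyncTime = @Now
    WHERE  TableName = 'Product';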


    Wednesday, April 16, 2008 10:01 PM


  • You need to consider historical data, i.e. data that is only ever going to be read (once an order is completed you're not likely to use it again). You also need to consider whether that data should be moved from the transactional database to a warehouse, so that old, used, and rarely looked-at data does not affect newly created data.
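
    As an illustration of that kind of move, here is a minimal T-SQL sketch; the Orders/OrdersArchive tables, the ArchiveDb database, and the one-year cutoff are all hypothetical assumptions:

    -- Move completed orders older than a year out of the transactional DB.
    -- All table and database names are hypothetical placeholders.
    DECLARE @Cutoff datetime;
    SET @Cutoff = DATEADD(year, -1, GETDATE());

    BEGIN TRAN;

    INSERT INTO ArchiveDb.dbo.OrdersArchive (OrderId, CustomerId, OrderDate, Total)
    SELECT OrderId, CustomerId, OrderDate, Total
    FROM   dbo.Orders
    WHERE  Status = 'Completed'
      AND  OrderDate < @Cutoff;

    DELETE FROM dbo.Orders
    WHERE  Status = 'Completed'
      AND  OrderDate < @Cutoff;

    COMMIT TRAN;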

    There is going to be a large amount of indexing, so you'll have to consider whether there is a need for a number of specialized 'views' of the data, and then perhaps consider creating database objects to support those specifically (obviously what you choose depends upon the database product, and how it performs).

    I would expect the ETL simply to take the transactional, highly normalised 'active' database and push it out into formats that are more applicable to reports, and thus likely to be denormalised, reducing the cost of joins across tables and so on.
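
    To make that concrete, here is a minimal sketch of such an ETL step, flattening a few normalized product tables into one wide reporting table; every name here (ReportDb, ProductFlat, dbo.Product, and so on) is a hypothetical placeholder for your own schema:

    -- Rebuild a flat, denormalized product table for reporting and search.
    TRUNCATE TABLE ReportDb.dbo.ProductFlat;

    INSERT INTO ReportDb.dbo.ProductFlat
           (ProductId, ProductName, Price, CategoryName, BrandName, InStock)
    SELECT p.ProductId,
           p.Name,
           p.Price,
           c.Name,
           b.Name,
           CASE WHEN s.Quantity > 0 THEN 1 ELSE 0 END
    FROM   dbo.Product  AS p
    JOIN   dbo.Category AS c ON c.CategoryId = p.CategoryId
    JOIN   dbo.Brand    AS b ON b.BrandId    = p.BrandId
    JOIN   dbo.Stock    AS s ON s.ProductId  = p.ProductId;

    With the flat table in place, the website's search queries become single-table scans over well-indexed columns instead of multi-way joins.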

    How exactly you would approach this depends upon how the system has to work, and where the data becomes 'inactive', or at least unlikely to change.

    The major distinction to draw is transactional versus decision-support, or reporting-type, databases. You probably need to do some research on how these differ, work out how that will affect your proposed system, and then come back on here and ask more questions if you have any.


    I hope this helps,

    Martin Platt.
    Wednesday, April 16, 2008 11:30 PM