How to design a scalable SQL Server 2005 architecture that supports 100,000 hits at a time

  • Question

  • Hi,
    I have read a lot of articles on SQL Server 2005 clustering and found that SQL Server 2005 doesn't support Active/Active clusters out of the box. But some techniques exist, like Data Dependent Routing using Distributed Partitioned Views, etc.

    Actually, I want to design the architecture of a highly scalable web site that supports 100,000 hits simultaneously. For this I am planning to create a server farm (3 web servers) and will use the NLB (Network Load Balancing) service of Windows 2003.

    But as you know, the database is always the bottleneck when you talk about high availability & scalability, so I want to know: what is the recommended database architecture to support this design?

    How can I distribute the database across multiple servers (like 3 DB servers) and allow load balancing between them?
    What is the recommended web server routing technique (NLB vs. another load balancer such as a Foundry ServerIron), and what are the advantages of using NLB?


    Wednesday, April 22, 2009 9:48 AM

All replies

  • Anybody here to answer my question please....
    Thursday, April 23, 2009 7:18 AM
  • There are a couple of paths that you can take. You can indeed have separate servers holding location-specific data. Take a bank application, for example: when you log in, you specify your username and the state that you live in. This tells the application exactly which set of data to work with when handling your requests. This is a form of partitioning. Partitioning can happen at the table level, the database level, or the database server level, depending on how you want to implement it. Partitioning at the table level became a lot easier from SQL Server 2005 onward. This can be a considerable topic to look at, but the basics are: if the state is California, look in the California accounts table, database, or server. Obviously you will have some data that needs to be shared across all accounts, so that data will need to be managed as a single silo.
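
    As a hedged sketch, table-level partitioning in SQL Server 2005 along the lines described above might look like this (the Accounts table, the StateCode column, and the single-filegroup mapping are illustrative assumptions, not from the thread):

    ```sql
    -- Partition rows by two-letter state code; RANGE LEFT puts values up to
    -- and including 'CA' in the first partition, ('CA','NY'] in the second,
    -- and so on (four partitions in total).
    CREATE PARTITION FUNCTION StatePF (char(2))
    AS RANGE LEFT FOR VALUES ('CA', 'NY', 'TX');

    -- Map all partitions to PRIMARY here for brevity; in production each
    -- partition would usually get its own filegroup.
    CREATE PARTITION SCHEME StatePS
    AS PARTITION StatePF ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Accounts
    (
        AccountID int     NOT NULL,
        StateCode char(2) NOT NULL,
        Balance   money   NOT NULL,
        -- the partitioning column must be part of the clustered key
        CONSTRAINT PK_Accounts PRIMARY KEY (AccountID, StateCode)
    ) ON StatePS (StateCode);
    ```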

    Also, there is more to database performance than simply partitioning the data appropriately. An easy way to absorb more hits to your site is to cache data in one or more cache servers. Look at a product called MemCached Win32, which lets you run a farm of servers that strictly manage cache. The way this works is that the first time (fairly static) data is requested by your application, it is placed into cache. When that data is requested again, you don't have to go to the database; instead you go to the cache location to retrieve it. The benefit here is not only skipping the dip to the database server: you can get raw data from the db, transform it appropriately, build whatever output you need, and then cache that output. This means there is no db dip and no work is done on the data, as that was done previously.

    Another thing to look at with regard to the db is running complex queries. Instead of querying your large data store constantly, you might consider moving the data to a Lucene.NET index or indexes. This is data optimized for searching (think Google) and reading. So for any data that needs to be searched through and displayed to your users, consider running it through an indexing process and storing it in the indexes. Any time the data is modified in your db you will, of course, need to update your indexes. Alternatively, you can use the index only for its search capability, to identify the appropriate record(s) for display, and then go to the db and get the records by their keys rather than doing a free-form text search.

    Lots can be done to achieve 100,000 hits!
    Andrew Siemer
    Thursday, April 23, 2009 6:14 PM
  • Thanks a lot for the detailed reply, but your answers are very general. What I actually want to know is how many servers are needed for a highly available federated database, and what the OS should be (MS Windows 2003 or 2008). What are the best practices?

    I have found the following high-availability and scalability techniques for load balancing SQL Server 2005 databases.

    1. Distributed Partitioned Views:
    Data Dependent Routing is a technique that can be achieved with Distributed Partitioned Views. A distributed partition means a table partition on a remote server that is connected through a linked server. The tables participating in the Distributed Partitioned View reside in different databases, which reside on different servers or different instances.

    Steps Involved:
    a) Create the same database on multiple servers
    b) Create the tables that need to be partitioned on all servers, each with a CHECK constraint
    c) Create Distributed Partitioned Views over all member tables for Select, Insert, Delete, and Update statements
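
    The steps above can be sketched as follows (the server names SRV1–SRV3, the SalesDB database, and the Customers schema are assumptions for illustration):

    ```sql
    -- Step b) On each server, create the member table with a CHECK
    -- constraint that defines its horizontal partition. On SRV1:
    CREATE TABLE dbo.Customers_1
    (
        CustomerID int NOT NULL PRIMARY KEY
            CHECK (CustomerID BETWEEN 1 AND 1000000),
        CustomerName nvarchar(100) NOT NULL
    );

    -- Step c) On every server, create the same view over the local table
    -- and the remote member tables (reached via linked servers):
    CREATE VIEW dbo.Customers
    AS
    SELECT CustomerID, CustomerName FROM SRV1.SalesDB.dbo.Customers_1
    UNION ALL
    SELECT CustomerID, CustomerName FROM SRV2.SalesDB.dbo.Customers_2
    UNION ALL
    SELECT CustomerID, CustomerName FROM SRV3.SalesDB.dbo.Customers_3;
    ```

    For the view to be updatable, the partitioning column must be part of the primary key of each member table and the CHECK constraint ranges must not overlap.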


    2. Transactional Replication (Microsoft Architecture):
    In this scenario we need to create three or more databases with a similar schema; each database is connected to one of the web servers, and transactional replication is set up between them to synchronize the data among all database servers. This type of load balancing gives a good performance boost when the load increases.

    Steps Involved:
    a) Make an identical copy of the database on three database servers.
    b) Implement transactional replication between them to synchronize data.
    c) Create a composite key in all tables that use an auto-increment identity column.
    When tables contain identity columns that are configured as primary keys, there are considerations that must be addressed for both load-balancing and failover scenarios. For example, when a row with an identity value is replicated from a Publisher table to a Subscriber table, and the Not For Replication property is set on that column at the Publisher, the row values are inserted into the Subscriber table but the insert does not increment the Subscriber's identity value. The result is that when an application later tries to add a row directly to the Subscriber table (which would now be acting as the Publisher), it fails, because there is already an entry in the identity column with the next identity value, replicated from the original Publisher.

    Identity columns can be addressed in two ways.

    First, a new column can be created in a table that has an identity column. This column is an integer data type (adding 4 bytes to the row) and defaults to the specific server ID where that instance of the database resides. For example, if the server ID of the first node is 1, the second is 2, and so on, the new column on the first node would default to 1, the new column on the second node would default to 2, and so forth. This new column, or SRVID column, and the identity column are configured as a composite primary key. This effectively ensures a unique key value on each node of the cluster and mitigates the duplicate key violation issue.
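
    A minimal sketch of this SRVID approach, assuming a hypothetical Orders table and a server ID of 1 on this node (the DEFAULT would differ on each node):

    ```sql
    CREATE TABLE dbo.Orders
    (
        OrderID   int IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL,
        -- default to this node's server ID; node 2 would use DEFAULT (2), etc.
        SrvID     int NOT NULL CONSTRAINT DF_Orders_SrvID DEFAULT (1),
        OrderDate datetime NOT NULL,
        -- (OrderID, SrvID) together guarantee uniqueness across nodes
        CONSTRAINT PK_Orders PRIMARY KEY (OrderID, SrvID)
    );
    ```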

    The second option is to seed the identity values on each node with their own range. This assures that each identity value remains unique across the four instances of the database. To reduce the maintenance of this approach, the identity columns are created as the bigint data type; bigint gives enough headroom to create large ranges (a billion each) that rarely require reseeding. For example, the range for node one starts at 1, node two at 1,000,000,001, node three at 2,000,000,001, and node four at 3,000,000,001. This approach also adds 4 bytes to the row, because bigint takes up 8 bytes instead of the 4 bytes taken by the integer data type.
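
    Sketched in T-SQL, assuming a hypothetical Orders table, each node seeds its own identity range (run the matching statement on each node):

    ```sql
    -- Node 2 of 4: its range starts at 1,000,000,001.
    CREATE TABLE dbo.Orders
    (
        OrderID   bigint IDENTITY(1000000001, 1) NOT NULL PRIMARY KEY,
        OrderDate datetime NOT NULL
    );

    -- An existing identity column can be moved into its range instead;
    -- the next insert after this reseed gets 1,000,000,001:
    DBCC CHECKIDENT ('dbo.Orders', RESEED, 1000000000);
    ```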

    Reference Article:

    With reference to the above facts, I need to know the recommended architecture for the SQL Server databases. Please feel free to give feedback.



    Friday, April 24, 2009 4:49 AM
  • Hi

    Can you tell me what your exact requirements are for your database architecture?

    Can you please be more specific?

    Phijo Mathew Philip.

    Tuesday, April 28, 2009 2:16 AM
  • Hi,

    You are talking about high availability, which can be achieved with an active/passive setup.
    If you are interested in scaling out, you have already mentioned the mechanisms yourself.
    I have just finished designing the architecture for an e-learning application with 50,000 concurrent users. Those are pretty high numbers for SQL Server to support.
    But I figured out that for medium-complexity user scenarios (medium-complexity queries), one database server with another in passive mode will suffice.
    The bottlenecks you have to take care of are:
    1. CPU bottleneck: Take the highest-configuration server for the database. Nowadays, 4 x 6-core Intel/Opteron-based servers are available from different vendors like Sun/HP/Dell. If needed, I can send you the exact specifications.
    2. Network bottleneck: The network can be a bottleneck if the packets exchanged between the web servers and the database server are saturating your datacenter network. To avoid this, use different subnets for connecting the web servers and the database server. You can also use a virtual dedicated network between servers (a recent offering from some vendors). Try to upgrade your datacenter network speed to 1 Gbps or, if possible, 10 Gbps.
    3. RAM bottleneck: I think there should be no issues with this.

    Important: Believe me, you have many workarounds in SQL Server for scaling out, but as it is not natively supported by SQL Server, people generally end up with maintenance issues at later stages. So you are better off scaling up.

    Hope this helps

    Please mark this post as Answer if it helps you
    Tuesday, April 28, 2009 3:28 AM