SQL Server 2005 Database Mirroring for a large number of databases

  • Question

  • I am trying to enable database mirroring for 100 databases.
    It goes error free up to 59 databases (sometimes 60) with the status
    (Principal, Synchronized) on the principal. On the 60th or 61st database it
    gives the status (Principal, Disconnected). The mirror also starts acting
    abnormally: connections to the mirror begin to time out, and database
    mirroring cannot be enabled on any more databases. I have SQL Server 2005
    Enterprise with SP1 on the servers. A witness is not included yet.

    These are my test servers; I have more than 500 databases on my production
    servers.

    The principal and the mirror both use port 5022 for endpoint communication.

    All of the databases are critical, and all must be included in database mirroring.
    So after that I tried to implement database mirroring again. The system has
    3 GB of RAM, and SQL Server (the mirror) is using only 85 MB of it, but it
    still gives this error while trying to enable database mirroring for the
    37th database:

    "There is insufficient system memory to run this query"

    WHY?
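
    For reference, a minimal sketch of the per-database setup being described
    (the endpoint name, server names, and database name here are illustrative):

        -- One mirroring endpoint per instance (port 5022), shared by every
        -- mirroring session on that instance; run on principal and mirror
        CREATE ENDPOINT Mirroring
            STATE = STARTED
            AS TCP (LISTENER_PORT = 5022)
            FOR DATABASE_MIRRORING (ROLE = PARTNER);
        GO
        -- On the mirror, after restoring the database WITH NORECOVERY:
        ALTER DATABASE CompanyDB01 SET PARTNER = N'TCP://principal.example.com:5022';
        -- Then on the principal:
        ALTER DATABASE CompanyDB01 SET PARTNER = N'TCP://mirror.example.com:5022';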

    Thursday, June 1, 2006 5:35 AM


All replies

  • Each database that is mirrored consumes a couple of worker threads on the principal and the mirror. The system can quickly run out of threads to perform work on the server.

    Although hundreds of databases have been tested with mirroring at a time, that is mostly a functional test. In production, the recommendation is to mirror on the order of 10 databases per instance.
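
    A quick way to see how close an instance is to its thread ceiling (a sketch
    using the SQL Server 2005 DMVs; run it on both partners):

        -- Configured ceiling; 0 means SQL Server picks a default at startup
        -- (the classic default on 32-bit SQL Server is 256 threads)
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max worker threads';

        -- Worker threads currently alive on the instance
        SELECT COUNT(*) AS workers_alive FROM sys.dm_os_workers;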

    Thanks,

    Mark

    Thursday, June 1, 2006 4:14 PM
  • I again tested database mirroring for 100 databases, enabling 10 at a time over a dedicated 1 Gbps channel, and again mirroring failed at the 60th database. What should I do now?

    Tuesday, June 13, 2006 12:13 PM
  • There really aren't any workarounds. Database Mirroring isn't meant to mirror hundreds of databases on a single instance. You're at 60 databases right now, which means you are using approximately 120 threads on each of the principal and the mirror, plus threads for other SQL Server processes. You could possibly increase the number of worker threads, but you are going to hit another wall.

    The more significant issue is that you are going to have the aggregated transaction traffic for each of these databases, plus all of the pings between principal, mirror, and witness, which occur every couple of seconds. That is a LOT of traffic being sent across your network, and it is VERY sensitive to any delays at all.

    I've gotten 15 databases on a single instance up and running in production with Database Mirroring, but they aren't easy to keep running, and we had to do some specific architecture work on the network to ensure we had dedicated links.
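
    If you do experiment with raising that ceiling, it is a server-wide setting
    (a hedged sketch; 512 is an arbitrary illustrative value, and each extra
    thread also costs stack memory, roughly 512 KB on 32-bit):

        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        -- Illustrative value only; more threads trade memory for concurrency
        EXEC sp_configure 'max worker threads', 512;
        RECONFIGURE;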

    Thursday, June 15, 2006 10:12 PM
  • SQL Server claims support for 32,000+ databases, and I cannot get past 60 for a full hot backup server with SQL Server 2005.

    From your last post, I can only assume that SQL Server supports a small number of databases and that I have to rely on some other DBMS to support a large number of databases.

    In our scenario each database represents a company, and we cannot merge them into a single database; we have to keep an individual database for each company. We cannot put up 500 machines to support this architecture: 500 machines would require 500 more machines as backup servers. We are also still growing and could easily reach 1,000 databases in a few weeks.

    I think I have to look toward Oracle for this architecture (I have to do some preliminary research, as I have never worked on Oracle).

     

    Monday, June 19, 2006 11:16 AM
  • I doubt that you could use a SQL Server instance that has more than about 2,000 databases. At that point the server would probably not be responsive. You may be able to have close to 30,000 databases if they were all auto-closed and not being used.

    It looks like you are hitting the upper limit of worker threads at around 60 databases. There is no workaround for mirroring. You could try clustering or log shipping for your scenario.

    Thanks,

    Mark

    Monday, June 19, 2006 3:19 PM
  • I had a terrible experience with transactional replication on SQL Server 2000 when it ended up in deadlocks. I was anxious to see what SQL Server 2005 brings for us. Anyway, I am thankful to both Mark and Michael for helping me with this problem.

    with regards,

    Shariq

    Tuesday, June 20, 2006 5:18 AM
  • I'd be interested in finding out what you were hitting with transactional replication. I have had a single instance up and running with over 200 subscriptions on it, although I've never tried the other side, which would be 200+ publications on a single server. You aren't going to get all of these running concurrently. Although 2005 will run a much larger number concurrently, with much better throughput than 2000, you would still have to stagger the schedules in order to handle the bandwidth. I still wouldn't expect a single instance to be able to handle 500+ publications.

    The issue is not the number of databases that you can put on a SQL Server; it is the number of primaries and secondaries on a single server that you are trying to architect. SQL Server isn't going to have any problem with the databases themselves, although you could start hitting rather unique resource starvation issues when you try to process queries against hundreds or thousands of databases on a single server. The issue is that any feature maintaining a redundant copy of the data has to allocate resources on a continuing basis, over and above what a database server normally needs for direct processing in the database. Adding "just one more" database to such an architecture can add two or three more components, and that is what causes things to tip over. In the case of Database Mirroring, each session you put in place increases the threads you are using by 2, and each subsequent session adds up very quickly.

    So look at something like your 500-database scenario. The 500 databases are all going to go on one server. One connection to each database processing a query is going to consume 500+ worker threads (500 for the queries themselves, plus some for system processing). Now add in Database Mirroring (if you could get 500 sessions running on a single instance) and you would be looking at an additional 1,000 worker threads needing to be allocated on top of the 500 or so that are already there. Each worker thread needs memory. You can keep going down the list of things that are needed.

    Now consider that you normally have 255 worker threads created by default (more if you increase the value), and you quickly consume all of the available worker threads within SQL Server. Once all of the worker threads are consumed, things start queuing to wait for a worker thread to be allocated, which creates issues for Database Mirroring, replication, log shipping, etc.

    The reason that log shipping and replication have a better opportunity to reach a larger quantity per server is that they don't require threads to continuously check state. Those two technologies only care whether something is available when they are kicked off, whereas Database Mirroring proactively monitors. So, while you have the potential to get 500 replication or log shipping pairs configured here, it is also unlikely that all 500 of them would be able to run concurrently.

    In all of these scenarios, the first resource bottleneck you hit is worker threads. If worker threads weren't an issue, you would probably see the bottleneck move to the network stack. If that weren't an issue, it would move to the processor; if not the processor, then memory; if not memory, then disk I/O. There is going to be degradation one way or another as you add databases (a processor can only execute a single query at a time, and you can't put 500 of them into a single machine), and throwing a technology on top of everything to make the data redundant simply accelerates when you hit a resource wall. It wouldn't matter whether you were doing this in Microsoft SQL Server, Sybase, Oracle, DB2, or any other database server; you would still hit a basic wall of "too much stuff" going on in a single database instance.
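
    To put numbers on this for a live instance, SQL Server 2005's
    sys.database_mirroring catalog view lists every configured session; roughly
    two worker threads per session per partner is the arithmetic used above:

        -- List configured mirroring sessions and their current state
        SELECT DB_NAME(database_id) AS database_name,
               mirroring_role_desc,
               mirroring_state_desc
        FROM sys.database_mirroring
        WHERE mirroring_guid IS NOT NULL;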

    Tuesday, June 20, 2006 7:56 AM
  • My production server is running SQL Server 2000. I tried to enable transactional replication but got stuck at the 80th database, as SQL Server started giving deadlock errors. Further research brought me to this Knowledge Base article:

    http://support.microsoft.com/?kbid=246330

    Even if I run one agent (let's say the snapshot agent) per minute per database, it will only cover 1,440 databases in 24 hours, and it won't be a hot backup server.

    Currently, a job runs in the middle of the night that takes a backup of all the databases, a second job copies the backups to the backup server, and a third job restores them on the backup server (just like log shipping, with the difference that the backup server can become the production server with little effort). But I would surely lose a valuable amount of data in case of a disaster. So I keep a few critical databases under transactional replication, and the rest are backed up under the above procedure.

    All my efforts are in the direction of having a hot backup server on SQL Server for a large number of databases.
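
    For a single database, the nightly routine described above amounts to
    something like this (a sketch; the database name and paths are illustrative):

        -- Job 1, on production: full backup straight to the backup server's share
        BACKUP DATABASE CompanyDB
            TO DISK = N'\\backupserver\dumps\CompanyDB.bak'
            WITH INIT;

        -- Job 3, on the backup server: restore so it can take over quickly
        -- (restoring WITH NORECOVERY instead would allow further log restores)
        RESTORE DATABASE CompanyDB
            FROM DISK = N'D:\dumps\CompanyDB.bak'
            WITH REPLACE, RECOVERY;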

    Tuesday, June 20, 2006 12:40 PM
  • The reason that article outlines staggering the schedules is the same reason we have been discussing with Database Mirroring. You're getting deadlocks because of resource contention. You're either processor bound or memory bound, most likely both. That means everything takes longer to execute, because everything has to wait for processor and memory to be released by something else before it can run.

    When you go to work and come home from work, you drive through rush hour.  It takes a lot more time to get to where you are going, because the roads don't have the capacity to handle what is being thrown at them.  The same traffic jam you see on the road is what you get in your database server.

    If you have 8 processors in the machine, you can execute exactly 8 things at any given time, no more. If you have 60 Database Mirroring sessions, 80 replication sessions, etc., that means at any given time you would have 52 Database Mirroring sessions or 72 replication sessions sitting in the wait queue, waiting for a processor to become available. That means locks are held longer on resources, and very simply, the longer locks are held, the more likely a deadlock is to occur. Just as the longer you have vehicles stacked wall to wall on a 4-lane highway, the worse the traffic jam gets.

    When I scale replication architectures, you have to stagger everything, but it isn't a one-at-a-time stagger. Depending on the amount of work, I'd kick off between 8 and 30 replication agents at a time and, when they were done, kick off the next batch of agents. You introduce latency, but the data moves through the system, and the latency is lower than if you were using backups.
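
    A sketch of that batching, assuming the replication agents run as SQL Agent
    jobs with a common name prefix (the 'Repl-%' filter and the batch size of 10
    are illustrative):

        DECLARE @job sysname, @started int;
        SET @started = 0;
        DECLARE jobs CURSOR LOCAL FAST_FORWARD FOR
            SELECT name FROM msdb.dbo.sysjobs
            WHERE name LIKE N'Repl-%' ORDER BY name;
        OPEN jobs;
        FETCH NEXT FROM jobs INTO @job;
        WHILE @@FETCH_STATUS = 0 AND @started < 10
        BEGIN
            EXEC msdb.dbo.sp_start_job @job_name = @job; -- starts asynchronously
            SET @started = @started + 1;
            FETCH NEXT FROM jobs INTO @job;
        END
        CLOSE jobs;
        DEALLOCATE jobs;
        -- A real implementation would wait for this batch to finish (for
        -- example, by polling msdb.dbo.sysjobactivity) before starting the next.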

    There is exactly one way to alleviate this problem, and that is to start plugging in more servers and spreading the databases out. This is NOT a matter of the number of databases you can put on a server. It is a matter of the amount of processor, memory, disk, and network resources that you have on a machine. If the amount of hardware is increased, you will be able to do more, but you will still hit a wall, just at a higher number. 500 databases on one machine, made fully redundant to a second machine, would probably require somewhere in the realm of 64 processors with 32 - 64 GB of RAM and at least 6 - 8 Gbps of network capacity on the back. Then you'd need to increase the worker threads to somewhere around 1,024. That's a guess at the level of hardware, based simply on the base level of activity that any of the data-redundancy solutions will need: when you are trying to process activity for 500+ databases, being able to execute up to 64 things at any given time at least gets you a bit closer to keeping everything running.

    If every database doesn't have to be made fully redundant in near real time, then you can get away with a lot fewer resources. The performance will suffer, though.

    Even without making these fully redundant, stacking this much stuff on a single machine is going to cause performance to suffer for your customers, because you can only execute as many concurrent queries as you have processors on the machine.  Everything else is going to sit and wait for someone else to finish.

    Wednesday, June 21, 2006 1:17 PM
  • OK, can you please tell me what kind of danger we will face if somehow we manage to merge all the databases into a single database (or a couple of databases)? One danger I can think of is that we can run out of hard disk space very easily. The second is memory again: more memory would be required. The third is the processor again, as a single processor cannot handle the number of queries from thousands of customers. And the fourth would be the worker threads. So we would be back at the starting point: now we can't have a large number of databases, and we can't have a large database either.

    The only solution remaining is to put up a number of machines to handle each database.

    Thursday, June 22, 2006 12:22 PM
  • I wouldn't be qualified to recommend combining them into a single database. Does your business even allow data from multiple companies to be co-mingled in a single database? Do your customers allow it?

    Putting them all in a single database does not necessarily mean you can run out of disk space more easily. You would have the same aggregate space consumed in a single database as you would in multiple databases.

    Your memory requirements would actually drop significantly. SQL Server allocates memory to each open database, so going from 500 databases to 1 means the memory consumed by 499 open databases no longer needs to be allocated. Additionally, if these databases all have the same schema, you would also save significant space in the plan cache. If you have a proc named myproc in Database1 and the same proc in Database2, you will have space consumed in the cache for the query plan and execution plan of Database1.schema.myproc AND Database2.schema.myproc, because they are different objects. If everything is in a single database, there would be only one set of plans for myproc in the cache, reclaiming a significant amount of memory.
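
    The duplication is visible in the plan cache itself; a sketch against the
    SQL Server 2005 DMVs that totals cached-plan memory per database:

        SELECT t.dbid,
               DB_NAME(t.dbid) AS database_name,
               COUNT(*) AS cached_plans,
               SUM(CAST(p.size_in_bytes AS bigint)) / 1024 AS cache_kb
        FROM sys.dm_exec_cached_plans AS p
        CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
        GROUP BY t.dbid
        ORDER BY cache_kb DESC;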

    If a single processor can't handle the queries from thousands of customers, then it does not matter whether you have 1 database containing everything or the data split into 500 databases; the processor still isn't going to be able to handle the requests. So the whole idea of putting 500 databases on a single machine is a moot point, because the machine couldn't handle the load anyway.

    No, worker threads probably are not going to be an issue with a single database, because SQL Server would swap those worker threads around for executing queries instead of chewing them up for hundreds of Database Mirroring connections. There are systems out there with tens of thousands of concurrent users running just fine on 255 worker threads. BUT each Database Mirroring session requires a worker thread of its own, which is MUCH different from what occurs when threads are used to service query requests.

    The more posts that come out here, the less this is making sense to me. This started with trying to get Database Mirroring to run 500 sessions in high-availability mode on a single machine for the principal and a single machine for the mirror, with presumably a single machine for the witness. However, you're describing an environment ("a single processor can not handle the number of queries from thousands of customers") that tells me this configuration wouldn't have worked even without Database Mirroring, and you would need to do exactly what we've suggested anyway: split the databases onto multiple machines.

    Sunday, June 25, 2006 3:21 PM
  • Are you using Epicor accounting software?

    Try XOSoft's WANSync, which uses block synchronization. It worked well for 200+ databases.

    http://www.xosoft.com/products/f_WANSync.shtml

    regards

     

    Thursday, October 19, 2006 12:18 AM
  • Are the MS tests for database mirroring posted anywhere? For which configurations does MS recommend mirroring only 10 databases? Does MS recommend a limit of 20 databases for 64-bit servers?

    Inquiring minds want to know.

    Wednesday, October 25, 2006 8:42 PM
  • I once thought that SQL Server supported 30,000+ databases, simply because of the data type in master.dbo.sysdatabases.

    I found out the hard way that IT DOES NOT. In fact, I ran a test to find out, mounting multiple instances of a SMALL database. I stopped around 2,000, which was way over an esoteric article I found on TechNet where MS recommends no more than 1,100 databases without special SQL startup switches, etc.

    The short story: you are trying to use mirroring at a level it was not designed for. No other DBMS will do any better.

    If you need to have a separate database for every customer (a valid requirement) and you need rapid failover, then your easiest option is clustering. Replication will work, but it changes the schema and is harder to understand.

    The downside of clustering is the shared disk drives, so you will need a way to establish redundancy for those as well.

    If you don't want to go the clustering or replication route, your other option would be some form of third-party product, such as Never Fail.

    Cheers,

    Ben

    Friday, November 3, 2006 6:46 PM
  • That is not accurate. Replication doesn't change the schema. Snapshot replication doesn't require any schema changes at all. Transactional replication requires a primary key before a table can be replicated, but it does not change the table structure. Other replication options have additional requirements, such as adding a column to a table in order to work, but the DBA has to make those decisions. Replication is no more difficult to understand than log shipping or Database Mirroring: all three take a copy of the data from a primary and send it to a secondary. The precise mechanism differs slightly among the three; it isn't any more complicated.

    Sunday, November 5, 2006 6:08 PM
  • What we are doing now is merging all the databases into a single database. Separate databases for each customer gave us the edge of a physical partition between customers' data; we achieve the same isolation using inline table-valued functions that take a customer ID as input and give access only to that particular customer's data (see the sketch below). In fact it is more flexible than a pure single-database approach (in case we hit another limitation): we can set the number of customers per database, i.e., one database per 'N' customer(s), where 'N' ranges from 1 to ALL.
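
    A sketch of that inline table-valued function approach (the table, column,
    and function names are illustrative):

        -- The function is the only supported path to a customer's rows,
        -- giving a logical partition inside the shared database
        CREATE FUNCTION dbo.CustomerOrders (@CustomerId int)
        RETURNS TABLE
        AS
        RETURN
        (
            SELECT OrderId, OrderDate, Amount
            FROM dbo.Orders
            WHERE CustomerId = @CustomerId
        );
        GO
        -- Usage: callers see only customer 42's data
        SELECT OrderId, Amount FROM dbo.CustomerOrders(42);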

    In this way we will hardly hit any of the walls discussed in the previous posts, and we will achieve our goal: a hot backup server for our database server.

     

    Monday, November 13, 2006 5:28 AM