I've been reading a lot about federations recently and they sound like a fantastic concept, but I can't see a huge benefit of using the built-in version in SQL Azure vs rolling your own in my particular case.
I run a moderately sized multi-tenent mission-critical system on a combination of SQL Azure and Azure tables. The biggest hundles I've come across as the business has expanded are capacity (largely dealt with by the new maximum size increases) and throttling.
The database comprises of just over a hundred tables and tens of millions of rows and prior to splitting was around 40GB in size.
From what I understand, here are the benefits of federating my database:
-Scale out: my data is spread across multiple "servers" and therefore should be faster to query like partitioning in SQL Server and I should get throttled less as I'm on mutiple VMs
There are a few missing missing features in the inital release:
- Ability to use the database during a split - I learnt this watching this video:
http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2295 around 25m 05s
- Identity columns are not allowed
- Reference data must be manually updated on each federation member
- Database names are guids and therefore difficult to use directly - I'll explain why this is useful later
- You have to programmatically manage the "USE" statement to work out which federation member to query
- Tools like SSDT, import/export do not support federations
I've used my own version of federations which essentially involves manually splitting the database and even putting the database shards in different datacentres. ie All our US clients are in one datacentre and our EU clients are in a European datacentre.
This seems to have the following benefits over using built-in federations:
- I can use identity columns
- Database names can be meaningful - I've used the example of US vs EU, but I could name the database according to the name of the "pool" of clients I'm splitting out
- I can use great tools like import/export, SSDT, etc that don't support federations
I understand that federations are designed for a number of use cases that may not be similar to mine, but I see a huge benefit to using the concept of federations. It just feels like there aren't enough benefits over doing it manually.I can just create a
new database using the cookie-cutter schema and roll it out and keep it syncronised using SSDT. I have to say that I would never have tried doing this had SSDT not existed, it's awesome for keeping things synced and I can't imagine life without it!
Am I missing something? Are the imminent updates that are due that will change things? Has anyone with a similar setup gone down the route of federations and found them to be very suitable?
You are making good arguments for rolling your own shard, and these points are even stronger due to the fact that Federations is a limited feature at this time. However, Federations is still a good place to be for many reasons. Here are a few:
- When you split a database, the newly create federation member comes the database schema from the source database, including the security configuration. This means you don't have to worry about running DDL statements when you split.
- Although Bob Beauchemin is having problems connecting during the SPLIT operation, according to the video you are referencing, it is still an advertised feature and should be working at some point. Being able to SPLIT online is one of the major advantages
- A feature that did not make it yet, but will at some point, is the MERGE operation, allowing you to scale your system down. This is also another important aspect of scalability.
- Federation is not an all or nothing approach. You can choose to federate just the few tables that are relevant. For example, you can federate the CustomerSales table, along with its referenced tables. However all the other tables can remain in the root
database. This means that if you can federate just a few tables, you have a system that should be easier to manage overall since most tables are in the same database.
- Federations allows you to have a granular sharding approach. So if you decide that you only need 2 federation members to carry 1,000 customers, you only manage those 2 databases, plus the root. Compare this to a linear sharding model in which each customer
has its own database (or database/schema)... you would have to manage 1,000 databases/schemas.
Your point about database names being meaningless is probably irrelevant, at least from a programmatic standpoint, since you do not want to connect to the database directly (see connection pooling above). Federations can have meaningful names,
and that's all that should matter. Although your point is still valid to a certain degree because most management tools are unaware of Federations, so viewing database names as GUIDs is hard if you are looking at a list.
Herve Roggero, MVP SQL Azure Co-Author: Pro SQL Azure http://www.herveroggero.com
Thanks for the detailed response, your answers have really helped clarify things in my mind.
The connection pooling blog post from Cihan is a pretty strong argument for using built-in federations if you have lots of shards, hopefully some of the other issues will be solved by the time we get to the number of shards he mentions!
From my perspective, here are the deal-breakers:
- Online split: If you currently can't split online this means your database goes down. My database is mission-critical, so this is not an option. The way I got round this issue during manual sharding was to bcp out the data while all apps that updated data
in that shard were offline. Luckily for me the mission-critical apps just read data, so this was okay. Once the data had been imported, I just switched the connectionstring of the read app, started the update apps with a new dynamic connectionstring and deleted
the data from the old shard slowly in the background.
- Support for copy database and import/export API: Every day we make a copy of the live database using "AS COPY OF" and then create a bacpac in blob storage of that copy using the Export REST API. I believe this is best practice for creating a transactionally
consistent backup that doesn't create problems with throttling on your primary database during the bacpac creation as the operation is happening on the copy. Can I do this with federations? Does database copy and the Export REST API support federations? If
not, could I "trick" these services into working by performing them on the federation members?
- Support for SSDT: Despite this being a new tool, I can't live my life without it. I use it to sync Schema and SPROCs between my manual federations. I understand it doesn't support federations either, but could I do the same trick as above? ie attach to
federation members separately?
Here are the important-to-haves:
- Renamable federation memebers: Now I understand that there are use cases where a ticketing company needs to spin up a 500 SQL Azure databases and maybe manual management of a federation that large is not practical, but surely there will be many companies
like mine where the shards are more "stable". I envisage adding shards when we add new clients and moderately rarely splitting existing shards. In this case it would be handy to be able to name the shard to make it easier to work with in tools like SSDT that
don't support federations?
- Geo-dispersed shards: We have clients both sides of the pond. Our shards reside on the closest datacentre to the client.
You mention that some tables can remain in the root database. Is this database special in any way? ie Can you do joins from a table in the root to a federation member table?
Also, when I do decide to start using federations is there an easy way to "convert" my shards into a federation?
Thanks very much for the answers so far, any more to the questions above would be much appreciated!
- Online Split: If I were you I would try it just in case. Perhaps Bob was having specific problems that do not apply to you.
- AS OF COPY: Yes - this is a problem if you want the ability to get backups that are transactionally consistent. However, you mentioned that the critical app is read-only, so you could in theory get something equivalent if whatever populates the
data does not run when the backup runs, and during the "low tide" of users to minimize throttling. Not a perfect solution; definitively a workaround; not a perfect solution. The current backup/export tools of Microsoft do not yet support federations; you could
use a third party tool to get you backups in blobs. But backing up is just half the story... restoring whatever you backup into a federation can be very challenging.
- SSDT: Nope... you can't use it with Federations as far as I know. But you wouldn't need it as much because you would have far fewer databases, and far fewer objects to maintain in sync. How many partitions would you need, and how many tables would be federated
is the question.
The root database isn't special in any way, other than it holds the information of all the federations and its members. It also holds any table you do not want to place in federation members. So that database supports unique identity and everything
else SQL Azure supports.
Converting is an interesting problem. It depends on how you are sharding today and how your application is designed. I can't really tell you much, other than it depends... :)
Considering your needs today and some of the limitations of federations, I would say that you may want to investigate and try federations to understand the full impact, and determine if the limitations are really showstoppers for you.
Here are a couple of blogs that may be of interest: