none
Designing Databases for SOA RRS feed

  • Question

  • My question relates to best practices in database design for SOA. Specifically I am looking for a guideline that will help me maximize performance and guarantee uniqueness in and SOA environment.

    Is it generally advisable to use surrogate keys in your tables if you are designing the data store for an entity service? If so, what datatype should you use, integer, GUID?

    I would expect that if I use surrogate keys, that they will then be used by consumers as reference data in other services.

    All ideas are welcome.

    Thanks, Jeremy

    Thursday, December 29, 2005 3:57 PM

Answers

All replies

  • Hi Jeremy, not exactly the answer to your question, but a good article surfacing important aspects of data handling in a SOA context can be found here. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/dngrfSOAChallenges-EntityAggregation.asp 

    hope it (partially) helps

    --gp

    http://blogs.msdn.com/gianpaolo

    Friday, December 30, 2005 8:37 AM
  • I usually suggest using surrogate keys (unless it is a junction table supporting many-to-many relationships).  You need the key for technical purposes, not business ones.  All other fields in tables are normally needed for business purposes. Therefore, creating a surrogate key will not cause a conflict with the business nature of other fields. (For example, if you later decide to change a field for a business reason, you will never touch surrogate keys).

    Using int identity field is more convenient than using GUIDs. However,  there is a case (at least one) when using GUIDs is necessary. This is the case of merge replication.  You have to provide uniqueness of keys in two tables; and if you use integers, you will end up creating ranges for keys in different tables. That's not just a pain in the neck, but may not work if the number of records is getting bigger.

    --AK

    Thursday, January 5, 2006 1:13 AM
  • Surrogate keys are used for many different purposes and IMO, it is advisable to keep a surrogate key (usually DB_INT type should suffice). 

    Surrogate keys will help you in number of different scenarios that includes

    a) ability to store history of changes. In fact, data warehouse 101 teaches us that adding surrogate key to fact and dimension table is one way by which you can keep change history. If your operational database is designed for maintaining historical information (using surrogate keys), this would make datawarehouse design to be much easier.

    b) Surrogate keys can double up as alternate key. Alternate keys are helpful in some scenarios.

    Guid vs Int

    IMO, one of the desired property of the key is that it should be partitionable. Guid is hard to partition, as it is hard to write partition function that can uniformly distribute Guids across different buckets. On the other hand, simple hashing scheme would work for integer.

    BTW, i am not sure how surrogate keys are related to SOA . :)

    Thursday, January 5, 2006 9:58 AM
  • Let me paint in a bit more detail.

    Say I am creating a customer service that several other services will need to work with. There is a customer entity that will roughly translate to a table in the database.

    When these other services, say ShoppingCart, needs to link a purchase to a particular customer, that service MUST ensure that it is referencing the correct customer. As we all know there can be multiple people with the same FirstName and LastName, and even duplicates in the same company.

    The simple thing to do is to issue the customer a customerID that is guaranteed unique and use it as 'reference data' in the Shoping Cart service.

    The customerID becomes the surrogate key and (in this scenario) is used by the ShoppingCart to reference to the correct customer in the CustomerSevice. In this way I am using the surrogate key in a database as the 'key' to my customer data in the service.

    What do you think?

    Thursday, January 5, 2006 3:29 PM
  • If you are strict in your SOA implementation the Shopping Cart database (owned by the shopping cart service) would be separate from the Customer database (owned by the customer service)

    This means that the unique id for the customer is some ID supplies by the contract provided by the Customer Service -which may or may not be identical to the ID used in the Customer DB. For example if the Customer DB is versioned, its primary key may be more volatile and different from the customer ID.

    Thus, the customer ID stored in the Shopping Cart database would not be a surrogate key in the strict database sense

     

    Arnon

    Thursday, January 5, 2006 4:06 PM
  • I agree with pretty much all of the other posters about the surrogate keys - the approach you pick has little to do with SOA. 

    We have done this a few ways depending on the situation.  In general we have kept the "visible" key in the actual PK declaration, in some cases along with a data context identifier and/or version ID.  There was never a compelling enough case to not do this, since most operations need it anyway and it makes RI more straightforward. 

    For a version ID, we've done it all, GUIDS, ints, whatever.  GUIDS make sense technically because callers or the server can create them "blindly" (not having to know any previously assigned values), depending on your needs.  But our consultants and some customers find them hard to read when diagnosing issues or building integrations. 

    One approach to a versioning ID I like is to encode a timestamp into an 18-digit, 64-bit integer: YYYYMMDDHHMMSSDDDD ("D" digits are fractions of seconds).  It can be stored as a 64-bit int, values can be compared mathematically and they are human readable.  You just have to make sure you pick a single timezone (like Zulu).

    But SOA is more about breaking the application down into smaller and more discrete parts so that business processes can be strung together in new (possibly unforeseen) ways over time.  So I would advise you to spend a lot of time thinking about what impact this has on the data domain.  A good place to start is the classic Pat Helland exercise of breaking the data domain into categories of resource, activity, and reference data. 

    You also need to figure out what sort of data context structure is needed.  For example, we have a database that holds activity transactions (like orders) for multiple companies, each of which can decide whather to use a shared set of resources (like parts and customers) or use their own.  Finally, how activities drive one another and how those processes can be redefined later can affect how you decide to declare foreign key references and joins within queries.

    These are the issue I'm looking forward to discussing on these new forums! 

     

    Erik Johnson
    Epicor Software

    Monday, January 9, 2006 4:52 PM
  • Great thread!  I would tend to agree with Erik Johnson's post here at the end.  As I read through all of the comments I couldn't help but wonder what your database design had to do with SOA.  SOA is truly an enterprise architecture strategy for building reusable loosley coupled "building block" code modules.  Designing services in an SOA should always be done while considering the potential consumers a service will have and in this case I don't think the persistence decisions are all that critical.  I honestly don't care if you store records in a comma delimited text file if that makes the most sense. 

    With that said,  enterprise services probably have to stand up to as much if not more design scrutiny because of their percieved implicit reusable nature.  If you don't design a service that is scalable and maintainable then you're likely to suffer when it comes to broad adoption of that service because of the pain you will be causing when trying to version or aggregate new solutions using that service.

    That's my 2 cents on the SOA side,  I would also agree with one other comment in Erik's post above and that is about the key's being human readable.  The idea of leveraging surrogate keys is a lot of times for ease of use and some believed performance benefit (which developers tend to be too hung up on if you ask my opinion).  I think if you have fairly straight forward PK/FK relationships in your tables then very often you can avoid consuming additional disk by keeping some of those straight forward lookups right in your primary tables.  You also reduce complexity in your applications by eliminating un-necessary join statements. 

    Of course this all tends to be a case by case scenario and I'm not sure I fully understand the database design issue at hand here.  I guess the main point of my post is that database design and delivering to a service oriented architecture are certainly mutually exclusive decisions and if you think you must use surrogate keys just be thinking about some possible alternatives especially for trivial look ups.

    Hope this is helpful in some way.

    Tuesday, January 10, 2006 12:06 PM
  •  Jeremy Maynor wrote:
    Say I am creating a customer service that several other services will need to work with. There is a customer entity that will roughly translate to a table in the database.

    When these other services, say ShoppingCart, needs to link a purchase to a particular customer, that service MUST ensure that it is referencing the correct customer. As we all know there can be multiple people with the same FirstName and LastName, and even duplicates in the same company.

    The simple thing to do is to issue the customer a customerID that is guaranteed unique and use it as 'reference data' in the Shoping Cart service.

    The customerID becomes the surrogate key and (in this scenario) is used by the ShoppingCart to reference to the correct customer in the CustomerSevice. In this way I am using the surrogate key in a database as the 'key' to my customer data in the service.

    What do you think?

    Sorry to deviate from the topic of the thread..

    Hi Jeremy, talking about your example, are you suggesting  a SOA customer purchasing scenario with shopping cart? If yes, you are suggesting that shopping cart is provided as a service?

    I am just curious, how can a shopping cart implemented as a service? more accurately why would a shopping cart be implemented as a service?

    I would appreciate if you could elaborate on this example...

    sorry guys for deviating from the thread..

    Thanks :)

    Monday, January 23, 2006 6:04 AM
  • Hi Jeremy,

    In the starting process of investigating SOA for our company, I am dealing with the same issue. The fact that nobody can give a simple straightforward solution proves that it is a tricky subject of SOA architecture. I don't understand the reactions saying that this issue of SOA design has nothing to do with database design. If you need a link between information in different databases both databases need to contain the linking key.

    One consultant of us is advicing to use combinations of fields and keep on looking for a unique combination. Like for customer, use Firstname, Lastname, Birthdate and place of birth for example (could theoriticaly still be not unique, but the changes are getting smaller). Somehow I don't have a good feeling about this, as each soap service/client needs to provide all this info correctly. Seems like a lot of redundancy.

    The most solid solution to me still seems a surrogate identifier, but you need to appoint one owner service and database of the entity. If it is a product of your company this seems feasible, however the id must be publicly known and preferably not too complex (no guid). In the case of customers, it is very unlikely that your company is the owner of the entity. Using some national identification number of persons or companies is likely to become a mess with international customers. So we would need an international owning service / database of customers. However, I think this is hard to organize.

    What I think happens now is that as a customer you need to register with a company if you want to buy something online, and you get an identifier within that company. To make it possible for two webservices of different companies to communicate about a customer, they would need a conversion table / service, converting the customer id within the first company to the customer id in the second company.

    Still not an easy straightforward solution, but I hope this contributes to possible solutions

    Wednesday, December 27, 2006 10:21 AM