To account for database schema changes

  • Question

  • I need to deal with frequent database schema changes.

    I tried the "open schema" approach [1], which raised many questions and doubts
    (it seems that this approach brings more trouble than gain, with all types of data, such as phones, dates, and names, in one column).
    Can anybody share experience, articles, or projects on it?

    What are the possible alternatives to this approach?

    [1, p. 331]
    Itzik Ben-Gan and Lubor Kollar,
    Inside Microsoft SQL Server 2005: T-SQL Querying
    http://www.sql.co.il/books/insidetsql2005/Inside%20Microsoft%20SQL%20Server%202005%20T-SQL%20Querying%20(0-7356-2313-9)%20-%20Chapter%2006%20-%20Aggregating%20and%20Pivoting%20Data.pdf
    "Open schema is a schema design you create to deal with frequent schema changes."
    "In a scenario with frequent schema changes, you can store all data in a single table, where each attribute value resides in its own row along with the entity or object ID and the attribute name or ID. You represent the attribute values using the datatype SQL_VARIANT to accommodate multiple attribute types in a single column."

    Thursday, June 14, 2007 7:38 AM


All replies

  • It depends on how frequent you expect the changes to be. Is it just that there is a lot of churn right now, but the schema will stabilize later? Does the schema vary by row? Or is it that the schema will change every quarter?

    Or is it truly that there is no long term structure to the data?

     

    I would avoid open schema or similar approaches unless there were no other option.

     

    A similar approach would be to store the data in an XML column with SQL Server 2005.
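
    A minimal sketch of that layout (object names are my own, for illustration):

        -- Stable core as fixed columns; the "open" part in one XML column.
        CREATE TABLE dbo.Objects (
            ObjectId INT NOT NULL CONSTRAINT PK_Objects PRIMARY KEY,
            Data     XML NULL   -- untyped; an XML schema collection could type it
        );

        INSERT INTO dbo.Objects (ObjectId, Data)
        VALUES (1, N'<Person><Name>John Doe</Name><Phone>555-0100</Phone></Person>');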

     

    Thursday, June 14, 2007 6:00 PM
  • We have just 1.5 (one and a half) developers for a lot of new projects, plus some old databases migrated from MS Access that currently run under MS Access code/UIs and are to be migrated to .NET WinForms.

    This is an attempt to unify our approach under some "standardized" or generalized WinForms.

    And you know customers: they never know what they want now or next...

    And where can I look for references on the XML column approach?

    I have read this phrase in the same [1, p. 333]:
    "There are other approaches to deal with frequent data definition changes—for example, storing the data in XML format."
    but I got the feeling that it is an even worse way?

    Friday, June 15, 2007 11:55 AM
  • For most projects you are far better off defining a schema and writing your app accordingly. Even with few developers this will make much more sense.

    Something written with an Open Schema approach will be much more difficult to maintain in the long run.

    Saturday, June 16, 2007 12:45 AM
  • Ron,

    thanks.

    This is also my opinion, but it does not save me the time of elaborating arguments against the open schema approach and of finding another one.

    I hoped that this question would prompt a more animated discussion and/or feedback.

    I shall try another, more specific forum.

    Sunday, June 17, 2007 5:33 AM
  • In some regards open schema is a very good pattern especially with applications that are very data-centric and have frequently changing needs.

    Done correctly, you will have a declarative object model (using reflection to pivot the attribute-value rows into actual properties on your objects). And with a declarative UI model (such as WPF) and a declarative workflow model (such as Windows Workflow), all your client application will have to do is provide a host that can load its UI from the database.

    Tuesday, June 19, 2007 3:54 PM
  •  Ron Jacobs wrote:

    Or is it truly that there is no long term structure to the data?

     

    Yes, it seems to be the nature of any agile business (it should try this and that, exploit opportunities, be flexible and fast). We already have "structures" and need to make them dynamic.

    And the changes are not necessarily unknown beforehand. Our situation is that:

    - we should leave room for expected changes;

    - already existing structures show exponential growth in the factors and types to be accounted for, which are difficult to generalize.

    And my employer is the consumer of the developed products, so it is difficult to impose change management on him, i.e. to freeze requirements, to impose structures on him, or to search for clients who would like to be harnessed by them.

     

     Ron Jacobs wrote:

    A similar approach would be to store the data in an XML column with SQL Server 2005.

     

     

    XML is not an option because it seems to be 2-3 orders of magnitude slower than the classic (conventional, relational) model.

    http://milambda.blogspot.com/2007/01/sql-server-2005-xml-methods-part-two.html

    Actually, I first tried it myself and then looked to see whether anybody uses it, and how, given such a performance bottleneck.

    Then, last but not least, there is the absence of practical support: examples, samples, articles, etc. This also perplexes me.

    _Mike_Brown's suggestions are very close to my conclusions. I have experience with WWF but not yet with WPF.

    I unmarked "Answered" hoping for more discussion in this thread.

    Wednesday, June 20, 2007 2:17 AM
  • XML is an option but its limitations must be considered.

     

    The fact that XML is (by definition) case-sensitive is highly important if you're considering searching through XML data. As long as not all XPath/XQuery functions have been implemented in the SQL Server database engine, there is no sensible way of circumventing this characteristic. One option would be to store a lower-case (or upper-case) copy of the XML in the database for these purposes, but that will inevitably increase storage size dramatically. IMHO it would be much more efficient to focus on representing a core of data relationally (properly normalised) and use XML for the "open" part of the data model.
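
    For example, with the illustrative dbo.Objects table sketched earlier in the thread, the two predicates below are not equivalent, because XQuery string comparison is case-sensitive:

        SELECT ObjectId FROM dbo.Objects
        WHERE Data.exist('/Person[Name = "John Doe"]') = 1;  -- matches "John Doe"

        SELECT ObjectId FROM dbo.Objects
        WHERE Data.exist('/Person[Name = "john doe"]') = 1;  -- does NOT match it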

     

    Another option would be to narrow down the search through XML data by utilizing full-text indexing. Of course, FTS comes with its own performance overhead (well, at least it's pretty stable).
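
    A sketch of that combination, reusing the illustrative dbo.Objects table (the catalog name and the assumption of a primary key index named PK_Objects are mine):

        -- SQL Server 2005 can full-text index the element content of an XML column.
        CREATE FULLTEXT CATALOG ftCatalog;
        CREATE FULLTEXT INDEX ON dbo.Objects (Data)
            KEY INDEX PK_Objects ON ftCatalog;

        -- Narrow the candidate rows cheaply with CONTAINS, then verify
        -- with the exact (case-sensitive) XQuery predicate:
        SELECT ObjectId FROM dbo.Objects
        WHERE CONTAINS(Data, N'"John Doe"')
          AND Data.exist('/Person[Name = "John Doe"]') = 1;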

     

    The exist XML method is very efficient (compared to other methods), but it suffers from case sensitivity. I'm just finishing a new example that I intend to post on my blog (http://milambda.blogspot.com) in a couple of days.

     

    All in all, XML is a viable option, but the fact that searching through it is very resource-intensive may just as well be the decisive point against its use. The question you need to answer is whether you can live with an open schema that (currently) performs badly. But does it perform worse than relational EAV?

     

    Test, test, test... The only path to the truth.

     

     

    ML

    Wednesday, June 20, 2007 7:59 AM
  • You can always use a combination of these practices.

    For example, one of my systems was designed to have its data types stored in regular tables within the database, in most cases with a one-to-one mapping from fields to columns. However, every table was designed to have a child table named "Properties". This is one case of the "open schema" approach; a sketch follows below.

    Data from this "Properties" table was accessible to the data type only privately (it was filled automatically). The data was exposed to the rest of the application as public typed properties.

    We released minor versions without changes to the database; we only added such "properties". For major versions, these properties were moved to regular columns, and the data type was reworked to use fields instead of requesting data from the private "Properties" collection.
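
    A minimal sketch of this hybrid layout (the original table names were not given, so these are assumptions):

        -- Regular table for the stable, typed core:
        CREATE TABLE dbo.Customers (
            CustomerId INT           NOT NULL
                CONSTRAINT PK_Customers PRIMARY KEY,
            Name       NVARCHAR(100) NOT NULL
        );

        -- Child "Properties" table for attributes added between major versions:
        CREATE TABLE dbo.CustomerProperties (
            CustomerId INT           NOT NULL
                REFERENCES dbo.Customers (CustomerId),
            Name       NVARCHAR(100) NOT NULL,
            Value      SQL_VARIANT   NULL,
            CONSTRAINT PK_CustomerProperties PRIMARY KEY (CustomerId, Name)
        );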

    Of course, this solution has its own cons.

    One of the biggest ones arises if your database is "public", for example if reports read data from the database directly, or if other applications have direct access to it. A solution to this problem can be a specially built view.

    Another one is upgrade scripts: because of the untyped nature of the "open schema" approach, situations with wrong data are possible.

    And finally, performance: the "open schema" approach can lead to performance problems, and since this solution is based on that approach, the problem applies to it as well.

    A few notes on the XML solution.

    This solution only seems good at first glance; when you start to implement it, problems appear.

    First of all, it is no small task to support previous versions of previously stored XML. With the regular approach it is much easier to create upgrade scripts; this is not true for XML, where you need to keep a "history" of converters from one version to the next. If you take a "lax" approach, you face a potentially unlimited number of variants of the XML to parse, which is very bug-prone. If you take a "strict" approach, you instead get unnecessary complexity in the application, which will need to track all (even small) changes. A comparison is sketched below.
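
    To illustrate the contrast (all names and the document shape are assumptions for the sketch):

        -- A relational upgrade is typically a single statement:
        ALTER TABLE dbo.Customers ADD Fax NVARCHAR(30) NULL;

        -- XML-stored data needs a per-version converter, e.g. XML DML
        -- that patches every stored document still in the old shape:
        UPDATE dbo.Objects
        SET Data.modify('insert <Fax/> as last into /Person[1]')
        WHERE Data.exist('/Person[not(Fax)]') = 1;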

    The other problems are the same as for the solution proposed at the start, I mean publicity and performance. In some cases these problems are even worse.

    Wednesday, June 20, 2007 10:08 AM
  • Thanks Mike Chaliy,

    these are the conclusions one arrives at after surveying the internet on this issue.

    I found that "open schema" is not a common term.

    It would have saved me a couple of days if anybody had mentioned from the start the more widely used terminology:
    - EAV (Entity-Attribute-Value)
    - EAV/CR
    - key-value pairs (tables) in database design
    - modeling columns as rows
    - OTLT (One True Lookup Table)
    searches on which turn up much more substantial forum discussions, projects, code, weblogs, and articles.

    They also hint that there are more appropriate places, with more animated feedback, for such discussions.

    Matija Lah,
    as far as I could find, your investigation is really unique in this area.

    This is really surprising to me, i.e. the conclusion is that nobody seems to use it.

    I was really disappointed when I failed to find the continuation you mentioned.
    I look forward to it!

    Actually, I immediately tried "XML data by utilizing full-text indexing". I did not have time for an accurate comparison, but the feeling was that full-text indexing just made things worse, i.e. slower.
    Though it was just a feeling: 3 rows of data (vs. your dozens to thousands) and delays of up to 5-20 sec. I did not even get to exact figures; even if I erred by up to 100 times, it is still not acceptable.

    I have a comparable 64,000-row EAV table (with SQL_VARIANT), and it is 2-3 orders of magnitude faster than XML, while XML is 3-4 orders of magnitude slower than the relational model.

    Monday, June 25, 2007 10:52 AM
  • I mentioned I was preparing another article on the subject of XML in SQL Server 2005. This one deals with the exist XML Method.

     

    The address: http://milambda.blogspot.com/2007/07/sql-server-2005-xml-methods-part-four.html

     

    Friday, July 6, 2007 1:33 PM
  • Given that your schema changes often, it doesn't appear that you have to deal with the constraint of other applications working with your DB. That's good. If so, have you considered the option of generating your schema from your business objects? NHibernate supports this quite well, as do many other O/R mappers and DAL generators.

    From an architectural perspective, decide on where the one true definition of your application schema will live, and try to generate everything else from that. Of course, this will only hold if you don't have a lot of custom logic that needs to be written relating to every specific field/column.

     

    Does that help?

    Saturday, July 7, 2007 11:00 AM
  • Matija,
    thanks a lot.
    You do great and unique work in this field!

    Udi Dahan,
    thanks a lot. It is also my belief that a DBMS is for storing data; it is in the service of applications and business needs, and not vice versa!
    Unfortunately, in real life it is frequently the other way around.

    Yes, this helps. As a matter of fact, it was my first reaction to the decisions imposed on me.

    I tried to mark all answers as helpful.
    I am not sure that a question like this has a single answer that would let me mark the topic as answered.
    Monday, July 16, 2007 3:48 PM
  • It seems to me that "open schema" is a misnomer for "no schema".  By using open schema, you lose key advantages that a DBMS offers, such as relational integrity, strongly typed data, and efficient indexing.  If your data model involves more than a very few entities (i.e., if your data model is non-trivial) then open schema is a bad choice.

    There are application design choices that make your app more resilient to schema change. For example (and excuse me if I'm stating the obvious), your app should never include "select *"; SQL should always specify the columns it needs. A data access layer that maps application data fields to database columns can protect your app against changes to column or table names. Views can also be helpful in this regard, as sketched below.
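
    For instance, a view can keep presenting an old column name to the application after a rename (illustrative names):

        -- Suppose Customers.Phone was renamed to PhoneNumber; the view
        -- preserves the name the application already selects:
        CREATE VIEW dbo.vCustomers
        AS
        SELECT CustomerId,
               Name,
               PhoneNumber AS Phone
        FROM dbo.Customers;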

    Leo
    Tuesday, July 17, 2007 4:13 AM
  • "It is also my believe that DBMS is for storing data, it is in service of applications and biz needs and not vice versa!"

     

    It is true that a DBMS is for storing data, but that does not equate to diminishing the importance of proper data modeling. You presumably are not storing your data in flat files because you want the performance advantages of a relational database management system. Well, part of the cost of entering into that technology is a well thought out relational schema that balances data isolation, maintainability, understanding and performance considerations. It is my belief that between business needs and technical implementation lies a data model which serves the purposes of the business for that moment in time. Additionally, it is my experience that businesses that cannot be bound to specifying their data model tend to spend all their time talking about the virtues of their ideas and no time talking about how to make business decisions based on the well defined data in which they have collected.

     

    I have had (forced upon me) experience of the property/value table that ballooned to over 80 million records and it was the bane of our existence for performance and coding solutions which the business thought should be easy. It is regarded silly to not be able to dynamically add attributes, have stellar performance and drag and drop reports based on all data from everywhere all while supporting tens of millions of records. This is where people skills come in handy! Smile

    Tuesday, July 17, 2007 4:28 PM
  • I am a big fan of code generation and O/R mapping; I think it would be ideal in your situation. While it will not eliminate the need to recode, it will reduce it greatly. I would use a template-based generator such as CodeSmith; this will give you a lot of flexibility and save you a lot of time.
    Tuesday, July 17, 2007 5:32 PM
  • Moved down since it was inserted out of order
    Thursday, July 19, 2007 3:48 AM
  • Something is weird with this thread.

    My answers appear before other posts made two days ago.

    Then I intermittently see one or two pages for the thread.

    Does anybody else have such hallucinations?

    Anyway, look for my post from today, which appeared somewhere among the posts from the day before yesterday!

    Thursday, July 19, 2007 4:54 AM
  • I am contemplating writing an article comparing the lines of manual code needed
    1) to make a simple application, and
    2) then to introduce database schema changes,
    with an EAV/CR versus a relational database.

    I would fill the two databases under comparison with data from some sample SQL Server database, and compare performance.

    In order to save time, might I base such work on code from an already existing demo/article?

    I am still thinking about whether to stick with WebForms or WinForms.

    It should be simple enough, but not simpler, in order to be easily read. Well, I also do not have much time.
    Is it interesting? I am in doubt, because EAV/CR seems to be of marginal interest, to a very small part of developers.

    Any ideas or wishes are welcome!
    Friday, July 20, 2007 10:25 AM