Persistence Ignorance in the Entity Framework

    Question

  • The ADO.NET Entity Framework seems to take us part of the way, but not all the way, to what Fowler calls "Persistence Ignorance". That is, ideally, one would like to be able to establish a domain layer whose entities are in no way aware of the underlying persistence mechanism. This would allow one to replace the data access layer without affecting the domain layer.

    With the ADO.NET Entity Framework I see two main violations of this principle that, as far as I can tell, cannot be circumvented:

    1. Entities must subclass the Entity framework class, and

    2. It is necessary to attach various attributes to entity properties.

    I can live with (1); with the appropriate discipline, one can avoid any functional dependence on the parent class from within the domain layer. If later I decide to use another persistence mechanism than ADO, I can easily remove Entity from the inheritance tree.

    Item (2), on the other hand, is much more troublesome and I can't see why attaching these attributes is strictly necessary. Notwithstanding the Data Contract attributes, all of the information encapsulated by these attached attributes is derived from information contained in the CSDL file - which is available at runtime via the metadata workspace. (By the way, are these data contract attributes required?)

    These considerations lead to the following questions:

    1. Why is it necessary to have redundant metadata (information that is both in the CSDL and attached as attributes)? Why can't the CSDL specification be the single source of this information?

    2. Was the concept of "persistence ignorance" considered when the framework was being designed? If so, why weren't the tenets of this principle followed more closely?

    3. Am I missing something? Is there some construct that would allow complete PI, or at least bring one closer to it? Is there any way I could create a domain class that has no dependency at all on the Entity Framework and yet still have persistence services managed by the framework, provided the CSDL, MSL and SSDL are specified?

    I think what you guys are doing here is really cool, interesting and useful; I just wish your approach allowed complete independence of one's domain layer from the persistence infrastructure.

    Tuesday, March 13, 2007 11:49 AM

Answers

  • Well, this is certainly a big topic, and let me reassure you that it is something we have thought a lot about and are designing for in the long-term.  Given the timeframes involved and the scope of the entity framework, one big challenge has been to put together a long-term design that will get us where we want to go and then figure out how to deliver something of value in our first release that we can build on to get toward the final vision over the course of more than one release.  I think we're on track for that.  This is less than ideal, but it's the reality that we live in.  (Or at least that I live in.  Maybe there are others out there who have a less constrained world--good for them!)

    To address the two specific concerns:

    1) The requirement that entity classes inherit from a single base class:

    This is a restriction we are actively working on removing, and in a future CTP you will be able to create your own data classes which do not inherit from one of our framework classes but instead just implement a set of interfaces that define the minimal functionality required for successful interaction with the framework (basically management of a key, change tracking on properties and instantiation of a "RelationshipManager" class supplied by the framework if you want relationships).  Even this won't be as fully extensible in the first release as I'd like it to be long-term, but it should go a long way toward allowing folks to add entity framework support to already existing object hierarchies, to have more flexibility in the way objects appear (for instance to persist private, protected or internal properties, not just the public properties which the code generator always produces), etc.
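
    To make the "interfaces instead of a base class" idea concrete, here is a minimal, language-neutral sketch in Python.  It is not the Entity Framework API: the names TrackedEntity, entity_key and changed_properties are hypothetical stand-ins for the key-management and change-tracking contract described above, and relationships are left out entirely.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TrackedEntity(Protocol):
    """Hypothetical minimal contract: expose a key and report changed properties."""

    def entity_key(self) -> tuple: ...

    def changed_properties(self) -> set: ...


class Customer:
    """Plain domain class: no framework base class, it only satisfies the contract."""

    def __init__(self, customer_id, name):
        # Bypass our own __setattr__ so the tracking set exists before first use.
        object.__setattr__(self, "_changed", set())
        self.customer_id = customer_id
        self.name = name
        self._changed.clear()  # construction itself is not a "change"

    def __setattr__(self, prop, value):
        # Record every public property that is assigned after construction.
        if not prop.startswith("_"):
            self._changed.add(prop)
        object.__setattr__(self, prop, value)

    def entity_key(self):
        return (self.customer_id,)

    def changed_properties(self):
        return set(self._changed)
```

    The point of the sketch is only that the class hierarchy stays under the author's control; whatever consumes TrackedEntity needs the two methods and nothing else.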

    2) The requirement to attach attributes to properties on your entity objects:

    The issues you bring up here are much more serious because you are concerned that these attributes "tie" you to a specific back-end store model or seem to force an understanding of your back-end persistence format.  Let me assure you that the idea is that they should not have that effect.  The long-term idea is to have two separate opportunities for abstraction.  First, there is the mapping provided by EntityClient, which separates your conceptual model from the store format.  Already it is true that you can write three separate map files that target the same conceptual schema on one end and different backend schemas and/or providers on the other end and have everything continue to work the exact same way at the object layer.  We don't yet have providers that support XML or Access, but the basic concept is there and these can be added in the future--the only thing that would change is your connection string (which of course can be managed with config files, etc.).
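
    As a rough illustration of this first abstraction, the following Python sketch maps one conceptual entity onto two different store schemas.  Everything in it is hypothetical (the map dictionaries, the table and column names, and to_store_row); real map files are XML and are interpreted by the framework, not by code like this.

```python
# One conceptual entity, described once.
CONCEPTUAL_CUSTOMER = ("CustomerId", "CompanyName")

# Two hypothetical "map files": conceptual property -> store column.
SQL_MAP = {
    "table": "dbo.Customers",
    "columns": {"CustomerId": "CustomerID", "CompanyName": "CompanyName"},
}
LEGACY_MAP = {
    "table": "CUST",
    "columns": {"CustomerId": "CUST_NO", "CompanyName": "CUST_NM"},
}


def to_store_row(entity, mapping):
    """Translate a conceptual-level record into (table, row) for one store."""
    missing = [p for p in CONCEPTUAL_CUSTOMER if p not in mapping["columns"]]
    if missing:
        raise ValueError(f"mapping does not cover: {missing}")
    row = {mapping["columns"][p]: entity[p] for p in CONCEPTUAL_CUSTOMER}
    return mapping["table"], row
```

    The object-layer code that builds the entity dictionary is identical for both stores; only the mapping that is selected differs, which is the role the connection string plays in the description above.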

    The second abstraction opportunity is between the data classes (your objects) and the conceptual schema.  For Orcas we have scoped the problem down and require that the mapping at this layer be 1-to-1, and long-term we will always have a path that makes things especially efficient in cases where it is 1-to-1, but there are cases where the conceptual schema may be centrally defined or shared across multiple applications, and we need the opportunity to introduce renaming or hiding at this layer, providing further separation from the backend storage.  This object-to-conceptual mapping layer is built into the architecture, but we'll have to wait for a future release before enabling the additional flexibility of specifying non-default mappings.

    As far as the question of why there is seemingly redundant information in the attributes, the key point here is that these attributes provide the linkage between the conceptual model and the object attributes--they simply identify those classes which should be persistable and the properties on them which should be persisted.  The exact format of the attributes in the current CTP leaves a little to be desired, and we're working hard to simplify/clarify them in a future CTP as part of the effort I mention above for allowing custom classes, not just ones that inherit from our entity base class.  Once we introduce non-default mappings, this approach allows data objects to operate completely independently right up until the time you need to bind the object to a specific storage target (add/attach to a context).  At that point the properties become bound first to a conceptual schema so that we can at least track property changes, and then, if you actually want to query from or persist to a backend store, that conceptual schema is bound through a mapping to the store schema.  The same objects with the same properties could in theory be used with multiple conceptual models where the exact conceptual model is determined at runtime (once we add non-default object/conceptual mapping), and then the same conceptual model can also be used with multiple different backend stores--with the exact one also being determined at runtime.  The thought is that this should provide substantial persistence ignorance.
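
    A small Python sketch may help show what "identify what is persistable, bind only at attach time" means.  It is an illustration under assumptions, not our API: the persisted decorator stands in for the attributes, and Context, attach and changes are hypothetical names.  The object-to-conceptual mapping is assumed to be 1-to-1 by name.

```python
def persisted(*property_names):
    """Hypothetical class decorator standing in for the entity/property attributes."""
    def mark(cls):
        cls.__persisted__ = tuple(property_names)
        return cls
    return mark


@persisted("customer_id", "company_name")
class Customer:
    def __init__(self, customer_id, company_name):
        self.customer_id = customer_id
        self.company_name = company_name
        self.display_label = f"{company_name} ({customer_id})"  # not persisted


class Context:
    """Binds objects to a conceptual model only when they are attached."""

    def __init__(self, conceptual_model):
        self.conceptual_model = conceptual_model  # type name -> property names
        self.snapshots = {}

    def attach(self, entity):
        declared = self.conceptual_model[type(entity).__name__]
        unknown = set(entity.__persisted__) - set(declared)
        if unknown:
            raise ValueError(f"not in conceptual model: {sorted(unknown)}")
        # Snapshot only the marked properties so later changes can be detected.
        self.snapshots[id(entity)] = {
            p: getattr(entity, p) for p in entity.__persisted__
        }

    def changes(self, entity):
        before = self.snapshots[id(entity)]
        return {p for p, old in before.items() if getattr(entity, p) != old}
```

    Until attach is called, Customer behaves as an ordinary object and knows nothing about any conceptual model; the marking only tells the context which properties to look at.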

    It is true that we don't allow complete persistence ignorance in that we don't support true, plain-old-CLR objects with absolutely no knowledge of persistence--you have to expose to the framework which properties and classes actually need to be persisted, and if you have relationships between entities, then you have to expose information about the nature of those relationships so that automatic fixup and things can occur properly.  But you don't have to tie the objects to an exact backend store, and long-term you won't even have to tie them to a particular conceptual model.

    Another question you asked is whether or not the DataContract attributes are necessary, and the answer is that they certainly are not.  Our code generation layer supplies them just to make serialization with WCF easier/more automatic with the generated classes.  In a future CTP code generation will also supply [Serializable] attributes in order to make binary serialization similarly easy, but the entity framework doesn't use or depend on these in any way.  If you author your own classes you can leave them out.

    - Danny

    P.S. This is important stuff!  I may not yet be articulating these things very clearly, so please ask more questions as necessary for us to get to the bottom of this.  Your feedback is very much appreciated.

    Tuesday, March 13, 2007 2:57 PM

All replies

  • Getting us "partly there" seems to be a big theme in the Entity Framework. Virtually all of the features that I like about the EF are about 75% of what I want them to be. One such feature is Persistence Ignorance. In a true entity model, I should, in theory, be able to take the model of my entities, complete with the ability to create a runtime graph of that model, and go and load data from 3 different stores (XML, Access, or SQL Server) without changing a single line of code. Sure, I should have to create a new map each time, but that map file should be outside the code. In other words, the entities should not have any information on or around them that "gives away" the fact that they are being mapped to a specific persistence medium. They should be simple, dumb (well, slightly dumb) entities - pure model objects and nothing more.

    If the CSDL weren't so ugly to look at, I would suggest stripping the redundant attributes and placing more (if not all) of the relevant information in the trio of XML files that make up an EF model.

    Tuesday, March 13, 2007 12:59 PM
  • Danny,

    Thanks for your detailed response; I'm really glad to hear that you will be eliminating the need for domain objects to subclass the Entity class. Given this, I'm still not quite sure why the journey to complete PI cannot be made.

    For example, consider the Northwind customer table. The corresponding entity code that gets generated for this table includes a property named "CustomerId" to which attributes of type EntityKeyProperty and Nullable(false) are attached. The generator knows that the CustomerId property cannot be null and represents the entity's identity based on information found in the CSDL file; no information is gained by attaching the attributes, and their presence is redundant with the information contained in the conceptual model, which is accessible via the metadata workspace at runtime.
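
    To illustrate what I mean, here is a rough Python sketch in which the key and nullability facts are read from an external model description and applied to a completely plain class. The XML below is a simplified, CSDL-like fragment that I made up for the example (no namespaces, not the exact schema), and read_metadata and null_violations are hypothetical helpers, not framework calls.

```python
import xml.etree.ElementTree as ET

# Simplified CSDL-like description; NOT the real CSDL schema.
CSDL_LIKE = """
<EntityType Name="Customer">
  <Key><PropertyRef Name="CustomerId" /></Key>
  <Property Name="CustomerId" Type="String" Nullable="false" />
  <Property Name="CompanyName" Type="String" Nullable="true" />
</EntityType>
"""


def read_metadata(xml_text):
    """Return (key property names, non-nullable property names)."""
    root = ET.fromstring(xml_text)
    keys = [ref.get("Name") for ref in root.findall("./Key/PropertyRef")]
    required = [
        prop.get("Name")
        for prop in root.findall("./Property")
        if prop.get("Nullable") == "false"
    ]
    return keys, required


class Customer:
    """Plain domain class: no base class, no attributes."""

    def __init__(self, CustomerId, CompanyName=None):
        self.CustomerId = CustomerId
        self.CompanyName = CompanyName


def null_violations(entity, required):
    """List the required properties that are currently unset on the entity."""
    return [name for name in required if getattr(entity, name, None) is None]
```

    Here the Customer class carries no persistence information at all, yet the same key and null constraints can still be enforced from the model description alone.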

    Consider that, without the requirements to inherit from a base class (which you are already removing) and to attach various attributes to properties, one could create a completely persistence-agnostic domain model that would be compatible with the entity framework. As another poster suggested, this clean separation would allow domain entities to be persisted as XML, in an object database or in any other medium without the domain objects themselves knowing or caring where they end up.

    Please help me understand, then, why the attributes are required to be attached.

    Thanks,

    Chris

    Tuesday, March 13, 2007 5:39 PM
  • It's my intention to try to write up a fairly detailed response because this is such an important topic.  Unfortunately I'm driving hard toward a deadline right now so this is going to have to be delayed a few days.  I just wanted to let you know that I haven't forgotten, and I will respond.

    - Danny

    Thursday, March 15, 2007 4:46 AM
  • Danny,

    I look forward to your response.

     

    Thanks,

    Chris.

    Thursday, March 22, 2007 3:12 AM
  • I'm sorry for the delay in writing a response, and I hate to be the bearer of bad news, but it looks like I'm going to have to keep you waiting a bit longer.  Check out the post I made to my blog earlier this evening.

     

    The essence is this: I believe this topic deserves the best treatment I can give it, and I'm not yet ready to do a good job.  I'm going to work on it, though, and keep folks apprised as best I can.

     

    - Danny

     

    Wednesday, March 28, 2007 3:51 AM