Patterns for persisting Object Models to Databases and back again

  • Question

  • I am designing an architecture for importing/validating/displaying/storing/retrieving large amounts of survey data.

     

    The general flow is simple enough:

     

    1. Get data from varying sources/formats
    2. Convert to a common format
    3. Validate it according to some rules
    4. Store in database
    5. Retrieve from database.

     

    My question revolves around the interaction between the common format (2) and storage and retrieval (4 and 5).

     

    My initial thought was to design a nice, neat interface-based object model for storing the questions, the respondents, their answers, and all the cross-referencing you can imagine is needed to navigate in any direction.
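    For illustration, a minimal sketch of the kind of interface-based model I mean (all type and member names here are hypothetical):

        // Rough sketch only -- interface and member names are hypothetical.
        using System.Collections.Generic;

        public interface IQuestion
        {
            string Text { get; }
            IList<IAnswer> Answers { get; }        // navigate question -> answers
        }

        public interface IRespondent
        {
            string Name { get; }
            IList<IAnswer> Answers { get; }        // navigate respondent -> answers
        }

        public interface IAnswer
        {
            IQuestion Question { get; }            // cross-reference back to the question
            IRespondent Respondent { get; }        // and back to the respondent
            string Value { get; }
        }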

     

    .NET, in my case specifically C#, handles objects by reference, so a neatly cross-referencing model should work well.

     

    Then came the point of converting to a relational model. OK, the common practice is to throw your interface-based model at some adapter that accepts that interface and performs the SQL inserts for you, letting SQL Server generate the identity columns.
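    Roughly like this, to make the adapter idea concrete (a sketch only, assuming SQL Server; the table and column names are made up, and IRespondent is from the model sketch above):

        using System.Data.SqlClient;

        public class RespondentSqlAdapter
        {
            private readonly string _connectionString;

            public RespondentSqlAdapter(string connectionString)
            {
                _connectionString = connectionString;
            }

            // Inserts the respondent and returns the identity value the database generated.
            public int Insert(IRespondent respondent)
            {
                using (var conn = new SqlConnection(_connectionString))
                using (var cmd = new SqlCommand(
                    "INSERT INTO Respondent (Name) VALUES (@name); SELECT CAST(SCOPE_IDENTITY() AS int);",
                    conn))
                {
                    cmd.Parameters.AddWithValue("@name", respondent.Name);
                    conn.Open();
                    return (int)cmd.ExecuteScalar();
                }
            }
        }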

     

    It's when it came to getting it back out, and dealing with assigning typed object collections to a container object, that it started to look odd.  I now had a structure based on foreign keys being converted into a container/contained pattern.  Then there was the issue of performing updates and working out what had been changed, removed, etc.
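    To make the odd-looking part concrete, the retrieval side ends up stitching foreign-key rows back into a containment graph, roughly like this (hypothetical names; answers flattened to strings for brevity):

        using System.Collections.Generic;
        using System.Data;

        public class Question
        {
            public string Text;
            public List<string> Answers = new List<string>();
        }

        public static class GraphRebuilder
        {
            public static Dictionary<int, Question> Rebuild(DataTable questionRows, DataTable answerRows)
            {
                var questionsById = new Dictionary<int, Question>();
                foreach (DataRow row in questionRows.Rows)
                    questionsById[(int)row["QuestionId"]] = new Question { Text = (string)row["Text"] };

                foreach (DataRow row in answerRows.Rows)
                    questionsById[(int)row["QuestionId"]]       // the foreign key...
                        .Answers.Add((string)row["Value"]);     // ...becomes containment

                return questionsById;
            }
        }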

     

    If you are still reading, I thank you for your time.

     

    So I am now considering using a typed DataSet that is based on the database schema.  This has the benefit of tracking row changes/insertions/deletions, etc., but I have doubts regarding its portability: an interface model can be very versatile, whereas a typed DataSet seems more rigid.
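    The change tracking I mean is essentially this (shown with an untyped DataSet for brevity; a typed DataSet generated from the schema exposes the same members with strongly typed rows; table and column names are made up):

        using System;
        using System.Data;

        class ChangeTrackingDemo
        {
            static void Main()
            {
                var responses = new DataTable("Response");
                responses.Columns.Add("ResponseId", typeof(int));
                responses.Columns.Add("AnswerText", typeof(string));

                var ds = new DataSet("Survey");
                ds.Tables.Add(responses);

                responses.Rows.Add(1, "Yes");
                ds.AcceptChanges();                      // baseline: rows are now Unchanged

                responses.Rows[0]["AnswerText"] = "No";  // this row becomes Modified
                responses.Rows.Add(2, "Maybe");          // this row is Added

                // Only the delta needs to go back to the database layer.
                DataSet delta = ds.GetChanges(DataRowState.Added | DataRowState.Modified);
                Console.WriteLine(delta.Tables["Response"].Rows.Count);  // prints 2
            }
        }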

     

    I guess the end question is: do I view this from a model-to-database point of view, or database-to-model?

     

    I would really appreciate others' experience, opinions, and even flames (as long as there is a valid suggestion at the end of it).

     

    TIA

     

    MattC

    Monday, March 17, 2008 12:32 PM


All replies

  • I think you should consider using SubSonic.
    I think it addresses most of your needs, and it may be tweaked to address the remaining ones.

    Regards
    Monday, March 17, 2008 2:01 PM
  • Matt,

     

    This is a fairly common scenario: ETL (extract, transform, load).  There are lots of approaches to this, and I do have a couple of pointers on what not to do.

     

    How are the different formats related, and therefore what is the need for the common format?  The reason I ask is: is there any possibility of working in the common format from the start and skipping the conversion, as that's the only thing that stands out as much of a challenge?  Presumably, if you're getting the data in a specific format and converting it to a common format, you'd also have to convert back to the specific format?

     

    I wouldn't use XSLT for the transformation to a common XML format, since it's easy to end up with a lot of typographical errors in the XSLT.  It's just too hard, and there are easier approaches.

     

    LINQ might also be something to look at from the transformation perspective; it's not an exact fit, but it would allow you to perform some of the manipulation.
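    For example, something like this with LINQ to XML (the element and attribute names are made up, just to show the shape of the projection):

        using System.Linq;
        using System.Xml.Linq;

        static class Transformer
        {
            // Projects one (hypothetical) vendor export layout into a common shape.
            public static XElement ToCommonFormat(XElement vendorExport)
            {
                return new XElement("Responses",
                    from r in vendorExport.Descendants("row")
                    select new XElement("Response",
                        new XAttribute("questionId", (string)r.Attribute("q")),
                        new XElement("Value", (string)r.Element("val"))));
            }
        }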

     

    Depending upon how the solution is to work, how big it is, and the costs involved, BizTalk might work, something similar to an ESB.  That's obviously an enterprise solution, so the application would need to be of that size, and need to scale and so on, to fit the choice of BizTalk.

     

    The issue with surrogate keys is a fairly common one.  You'd probably have to keep the key information in the objects, or at least have some way of identifying it with the database-level data.
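    In practice that usually just means carrying the surrogate key on the entity, for example (hypothetical names):

        public class Respondent
        {
            public int? RespondentId { get; set; }   // null until the database assigns the identity
            public string Name { get; set; }

            public bool IsPersisted
            {
                get { return RespondentId.HasValue; }
            }
        }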

     

    A typed DataSet would indeed work, but you could use DTOs to transport the data, so I wouldn't necessarily choose the DataSet just because it appears to do some things for you automatically.
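    By DTOs I just mean flat, serializable shapes that cross the layer boundary instead of a DataSet, for example (field names are hypothetical):

        using System;

        [Serializable]
        public class AnswerDto
        {
            public int AnswerId { get; set; }
            public int QuestionId { get; set; }
            public int RespondentId { get; set; }
            public string Value { get; set; }
        }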

    The offline capability of the DataSet and the diffgram functionality may be a good reason to choose the DataSet, though.  You have to be careful not to use a DataSet when scalability could be a concern, since sending a DataSet between layers isn't necessarily the most optimal way to do things.  You have to work out what you need from the technology, then see whether a DataSet fits it and how similar a hand-rolled solution would be to a DataSet.  If they're almost the same, then a DataSet is probably a good way to go; however, if the DataSet brings lots of other functionality that you can't get rid of, then that has an overhead, and you have to decide whether it is acceptable.
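    The offline/diffgram side looks roughly like this (a sketch; the file name is made up):

        using System.Data;

        static class PendingChanges
        {
            public static void Save(DataSet surveyData)
            {
                DataSet changes = surveyData.GetChanges();   // added/modified/deleted rows only
                if (changes != null)
                    changes.WriteXml("pending-changes.xml", XmlWriteMode.DiffGram);
            }
        }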

    The other thing that is yucky is using a DataSet to push data to the UI layer; I would say that is a very yucky solution.  The UI is then going to be closely coupled to the database schema, and you probably don't want that.  It would be better to perform some conversion to the entities the UI displays; however, in doing that you may undo some of the benefits of the typed DataSet, to the point that you may find a hand-rolled version works better for you.
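    The conversion I'm suggesting is simply mapping rows into view entities before the UI sees them, roughly (hypothetical names):

        using System.Collections.Generic;
        using System.Data;

        public class AnswerView
        {
            public string QuestionText { get; set; }
            public string Answer { get; set; }
        }

        public static class AnswerViewMapper
        {
            public static List<AnswerView> Map(DataTable answerRows)
            {
                var views = new List<AnswerView>();
                foreach (DataRow row in answerRows.Rows)
                    views.Add(new AnswerView
                    {
                        QuestionText = (string)row["QuestionText"],
                        Answer = (string)row["Value"]
                    });
                return views;
            }
        }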

     

    I hope this helps to get you thinking.  I don't know your problem space intimately, so it would come down to your situation and requirements as to which way you proceed.

     

    I hope this helps,

     

    Martin Platt.

     

     

     

    Monday, March 17, 2008 10:43 PM
  • Martin,

     

    Thanks for your reply.

     

    Just to clarify a few things, due to my bad description: once data is converted into the common format (OM or DataSet) and then inserted into the database, it will only ever need to be moved back to the common format, never to the original format.

     

    I don't know for sure at this stage whether it will need to be exported from the database back into the common format, but it is a good bet.

     

    I agree that the temptation to send DataSets between layers is great (given that every MS example does that).

     

    At present, although it's not set, I am considering making the common format a one-way thing, whereby the different formats are converted to it as a way of abstracting away the database.

     

    If work needs to be done on the data in the database, then possibly a separate structure better geared for database manipulation could be used.

     

    Sort of like the OM going in to reduce coupling with the import process, and then a typed DataSet for editing database data (the typed DataSet and DB schema can be tightly coupled).
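    Roughly the DB-editing half of that split, sketched with a plain adapter for brevity (a typed DataSet just adds generated, strongly typed wrappers; the connection string, table and column names are made up):

        using System.Data;
        using System.Data.SqlClient;

        static class ResponseEditor
        {
            public static void EditResponses(string connectionString)
            {
                using (var conn = new SqlConnection(connectionString))
                using (var adapter = new SqlDataAdapter("SELECT ResponseId, Value FROM Response", conn))
                using (var builder = new SqlCommandBuilder(adapter))   // generates INSERT/UPDATE/DELETE commands
                {
                    var table = new DataTable("Response");
                    adapter.Fill(table);

                    if (table.Rows.Count > 0)
                        table.Rows[0]["Value"] = "Updated answer";     // edit offline

                    adapter.Update(table);                             // pushes only the changed rows back
                }
            }
        }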

     

    A diagram would really help here.  Can I insert a Visio diagram?

     

    TIA

     

    MattC

    Wednesday, March 19, 2008 11:10 AM
  • Did you try SQL Server SSIS?  I don't know all the capabilities of SSIS, but it does offer reasonable ETL compared to buying a specialised ETL tool like Informatica...
    Monday, March 24, 2008 9:02 PM