Architecting aggregation of data

  • Question

  • We have a group of very similar SQL Server databases running on our server. We are planning to create an application which will need to get information from all these similar databases.

    What do you all think is the best way to go about doing it?

    I am thinking about creating a real-time or periodically updated central database aggregating the info from all the others. How can I do this? DTS? Or create a data cache in memory?

    A web application, a web service, and Windows Forms applications will be running off this database.

    It should be able to do fast searches. Is caching the database the only way to achieve this?

    How about XML databases?

    Thanks a lot in advance.




    Tuesday, September 12, 2006 7:02 PM

All replies

  • If I understand you correctly, you are looking to create an ODS (Operational Data Store). The correct way to go about aggregating the data depends on the overall system architecture. If you have an SOA or another proper separation of the databases, then it is probably better to go through the owning services (especially in an SOA); otherwise SSIS can do a good job.

    You can read a little about ODS in an SOA pattern I published some time ago called "Aggregated Reporting".
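The "go through the owning services" option above can be sketched as follows. This is a minimal, hypothetical illustration: instead of reaching into each service's database directly, the aggregator asks every owning service for its slice of the data and combines the results. The callables and record shapes are invented for the example, not from the thread.

```python
def aggregate_via_services(services):
    """Combine data by asking each owning service for its records.

    'services' is a list of callables standing in for real service
    endpoints (in a 2006 stack these would likely be web service
    calls); each returns the records that service owns.
    """
    combined = []
    for fetch in services:
        combined.extend(fetch())
    return combined

# Two hypothetical owning services, each returning its own records.
orders_east = lambda: [{"order": 101, "region": "east"}]
orders_west = lambda: [{"order": 202, "region": "west"}]
print(len(aggregate_via_services([orders_east, orders_west])))  # → 2
```

The trade-off versus database-level aggregation (SSIS) is that the services keep ownership of their data and can enforce their own rules, at the cost of extra calls at aggregation time.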



    Tuesday, September 12, 2006 8:16 PM
  • Thanks for your suggestions and for giving some direction, Arnon. I will explore this further.
    Tuesday, September 12, 2006 9:15 PM
  • forthangol,

    Just wanted to recommend two articles, both from the latest MS Architecture Journal (devoted to data access strategies).

    Saturday, September 16, 2006 4:30 AM
  • The best aggregation approach will depend on the data you are aggregating and what your latency requirements are. If it's OK for the aggregate to lag behind the source databases, then using SSIS jobs to update the aggregate is efficient and pretty flexible. If you want the aggregate to be close to up to date all the time, then something like replication, triggers with Service Broker messages, or BizTalk is more appropriate. If each source system is the authoritative source for a subset of the data, then replication is probably a good approach. For example, if each source system covers an isolated subset of customers or a unique set of inventory items, then just replicating to the same table in the aggregate database will aggregate the data. If more than one of the sources contains the same data (the same customers on two systems, for example), then your aggregation process will need enough intelligence to handle conflicting updates from different source systems.
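The conflict-handling point above can be made concrete with a small sketch. This is one possible policy (last-write-wins by modification timestamp), not the only one; the field names and the choice of tie-breaker are assumptions made for the example.

```python
from datetime import datetime

def merge_sources(rows):
    """Merge customer rows pulled from several source databases.

    Each row is a dict with a business key ('customer_id') and a
    'modified' timestamp. When two sources hold the same customer,
    keep the most recently modified copy (last-write-wins). A real
    aggregation job might instead raise a conflict for review.
    """
    merged = {}
    for row in rows:
        key = row["customer_id"]
        current = merged.get(key)
        if current is None or row["modified"] > current["modified"]:
            merged[key] = row
    return list(merged.values())

# The same customer (id 1) arrives from two source systems.
rows = [
    {"customer_id": 1, "name": "Acme", "modified": datetime(2006, 9, 1)},
    {"customer_id": 1, "name": "Acme Corp", "modified": datetime(2006, 9, 15)},
    {"customer_id": 2, "name": "Widgets", "modified": datetime(2006, 9, 10)},
]
print(sorted(r["name"] for r in merge_sources(rows)))  # → ['Acme Corp', 'Widgets']
```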
    Sunday, September 17, 2006 11:18 PM
  • Thank you all for your guidance. For our situation, I am currently looking at SSIS to understand it better before considering other alternatives. The reason is that direct aggregation at the database level (a new application/process) looks more appealing to me than options that require modifying the applications/services around the current systems to accommodate other types of aggregation. I will post back once I have decided whether SSIS really works for us or not. Thanks.
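The database-level periodic pull the thread converges on (what an SSIS job would do) can be sketched as follows. This is a toy stand-in only: sqlite3 in-memory databases substitute for the SQL Server sources, and the table name, columns, and "modified since last run" convention are invented for illustration.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory stand-in for one of the similar source databases."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, modified TEXT)"
    )
    db.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return db

def refresh_central(central, sources, since):
    """One periodic run: pull rows changed since the last run from every
    source and upsert them into the central aggregate table."""
    for src in sources:
        changed = src.execute(
            "SELECT id, name, modified FROM customers WHERE modified > ?",
            (since,),
        ).fetchall()
        central.executemany(
            "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", changed
        )
    central.commit()

# Central aggregate database plus two sources, each owning disjoint ids.
central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, modified TEXT)"
)
sources = [
    make_source([(1, "Acme", "2006-09-10")]),
    make_source([(2, "Widgets", "2006-09-12")]),
]
refresh_central(central, sources, since="2006-09-01")
print(central.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # → 2
```

Scheduling this extract/upsert on a timer is essentially what an SSIS package with a data-flow task per source would do; the incremental filter keeps each run cheap compared with a full reload.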

    Tuesday, September 19, 2006 10:18 PM