16 februarie 2012 02:33
I've started a new project and one of the ideals we are working towards is that the database should return the same results for the same date. For example, if I run a staff list report and specify a date of, say, 1-Jan-2009 then it should returns the same results as when the report was run on that day. I've achieved this by eliminated anything such as flags that specify a staff member is 'Deleted' or 'Inactive' etc. Instead I have a table that specifies start and end dates (it's a separate table so they can have multiple start and end dates if they return). Note that this is not 100% strictly the same result as someone could run the report on 1-Jan-2009 and then they could add a staff member at a later time who supposedly started before 1-Jan-2009.
I call this a "timeless" database but I presume that is not the correct term. My question is, does this sort of thing have a name and is there any established methods surrounding this functionality?
Thanks in advance,
16 februarie 2012 13:35Membru care oferă răspunsuri
>I call this a "timeless" database but I presume that is not the correct term
Heh. Making the mistake of calling something based on it's presumed purpose than by what it is. This is a timfull database! It has many more dates and times than a regular database. :)
As a rule, i only get rid of data when it is not relevant. If keeping history is relevant, it is simply part of the data. Hence the term is database. Perhaps, in the vernacular, "database with historical data." The term snapshot may come up, but not to describe the database itself, just the results of queries.
16 februarie 2012 21:15
Hi Brian, thanks for the reply. I completely agree that history is part of the data. If an employee left the company 5 years ago then that employees information is still relevant and still part of the data.
When I say the database is timeless I mean that it does not matter at what time you query it, it will give the same result for the same query within reason. The "effective date" is one of the parameters of most queries (although many queries have an implied effective date of today, eg GetCurrentStaff).
It seems to me that too many database are only relevant for right now. Putting an Inactive bit column on a staff member is the classic example. In many databases we might put someone as inactive if they have long service leave. This to me means they have been inactive for all of time and then when they return they magically become active for all of time.
16 februarie 2012 21:50
While your approach is completely valid, you need to keep in mind that it makes design and queries more complicated. Nothing wrong with that as long as it fits into use cases and business requirements. Adding extra complexity to the system that does't require it now and possibly would never require it is not the best way to accomplish things.
Based on my experience most part of OLTP systems that handle operational activity don't care much about "history". You generally don't care when stuff member become active if you need to assign some task to him/her now. You generally don't care how many stuff members you had year ago when you deal with your current problems. You can be interested in those details but for such questions you usually build analysis system - OLAP, Data Warehouses. And those systems rarely share the data and schema with OLTP data and activities.
So my take is - concept in general is great but I would not implement it unless I have such requirements. I would try to make something in the system that allow me to refactor it in case if those use cases become important but I would not build them from the beginning. For example, audit/history/change tracking table with INACTIVE column in the main table would work just fine and could be the source of the data if I ever need to refactor the database to become "database with historical data".
My blog: http://aboutsqlserver.com
16 februarie 2012 21:57One of the other key things I forgot to mention is that you can set information in the future. So if an employee is leaving at the end of the year we don't have to wait until that date to delete them, you just marked them as having finished their employement period as of the end of the year. Or if they are to get a promotion next year then you could mark them as being a manager from the start of next year. This is very different from a database which stores history.
7 martie 2012 22:53
I can't really comment on the work you do but in my case the timing of things is very relevant. Not only do we need to look back in time but we also allocate into the future. As to how many systems in the wild also need to look into the past and/or future I'm not sure.
With regards the complexity issue I agree it adds complexity, I did have to deal with a few issues. However it's not too difficult to add in some functions that wrap up the information for you. For example, I have created a CurrentStaff udf in my database. Also, wouldn't creating a data warehouse as a second database be more complex than having a single database? The other thing is that querying historical data is usually messy and difficult because we have the data in the main table and historical data in a different table, so things can actually become easier in some cases.
With regards to the INACTIVE column I'm not sure I see that as a good solution ever. Surely when a staff member becomes inactive we know ahead of time. In most databases you would need to wait until that day to mark them as inactive. If the staff member who maintains this gets in at 9 then staff who get in at 8 still have 1 hour to allocate something against this inactive staff member. In comparison, marking them as inactive against a future date seems a lot more logical to me.
8 martie 2012 13:35
I have done a similiar design in the past and is indeed effective if you need to query current and historical data together frequently. one downside we ran into was that the larger amount of data did have some performance impacts as the tables grew bigger. Since getting current data quickly was important to us, we used an active/inactive computed column (based on date columns) and partitioned on that column. All of the "current" views then filtered on this column.
Dave Frommer - BI Architect - Independent
9 martie 2012 11:18Right term is "temporal" See http://en.wikipedia.org/wiki/Temporal_database for history of the topic development.
12 martie 2012 00:41
Right term is "temporal" See http://en.wikipedia.org/wiki/Temporal_database for history of the topic development.
There's been a lot of good points raised here but this is the answer I was looking for. Thanks Serg.