Help to Design Calendar or Fact Table in Such a Way I can count no. of Journey for Each service within a day or range wise
-
venerdì 24 febbraio 2012 03:42
Hi All,
Here By I am providing you My Scenario, Kindly Help If you have solution.
A Bus service can have multiple Vehicle journey within a day over different Journey path,
I have a Service table in which start date and end date of service is defined.
Service
ServiceId(PK), name, startdate, enddate
Vehicle Journey Table defines time table for each vehicle journey for this service
Vehicle Journey
VehicleJourneyId(PK),
RouteId,
ServiceId,
StartTime
EndTime
DayType defines that service will operate on which days.
DayType
DayTypeID (PK)
DayName
e.g. of Day Names
Monday,TuesDay,Wed,Thu,Fri,Sat,Sun,Mon-Fri,sat-sun,Not monday,Not Tuesday,Not NewYear, Not Cristmas
e.g Service will operate mon-fri day between 1-1-2012 to 31-1-2012.
another table Contains Service,vehicle journey and Day Type Mappings.
ServiceDayTypeMapping
ServiceDaytypeMapppingId
ServiceId
VehicleJourneyId
DayTypeId
A service can have multiple vehicle journeys , A journey can operate on Defined DayTypes such as(Mon-Fri) or (Not Sunday, means all days without sunday).
I have Date Range Table in which i have created All dates as range and defined daytype inside that.
DATERANGE
DataRangeId
StartDate
EndDate
DayTypeID
e.g. This table can contain start date 1-1-2012 to end date: 1-1-2012 daytpe 7 (sunday),
2-1-2012 to 2-1-2012 DayType 1 (Monday)
6-1-2012 to 10-1-2012 DayType 10 (Mon-Fri)
13-1-2012 to 17-1-2012 DayType 10 (Mon-Fri)
This Schema exist in my RDBMS and i need to design DataWare House schema such that, I can query about how many vehicle journey will operate on each date ,between specified date range, or which vehicle journeys will operate on specified date if date is within range of service start and end date.
I don't have date for each record of Scheduled vehicle journey for service ,service is defined over weekday, and start and end date of service is given.and calendar is as shown as above so how can i design my DW , so i can count Daily scheduled Vehcle journey for all services. or i can count vehicle journey for a service over date range.?
I have study Snowflake schema to relate my Calendar dimension and M:M relation ship table servicedaytypemapping , but not getting exact idea to do this. so please guide to resolve this issue.
Thanks & Regards.
Tutte le risposte
-
venerdì 24 febbraio 2012 07:33
Your above schema is an OLTP type schema - very normalized, and appropriate for single-record updates.
To craft a data warehouse schema, you need to first determine what "measures" you want to aggregate - in your case, you've defined this quite clearly - vehicle journeys. You've also defined that you need to be able to filter those facts by date. You may (and probably do) wish to filter/slice those vehicle journeys by other attributes - service name, for example.
At this point, you know that you need to generate a "factless" fact table with one row per vehicle journey, with no column containing a measurement. (If it makes it clearer, you could have a single column in this table that always contains a value of 1 - each row represents one vehicle journey.) This fact table will have (at least) one other column with a foreign key to a date dimension table, so that you can filter/slice by date. You may have other foreign key columns - perhaps to a service dimension, or vehicle dimension.
In effect, your data warehouse is a denormalized "materialized" schedule built from the definitions you define using your stated OLTP schema. If any data changes in your OLTP tables, you will have to rebuild all or part of the data warehouse to "rematerialize" the "schedule".
There is no need (that I can see) for M:M relationships. You need to completely break free of trying to impose a schema on top of what you already have, and start from scratch from data warehousing principles. Define your facts first, then dimensions, then a process to fill and update the warehouse schema from the OLTP schema.

Talk to me now on

-
venerdì 24 febbraio 2012 15:12
Your above schema is an OLTP type schema - very normalized, and appropriate for single-record updates.
To craft a data warehouse schema, you need to first determine what "measures" you want to aggregate - in your case, you've defined this quite clearly - vehicle journeys. You've also defined that you need to be able to filter those facts by date. You may (and probably do) wish to filter/slice those vehicle journeys by other attributes - service name, for example.
At this point, you know that you need to generate a "factless" fact table with one row per vehicle journey, with no column containing a measurement. (If it makes it clearer, you could have a single column in this table that always contains a value of 1 - each row represents one vehicle journey.) This fact table will have (at least) one other column with a foreign key to a date dimension table, so that you can filter/slice by date. You may have other foreign key columns - perhaps to a service dimension, or vehicle dimension.
In effect, your data warehouse is a denormalized "materialized" schedule built from the definitions you define using your stated OLTP schema. If any data changes in your OLTP tables, you will have to rebuild all or part of the data warehouse to "rematerialize" the "schedule".
There is no need (that I can see) for M:M relationships. You need to completely break free of trying to impose a schema on top of what you already have, and start from scratch from data warehousing principles. Define your facts first, then dimensions, then a process to fill and update the warehouse schema from the OLTP schema.

Talk to me now on

Hi Todd McDermid,
Thanks for your reply i have created fact table which has relation with
FactSchJourney
[ID] ,[ServiceID],[VehicleJourneyUID],[JourneyPatternUID] ,[OperatorId],[DirectionID] , [IsOperatre] ,[DayTypeID]
DimDayType
DayTypeId, DayType
DimDateRange
[StartDate],[EndDate],[DayTypeID],[WeekNumber],[DateRangeID]
ServiceDayTypeMapping (HelperTable)
[ServiceDayTypeID],[ServiceID],[DayTypeID],[JourneyPatternUID] ,[VehicleJourneyUID],[IsOperate]
so can i use snow flaking on the this DimDayType,DimDateRange,ServiceDayTypemapping so i can Get Count of Each vehicle journey specific to service on date bases.
my date range contains all dates as well as some other date ranges specific to day type.
I have also start and end date of service date must be within that range.
So please repl and guide to resolve this scenario.
Regard
Mubin
-
venerdì 24 febbraio 2012 17:29
I do not agree with your DimDateRange dimension. It doesn't support your stated goal of being able to determine the number of vehicle journeys per day. Instead, you should have a DimDate dimension that has one row per day. I would denormalize (not snowflake) the Day and DayType tables into that one DimDate table.
I would also offer the following advice:
I would remove the "IsOperate" column from the fact table. It's not a fact, it's an attribute, and belongs in a dimension table.
Make sure your dimension tables are using a surrogate key - some size of integer (tinyint, smallint, int, bigint).
Do not append "ID" to your surrogate keys in your dimension tables if "ID" is used in the column names of the business keys of the source system. (It's a little unclear if your first post's proposed schema was what your source system's schema was, or if it was just your first proposal for a dimensional model of an undescribed source system schema.) If "ID" is used in the column names of business keys (or anywhere in your source system), I would instead PREpend (add to the start) the letters "dimkey", "key", "sk", or something similar to identify that column as the "surrogate key".
What are "UID" columns, and how are they different from "ID" columns? If "UID" refers to a column containing a GUID, those columns should NOT be in the fact table, they should be in a dimension table. The fact table should have a surrogate key (some size of INT) instead.
Using surrogate keys properly allows for Type 2 slowly changing dimensions, if you have any need for them. It also makes joins a LOT faster.

Talk to me now on

- Modificato Todd McDermidMVP venerdì 24 febbraio 2012 17:30
- Proposto come risposta Jerry NeeModerator lunedì 27 febbraio 2012 03:05
-
sabato 25 febbraio 2012 16:58
"At this point, you know that you need to generate a "factless" fact table with one row per vehicle journey, with no column containing a measurement. (If it makes it clearer, you could have a single column in this table that always contains a value of 1 - each row represents one vehicle journey.) This fact table will have (at least) one other column with a foreign key to a date dimension table, so that you can filter/slice by date. You may have other foreign key columns - perhaps to a service dimension, or vehicle dimension."
Thanks for your reply,
If i have only week day id in fact table which is related to service and vehicle journey , then is it possible to count no.of journeys for each date. or do i need to denormalize journeys for all dates for specified date range of service( start and end date) and according daytype(Mon-Fri) of vehicle journey. and exclude some dates on which bank or public holidays were there. (what i mean to say that similar to TimeTable of journey) which has all dates and time on which vehicle journeys of all services operate.
So what is your thought?
Regards,
Mubin

