Organizing a large dwh
-
Tuesday, July 28, 2009 9:15 AMI'm implementing a dwh in SQL Server 20 05.
I must manage some dimension tables and one fact table with many columns.
Moreover, I must load daily 80.000-100.000 rows into fact table.
I think to structure my db in this way:
- the transaction log file on disk C;
- the mdf file with the system objects on disk D and on filegroup FG1;
- the secondary ndf1 file with the dimension tables on disk E on filegroup FG2 (the default filegroup);
- the secondary ndf2 file with the fact table on disk F on filegroup FG3.
Is it useful to create a ndf file and a related filegroup for the indexes or can I use the ndf1 file?
I think to structure the fact table using the partitions: an active partition on a secondary ndf file into a related filegroup
on the same disk of the dimension tables (E) and an aged partition on the ndf on the related filegroup on disk F.
The active partition should contain the rows related to the last 18 months for querying.
I think to load daily the 80.000-100.000 rows in a staging table and the switching it into the active partition of the fact table.
Is it a good strategy for my dwh?
Thanks
All Replies
-
Saturday, August 01, 2009 1:07 PM
Whilst I think you are probably heading in the right direction you need to consider where you want the best performance - this is probably for users querying your Fact and dimension tables, but load and staging performance is possibly an issue.
I wouldn't have thought a 100k rows a day would cause too many problems during the load but it all depends on how many transformations you are doing and how long you have to load the data.You don't say how many physical disks you have and the type of RAID configuration you are using - this is crucial. If you only have 4 disks and they are all alike then I think I'd be tempted to create a single RAID 5 partition and put everything on it. You would get better performance from a RAID 0 striped set but you would have no disk redundency.If you have many disks in striped arrays then I would ensure that my main warehouse files were on the one with the most disks. It is good practice to keep data and log files seperate from data, however if there staging is not an issue and the warehouse is only ever queried during the day your log file won't be being accessed normally except during the load process.Bulk loading data into an empty partition and using partition switching is fast but probably not necassary for your volume of data.If you explain how many physical disks, sizes speeds etc I could offer more help.- Marked As Answer by Jinchun ChenMicrosoft Employee, Moderator Tuesday, August 25, 2009 9:09 AM

