Fact and dimension table partition

  • Question

  • My team is implementing a new data warehouse. I would like to know when we should plan to partition the fact and dimension tables: before data comes in, or after?




    Tuesday, June 24, 2014 4:15 PM

Answers

  • Hi,

    It is recommended to partition the fact table (where we will have huge data). Automate the partitioning so that each day a new partition is created to hold the latest data (split the previous empty partition into two). Best practice is to partition on the transaction timestamp, so load the incremental data into an empty staging table (Table_IN) and then switch that data into the main table (Table). Make sure the two tables (Table and Table_IN) are on the same filegroup.
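    As a rough T-SQL sketch of that daily pattern (the partition function pfDaily, scheme psDaily, and table names dbo.Sales / dbo.Sales_IN are hypothetical, and the newest partition is assumed empty):

    ```sql
    -- Hypothetical staging table: same columns and same filegroup as the
    -- target partition; a CHECK constraint bounding TxnDate to the target
    -- partition's range is also required for the switch to succeed.
    CREATE TABLE dbo.Sales_IN (
        TxnDate datetime NOT NULL,
        Amount  money    NOT NULL
    ) ON [PRIMARY];

    -- 1. Split the newest (empty) partition so it can hold today's data.
    ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE ('2014-06-25');

    -- 2. Bulk load the day's incremental data into dbo.Sales_IN here.

    -- 3. Switch the staged rows into the matching partition of the main table.
    ALTER TABLE dbo.Sales_IN
        SWITCH TO dbo.Sales PARTITION $PARTITION.pfDaily('2014-06-25');
    ```

    Because the switch is metadata-only, step 3 completes in seconds regardless of row count.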

    Refer to the content below for detailed info.

    Designing and Administrating Partitions in SQL Server 2012

    A popular method of better managing large and active tables and indexes is partitioning. Partitioning is a feature for segregating I/O workload within a SQL Server database so that I/O can be better balanced against the available I/O subsystems while providing better user response time, lower I/O latency, and faster backups and recovery. By partitioning tables and indexes across multiple filegroups, data retrieval and management are much quicker because only subsets of the data are used, while ensuring that the integrity of the database as a whole remains intact.

    Tip

    Partitioning is typically used for administrative or certain I/O performance scenarios. However, partitioning can also speed up some queries by enabling lock escalation to a single partition, rather than to an entire table. You must allow lock escalation to move up to the partition level by setting it with either the Lock Escalation option on the Database Options page in SSMS or the LOCK_ESCALATION option of the ALTER TABLE statement.
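    For example, partition-level lock escalation can be enabled with a single statement (dbo.Sales is a hypothetical partitioned table):

    ```sql
    -- AUTO permits escalation to the partition level on partitioned tables;
    -- the default, TABLE, always escalates to a full table lock.
    ALTER TABLE dbo.Sales SET (LOCK_ESCALATION = AUTO);
    ```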

    After a table or index is partitioned, data is stored horizontally across multiple filegroups, so groups of data are mapped to individual partitions. Typical scenarios for partitioning include large tables that become very difficult to manage, tables that are suffering performance degradation because of excessive I/O or blocking locks, table-centric maintenance processes that exceed the available time for maintenance, and moving historical data from the active portion of a table to a partition with less activity.

    Partitioning tables and indexes warrants a bit of planning before putting them into production. The usual approach to partitioning a table or index follows these steps:

    1. Create the filegroup(s) and file(s) used to hold the partitions defined by the partitioning scheme.

    2. Create a partition function to map the rows of the table or index to specific partitions based on the values in a specified column. A very common partitioning function is based on the creation date of the record.

    3. Create a partitioning scheme to map the partitions of the partitioned table to the specified filegroup(s) and, thereby, to specific locations on the Windows file system.

    4. Create the table or index (or ALTER an existing table or index) by specifying the partition scheme as the storage location for the partitioned object.
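    The four steps above can be sketched in T-SQL; the filegroup, function, scheme, and table names below are illustrative, and the filegroups are assumed to have already been added to the database (step 1):

    ```sql
    -- Step 2: map rows to partitions by the record's creation date.
    -- RANGE RIGHT with 3 boundary values yields 4 partitions.
    CREATE PARTITION FUNCTION pfQuarter (datetime)
        AS RANGE RIGHT FOR VALUES ('2014-01-01', '2014-04-01', '2014-07-01');

    -- Step 3: map the 4 partitions to filegroups (one entry per partition).
    CREATE PARTITION SCHEME psQuarter
        AS PARTITION pfQuarter TO (FG2013, FG2014Q1, FG2014Q2, FG2014Q3);

    -- Step 4: create the table on the partition scheme.
    CREATE TABLE dbo.Orders (
        OrderID     int      NOT NULL,
        CreatedDate datetime NOT NULL
    ) ON psQuarter (CreatedDate);
    ```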

    Although Transact-SQL commands are available to perform every step described earlier, the Create Partition Wizard makes the entire process quick and easy through an intuitive point-and-click interface. The next section provides an overview of using the Create Partition Wizard in SQL Server 2012, and an example later in this section shows the Transact-SQL commands.

    Leveraging the Create Partition Wizard to Create Table and Index Partitions

    The Create Partition Wizard can be used to divide data in large tables across multiple filegroups to increase performance and can be invoked by right-clicking any table or index, selecting Storage, and then selecting Create Partition. The first step is to identify which columns to partition by reviewing all the columns available in the Available Partitioning Columns section located on the Select a Partitioning Column dialog box, as displayed in Figure 3.13. This screen also includes additional options such as the following:


    Figure 3.13. Selecting a partitioning column.

    The next screen is called Select a Partition Function. This page is used for specifying the partition function by which the data will be partitioned. The options include using an existing partition function or creating a new one. The subsequent page is called New Partition Scheme. Here a DBA maps the partitions of the table being partitioned to the desired filegroups. Either an existing partition scheme can be used or a new one created. The final screen is used for doing the actual mapping. On the Map Partitions page, specify the filegroup to be used for each partition and then enter a range for the values of the partitions. The ranges and settings on the grid include the following:

    Note

    By opening the Set Boundary Values dialog box, a DBA can set boundary values based on dates (for example, partition everything in a column after a specific date). The boundary values must match the data type of the partitioning column.

    Designing table and index partitions is a DBA task that typically requires a joint effort with the database development team. The DBA must have a strong understanding of the database, tables, and columns to make the correct choices for partitioning. For more information on partitioning, review Books Online.

    Enhancements to Partitioning in SQL Server 2012

    SQL Server 2012 now supports as many as 15,000 partitions. When using more than 1,000 partitions, Microsoft recommends that the instance of SQL Server have at least 16 GB of available memory. This recommendation particularly applies to partitioned indexes, especially those that are not aligned with the base table or with the clustered index of the table. Other Data Manipulation Language (DML) and Data Definition Language (DDL) statements may also run short of memory when processing a large number of partitions.

    Certain DBCC commands may take longer to execute when processing a large number of partitions. On the other hand, a few DBCC commands can be scoped to the partition level and, if so, can be used to perform their function on a subset of data in the partitioned table.

    Queries may also benefit from a new query engine enhancement called partition elimination. SQL Server uses partition elimination automatically if it is available. Here’s how it works. Assume a table has four partitions, with all the data for customers whose names begin with R, S, or T in the third partition. If a query’s WHERE clause filters on customer name looking for ‘S%’, the query engine knows that it needs only partition three to answer the request. Thus, it might greatly reduce I/O for that query. On the other hand, some queries might take longer if there are more than 1,000 partitions and the query is not able to perform partition elimination.
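    With a table partitioned on a date column (such as the hypothetical pfQuarter function and dbo.Orders table sketched earlier), a range predicate on the partitioning column lets the engine touch only the relevant partitions; the "Actual Partition Count" in the execution plan confirms the elimination:

    ```sql
    -- Filters on the partitioning column allow partition elimination:
    -- only the partition(s) covering Q2 2014 are read.
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE CreatedDate >= '2014-04-01'
      AND CreatedDate <  '2014-07-01';
    ```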

    Finally, SQL Server 2012 introduces some changes and improvements to the algorithms used to calculate partitioned index statistics. Primarily, SQL Server 2012 samples rows in a partitioned index when it is created or rebuilt, rather than scanning all available rows. This may sometimes result in somewhat different query behavior compared to the same queries running on earlier versions of SQL Server.

    Administrating Data Using Partition Switching

    Partitioning is useful to access and manage a subset of data while losing none of the integrity of the entire data set. There is one limitation, though. When a partition is created on an existing table, new data is added to a specific partition or to the default partition if none is specified. That means the default partition might grow unwieldy if it is left unmanaged. (This concept is similar to how a clustered index needs to be rebuilt from time to time to reestablish its fill factor setting.)

    Switching partitions is a fast operation because no physical movement of data takes place. Instead, only the metadata pointers to the physical data are altered.

    You can alter partitions using SQL Server Management Studio or with the ALTER TABLE...SWITCH Transact-SQL statement. Both options enable you to ensure partitions are well maintained. For example, you can transfer subsets of data between partitions, move tables between partitions, or combine partitions together. Because the ALTER TABLE...SWITCH statement does not actually move the data, a few prerequisites must be in place:

    • Partitions must use the same column when switching between two partitions.

    • The source and target table must exist prior to the switch and must be on the same filegroup, along with their corresponding indexes, index partitions, and indexed view partitions.

    • The target partition must exist prior to the switch, and it must be empty, whether adding a table to an existing partitioned table or moving a partition from one table to another. The same holds true when moving a partitioned table to a nonpartitioned table structure.

    • The source and target tables must have the same columns in identical order with the same names, data types, and data type attributes (length, precision, scale, and nullability). Computed columns must have identical syntax, as well as primary key constraints. The tables must also have the same settings for ANSI_NULLS and QUOTED_IDENTIFIER properties. Clustered and nonclustered indexes must be identical. ROWGUID properties and XML schemas must match. Finally, settings for in-row data storage must also be the same.

    • The source and target tables must have matching nullability on the partitioning column. Although both NULL and NOT NULL are supported, NOT NULL is strongly recommended.

    Likewise, the ALTER TABLE...SWITCH statement will not work under certain circumstances:

    • Full-text indexes, XML indexes, and old-fashioned SQL Server rules are not allowed (though CHECK constraints are allowed).

    • Tables in a merge replication scheme are not allowed. Tables in a transactional replication scheme are allowed with special caveats. Triggers are allowed on tables but must not fire during the switch.

    • Indexes on the source and target table must reside on the same partition as the tables themselves.

    • Indexed views make partition switching difficult and have a lot of extra rules about how and when they can be switched. Refer to the SQL Server Books Online if you want to perform partition switching on tables containing indexed views.

    • Referential integrity can impact the use of partition switching. First, foreign keys on other tables cannot reference the source table. If the source table holds the primary key, it cannot have a primary or foreign key relationship with the target table. If the target table holds the foreign key, it cannot have a primary or foreign key relationship with the source table.

    In summary, simple tables can easily accommodate partition switching. The more complexity a source or target table exhibits, the more likely that careful planning and extra work will be required to even make partition switching possible, let alone efficient.

    Here’s an example where we create a partitioned table using a previously created partition scheme, called Date_Range_PartScheme1. We then create a new, nonpartitioned table identical to the partitioned table residing on the same filegroup. We finish up switching the data from the partitioned table into the nonpartitioned table:

    CREATE TABLE TransactionHistory_Partn1 (Xn_Hst_ID int, Xn_Type char(10))
        ON Date_Range_PartScheme1 (Xn_Hst_ID);
    GO
    CREATE TABLE TransactionHistory_No_Partn (Xn_Hst_ID int, Xn_Type char(10))
        ON main_filegroup;
    GO
    ALTER TABLE TransactionHistory_Partn1
        SWITCH PARTITION 1 TO TransactionHistory_No_Partn;
    GO

    The next section shows how to use a more sophisticated, but very popular, approach to partition switching called a sliding window partition.

    Example and Best Practices for Managing Sliding Window Partitions

    Assume that our AdventureWorks business is booming. The sales staff, and by extension the AdventureWorks2012 database, is very busy. We noticed over time that the TransactionHistory table is very active as sales transactions are first entered and are still very active over their first month in the database. But the older the transactions are, the less activity they see. Consequently, we’d like to automatically group transactions into four partitions per year, basically containing one quarter of the year’s data each, in a rolling partitioning. Any transaction older than one year will be purged or archived.

    The answer to a scenario like the preceding one is called a sliding window partition because we are constantly loading new data in and sliding old data over, eventually to be purged or archived. Before you begin, you must choose either a LEFT partition function window or a RIGHT partition function window:

    1. How data is handled varies according to the choice of LEFT or RIGHT partition function window:

    • With a LEFT strategy, partition1 holds the oldest data (Q4 data), partition2 holds data that is 6- to 9-months old (Q3), partition3 holds data that is 3- to 6-months old (Q2), and partition4 holds recent data less than 3-months old.

    • With a RIGHT strategy, partition4 holds the oldest data (Q4), partition3 holds Q3 data, partition2 holds Q2 data, and partition1 holds recent data.

    • Following the best practice, make sure there are empty partitions on both the leading edge (partition0) and trailing edge (partition5) of the partition range.

    • RIGHT range functions usually make more sense to most people because it is natural to start ranges at their lowest value and work upward from there.

    2. Assuming that a RIGHT partition function window is used, we first use the SPLIT subclause of the ALTER PARTITION FUNCTION statement to split the empty partition5 into two empty partitions, 5 and 6.

    3. We use the SWITCH subclause of ALTER TABLE to switch out partition4 to a staging table for archiving or simply to drop and purge the data. Partition4 is now empty.

    4. We can then use MERGE to combine the empty partitions 4 and 5, so that we’re back to the same number of partitions as when we started. This way, partition3 becomes the new partition4, partition2 becomes the new partition3, and partition1 becomes the new partition2.

    5. We can use SWITCH to push the new quarter’s data into the spot of partition1.
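    Steps 2 through 5 can be sketched in T-SQL, assuming the quarterly RANGE RIGHT function pfQuarter / scheme psQuarter and table names used earlier (all illustrative), with empty partitions kept at both edges per the best practice:

    ```sql
    -- Step 2: SPLIT the empty leading partition to make room for next quarter.
    ALTER PARTITION SCHEME psQuarter NEXT USED FG_Next;
    ALTER PARTITION FUNCTION pfQuarter() SPLIT RANGE ('2014-10-01');

    -- Step 3: SWITCH the oldest populated partition out to an (empty,
    -- same-filegroup, identically structured) staging table for archiving.
    ALTER TABLE dbo.Orders SWITCH PARTITION 2 TO dbo.Orders_Archive_Stage;

    -- Step 4: MERGE away the now-empty oldest boundary.
    ALTER PARTITION FUNCTION pfQuarter() MERGE RANGE ('2013-10-01');

    -- Step 5: SWITCH the newly loaded quarter in from its staging table.
    ALTER TABLE dbo.Orders_Stage
        SWITCH TO dbo.Orders PARTITION $PARTITION.pfQuarter('2014-07-01');
    ```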

    Tip

    Use the $PARTITION system function to determine where a partition function places values within a range of partitions.
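    For example ($PARTITION works with any partition function; pfQuarter and dbo.Orders are the hypothetical names used above):

    ```sql
    -- Which partition number would hold this value?
    SELECT $PARTITION.pfQuarter('2014-05-15') AS PartitionNumber;

    -- Row counts per partition of the partitioned table.
    SELECT $PARTITION.pfQuarter(CreatedDate) AS PartitionNumber,
           COUNT(*) AS RowCnt
    FROM dbo.Orders
    GROUP BY $PARTITION.pfQuarter(CreatedDate);
    ```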

    Some best practices to consider for using a sliding window partition include the following:

    • Load newest data into a heap, and then add indexes after the load is finished. Delete oldest data or, when working with very large data sets, drop the partition with the oldest data.

    • Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that partition splits (when loading new data) and merges (after unloading old data) do not cause data movement.

    • Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.

    • Create the load staging table in the same filegroup as the partition you are loading.

    • Create the unload staging table in the same filegroup as the partition you are deleting.

    • Don’t load a partition until its range boundary is met. For example, don’t create and load a partition meant to hold data that is one to two months older before the current data has aged one month. Instead, continue to allow the latest partition to accumulate data until the data is ready for a new, full partition.

    • Unload one partition at a time.

    • The ALTER TABLE...SWITCH statement issues a schema lock on the entire table. Keep this in mind if regular transactional activity is still going on while a table is being partitioned.

     

     


    Thanks Shiven:) If Answer is Helpful, Please Vote


    Wednesday, June 25, 2014 3:12 AM
  • My team is implementing a new data warehouse. I would like to know when we should plan to partition the fact and dimension tables: before data comes in, or after?

    Hi Tejaskumar,

    In a SQL Server Analysis Services cube, partitions divide a table into logical parts based on the data. Each partition can then be processed (refreshed) independently of the other partitions. Here are some links with basic information about partitions for your reference.
    http://msdn.microsoft.com/en-us/library/hh230976.aspx
    http://www.mssqltips.com/sqlservertip/1549/how-to-define-measure-group-partitions-in-sql-server-analysis-services-ssas/

    If you have any questions, please feel free to ask.

    Regards,


    Charlie Liao
    TechNet Community Support

    • Marked as answer by Charlie Liao Wednesday, July 2, 2014 9:58 AM
    Wednesday, June 25, 2014 3:13 AM
  • 1.1       What is a Partition?

    When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes. Just the sheer size of the table causes such operations to take much longer. Even the data that must be loaded or removed can be very sizable, making INSERT and DELETE operations on the table impractical. Microsoft SQL Server 2005 and later versions provide table partitioning to make such operations more manageable.

     

    Partitioning a large table divides the table and its indexes into smaller partitions, so that maintenance operations can be applied on a partition-by-partition basis, rather than on the entire table. In addition, the SQL Server optimizer can direct properly filtered queries to the appropriate partitions rather than the entire table.

    1.2       Benefits of Partitioning in SQL Server

    1. Manageability– Manageability of a partitioned table/index becomes easier, as you can rebuild/reorganize the indexes of each partition separately. You can manage each partition separately; for example, you can back up only the filegroups that contain partitions with volatile data.
    2. Query Performance– The query optimizer uses techniques to optimize and improve the query performance. For example,
      • Partition elimination – Partition Elimination is a technique used by query optimizer to not consider partitions that don’t contain data requested by the query. For example, if a query requests data for only the years 2010 and 2011, in that case only two partitions will be considered during query optimization and execution unlike a single large table where query optimizer will consider the whole dataset; the other partitions (2008, 2009 and 2012) will be simply ignored.
      • Parallel Processing – The query optimizer can process each partition in parallel, and multiple CPU cores can even work together on a single partition. With this, the query optimizer tries to utilize modern hardware resources efficiently. For example, if a query requests data for only the years 2010 and 2011, only two partitions will be considered during query optimization, and if you have an 8-core machine, all 8 cores can work together to produce the result from the two identified partitions.
    3. Indexes– You can have different settings (FILLFACTOR) or different numbers of indexes for each partition of a table. For example, the most recent year partition will have volatile data and will be both read and write intensive data and used by OLTP applications and hence you should have the minimum number of indexes, whereas older partitions will have mostly read only data and be used by Analytical applications and hence you can create more indexes to make your analytical queries run faster.
    4. Compression– Compression is a feature introduced with SQL Server 2008. It minimizes the need for storage space at the cost of additional CPU cycles whenever data is read or written. Again, the most recent year's partition will have volatile data and be accessed frequently, so ideally you should not compress it, whereas the older partitions will not be accessed frequently and hence you can compress them to minimize the storage space requirement.
    5. Minimized time for Backup/Restore– For a large table, normally only the latest few partitions will be volatile and hence you can take a regular backup of the file group (read-write) that contains this volatile data whereas you can take occasional backups of the file group (read-only) that contains non-volatile data. This way, we can minimize the downtime window and reduce the backup and restore time.
    6. Loading data is fast– Loading data into a partitioned table takes only seconds, instead of the minutes or hours it takes for a non-partitioned table, using a technique called SWITCH-IN.
    7. Data archival– Archiving data from a partitioned table again takes only seconds, instead of the minutes or hours it takes for a non-partitioned table, using a technique called SWITCH-OUT. 
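    As one concrete illustration of benefit 4 (per-partition settings), older read-mostly partitions can be compressed individually; the table name below is hypothetical:

    ```sql
    -- Page-compress only partition 1 (the oldest, read-mostly data),
    -- leaving the current, write-intensive partition uncompressed.
    ALTER TABLE dbo.Orders
        REBUILD PARTITION = 1
        WITH (DATA_COMPRESSION = PAGE);
    ```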

    1.3       Points to consider

    1. The partition key can’t be modified or altered once the table/indexes are partitioned.

     

    For example: a table has been partitioned on column DATE_KEY (data type INT, NOT NULL). If you try to alter the data type, length, or check constraint of column DATE_KEY, it will not be allowed.

     

    2. Make sure the partition key is a NOT NULL column & a NOT NULL check constraint is defined for the partition key while creating the table. 

    The reason is that if you create a clustered index on any column(s) of a partitioned table, by default it will include the partition key in the clustered index. If your partition key is defined as NULL, you can’t create a clustered index on your partitioned table, because a column whose check constraint allows NULL is not allowed in the clustered index.

     

    For Example:

    For table [ODS].[FINE_MASTER], we are pulling data based on [LAST_UPDATE]. This column is defined as NULL in table [ODS].[FINE_MASTER]. I partitioned table [ODS].[FINE_MASTER] on key [LAST_UPDATE]. We have one clustered index (PK_SERIAL_NO_EMP_ID) on this table on columns [SERIAL_NO] & [EMP_ID]. After partitioning this table, I tried to create this clustered index but it was not allowed, because by default it includes column [LAST_UPDATE] in the above clustered index, and we can’t create a clustered index on NULL-defined columns.
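    A sketch of the corrected design, reusing the example's table and columns with the partition key declared NOT NULL (the partition scheme name psDaily is assumed):

    ```sql
    -- Partition key LAST_UPDATE is declared NOT NULL so that a clustered
    -- index can be created; SQL Server implicitly adds the partitioning
    -- column to a non-unique clustered index created on the scheme.
    CREATE TABLE ODS.FINE_MASTER (
        SERIAL_NO   int      NOT NULL,
        EMP_ID      int      NOT NULL,
        LAST_UPDATE datetime NOT NULL   -- partition key: must be NOT NULL
    ) ON psDaily (LAST_UPDATE);

    CREATE CLUSTERED INDEX PK_SERIAL_NO_EMP_ID
        ON ODS.FINE_MASTER (SERIAL_NO, EMP_ID)
        ON psDaily (LAST_UPDATE);
    ```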

     

    3. MERGE & SPLIT operations on a partition function/scheme will lock all the tables which reside on that partition function/scheme.

     

    For example: we have table(s) (T1...) on a partition scheme (PS) & we are trying to MERGE the last 2 partitions, which both have data, and this MERGE operation takes 10 minutes. For those 10 minutes, the above table(s) cannot be accessed.

     

    4. A MERGE operation will be faster when merging an empty partition with a non-empty or an empty partition.

     

    For example: after SWITCH OUT, your last partition will be empty & the 2nd-last partition will have data. If you merge the last (empty) & 2nd-last (non-empty) partitions, it will be done in a fraction of a second.
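    In T-SQL the merge is a single statement; the function name and boundary value below are placeholders:

    ```sql
    -- Removing a boundary merges the two adjacent partitions; when one of
    -- them is empty, this is effectively a metadata-only operation.
    ALTER PARTITION FUNCTION pfDaily() MERGE RANGE ('2013-01-01');
    ```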

     

    5. A SPLIT operation will be faster when splitting a non-empty partition such that one side of the new boundary is empty.

     

    For example: your latest partition has data & you split the latest partition to create an empty partition to hold today’s/the next day’s data. This operation will be done in a fraction of a second.
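    The split is likewise one statement (placeholder names; NEXT USED names the filegroup that will receive the new partition):

    ```sql
    -- Create an empty partition for the next day's data; since no existing
    -- rows fall on the new side of the boundary, no data moves.
    ALTER PARTITION SCHEME psDaily NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfDaily() SPLIT RANGE ('2014-06-26');
    ```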

     

    6. SWITCH operations (SWITCH IN & SWITCH OUT) will work only when all 3 tables (T1_IN, T1, and T1_OUT) have the same structure and indexes, reside on the same filegroup(s), and the destination partition is empty.

    If for any reason you alter your main table (T1), then the same alteration has to be applied to tables T1_IN & T1_OUT.

     

    For Example: If you SWITCH OUT your last partition from main table T1 to table T1_OUT, then for successful SWITCH OUT, table T1_OUT has to be empty.

     

    Switch operations can be implemented in 2 ways:

     

    Method-1:  This method allows you to load more than one day’s data into your T1_IN table at one time & switch it in one day at a time using a loop. In the same way, you can archive more than a day’s data to T1_OUT & truncate T1_OUT. There is a disadvantage: if any lock happens at the function or scheme level, it will impact all 3 tables (T1_IN, T1, and T1_OUT).

     

    Partition all 3 tables (T1_IN, T1, and T1_OUT), so these tables will have the same structure and indexes and reside on the same partition scheme & partition function.

    • SWITCH OUT the old data to the T1_OUT table, pull the data from T1_OUT into the archive DB & table, truncate T1_OUT, & then merge the oldest (now empty) partition into the 2nd-oldest partition. Please don’t merge before truncating the data from T1_OUT; otherwise it will try to merge 2 partitions which both have data & will lock all 3 tables during the merge operation. With this method, you can switch out many partitions to the T1_OUT table using a loop, & you can do this switching weekly.
    • In the same way, we need to partition our T1_IN table. It is important to note that before loading the data into the T1_IN table, we have to SPLIT the latest partition to hold the new data, then load the new data into T1_IN, & then SWITCH IN. If you don’t split before loading & your latest partition holds more than one day’s data, then if you try to SPLIT, it will take more time to split a non-empty partition into two non-empty partitions, & during this split operation it will lock all 3 tables.

     

    Method-2:  This method allows you to load one day’s data into your T1_IN table at one time & switch it in. In the same way, you can archive a day’s data to T1_OUT. There is a disadvantage: suppose your table T1_IN gets data spanning more than 1 boundary value; then if you try to switch in, it will fail. But if any locks happen during the load into T1_IN, they will not impact the table T1.

     

    Partition only table T1; tables T1_IN and T1_OUT are non-partitioned tables. But these tables will have the same structure and indexes and reside on the same filegroup(s).

    • SWITCH OUT one old day’s data to the T1_OUT (non-partitioned) table, pull the data from T1_OUT into the archive DB & table, truncate T1_OUT, & then switch out another old day’s data. At the end, you can merge all the empty partitions in table T1.
    • Load only one day’s data into the T1_IN (non-partitioned) table and then switch it in to T1 (make sure that before switching the data, you have SPLIT the latest partition to hold the new data).

     

    7. You need to build a strategy to move data from table T1_OUT (production DB) to the archive DB on the same day, because the next day the data in T1_OUT will be deleted (to make the partition in T1_OUT empty for a successful SWITCH OUT) before the next SWITCH OUT operation.

     

    Note: SWITCH will not work between the production DB & the archive DB, because they are on different servers, schemas & filegroups. So we need to archive the data (move it from T1_OUT in the production DB to the archive DB) with another strategy.

     

    8. LEFT PARTITION:

    CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000);

     

    The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

    Partition   Values
    1           col1 <= 1
    2           col1 > 1 AND col1 <= 100
    3           col1 > 100 AND col1 <= 1000
    4           col1 > 1000

    When creating a LEFT partition function on a DATETIME column, consider the TIME part as well as the DATE part when defining partition boundaries; this will make sure that one day’s data resides in one partition.

     

    For Example:

    (
    '2013-01-31 23:59:59.997',
    '2013-02-01 23:59:59.997',
    '2013-02-02 23:59:59.997',
    ...
    '2013-12-31 23:59:59.997'
    )

    OR

    (
    '2013-01-31 23:59:59.999999',
    '2013-02-01 23:59:59.999999',
    '2013-02-02 23:59:59.999999',
    ...
    '2013-12-31 23:59:59.999999'
    )

     

    RIGHT PARTITION:

    CREATE PARTITION FUNCTION myRangePF2 (int) AS RANGE RIGHT FOR VALUES (1, 100, 1000);

     

    The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

    Partition   Values
    1           col1 < 1
    2           col1 >= 1 AND col1 < 100
    3           col1 >= 100 AND col1 < 1000
    4           col1 >= 1000

    A RIGHT partition will be useful when you partition DIMENSION tables, where SWITCH OUT (archiving) is not required.

     

    For example:

    (
    '2012-01-01',
    '2012-01-02',
    ...
    '2013-12-31'
    )

     

    You have created 365*2 daily partitions (to hold 2 years of data) on the dimension table. Suppose you have more than 2 years of data in your dimension table; then your last partition will have all the data earlier than 2012-01-01. Now, as we want to maintain only 365*2 partitions & without archiving, we need to merge the last 2 partitions & then split the latest one to hold the next day’s data.

     

    Merging the last 2 partitions which both have data:

    Suppose the last partition P1 has 1 million records & the 2nd-last partition P2 has 200,000 records. Now if you fire a MERGE statement to merge partitions P1 & P2, then with a RIGHT partition the data will move from P2 to P1, which will take less time because P2 has less data compared to P1.

     

    But it will take more time with LEFT partitions because it will try to move data from P1 to P2.

     

    If your dimension is an SCD (Slowly Changing Dimension) & updates are required, it is better to delete the matching records and insert the latest day’s data instead of updating.

     

    For Example:

    We have updates on our table [ODS].[FINE_MASTER] based on columns [SERIAL_NO] & [EMP_ID].

    Suppose 1 million records got loaded into the T1_IN table for a day. Out of the 1 million, 700,000 records are old & require an update & only 300,000 records are new & need to be inserted into the T1 ([ODS].[FINE_MASTER]) table. Instead of updating, I will INNER JOIN the T1_IN table to the T1 table on columns [SERIAL_NO] & [EMP_ID], delete the matching data from T1, & then do a SWITCH IN from T1_IN to T1.
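    A sketch of that delete-then-switch pattern, reusing the example's table and key columns (the staging table name ODS.FINE_MASTER_IN and the partition function pfDaily are assumptions):

    ```sql
    -- Delete the 700,000 matching (old) rows from the main table...
    DELETE t1
    FROM ODS.FINE_MASTER AS t1
    INNER JOIN ODS.FINE_MASTER_IN AS s
        ON  t1.SERIAL_NO = s.SERIAL_NO
        AND t1.EMP_ID    = s.EMP_ID;

    -- ...then switch the full day's load (1 million rows) into the empty
    -- partition for that day; the switch itself is metadata-only.
    ALTER TABLE ODS.FINE_MASTER_IN
        SWITCH TO ODS.FINE_MASTER PARTITION $PARTITION.pfDaily('2014-06-25');
    ```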

     

    9. ETL changes are required for all partitioned tables:
      • ETLs need to be modified to change the destination table from T1 to T1_IN, so the ETL will load the data into the T1_IN table (same structure as T1).
      • The task implemented to delete data from the destination table during a re-load (re-run of the ETL) needs to be modified: the delete script needs to include both tables, T1 & T1_IN. When a re-load (re-run of the ETL for old data) is required, the SWITCH OUT -> MERGE -> SPLIT operations are not required; only the SWITCH IN operation is required. This has been handled in the script below.
      • Another Execute SQL Task needs to be included in the ETL to handle (automate) the SWITCH OUT -> MERGE -> SPLIT -> SWITCH IN operations.


    Thanks Shiven:) If Answer is Helpful, Please Vote

    • Marked as answer by Charlie Liao Wednesday, July 2, 2014 9:58 AM
    Wednesday, June 25, 2014 3:26 AM

All replies

  • Hi Tejas,

    Partitioning is always done based on the volume of data.

    Considering that the volume is large in the case of fact tables, the best practice is to partition the fact table in the underlying DB and also the corresponding measure group in Analysis Services.


    Saurabh Kamath

    • Proposed as answer by Charlie Liao Wednesday, June 25, 2014 3:13 AM
    Wednesday, June 25, 2014 2:55 AM
  • Hi,

    It is recommended to partition the fact table (which will hold the huge data volumes). Automate the partitioning so that each day a new partition is created to hold the latest data (split the previous partition into two). Best practice is to partition on the transaction timestamp: load the incremental data into an empty staging table (Table_IN) and then switch that data into the main table (Table). Make sure both tables (Table and Table_IN) are on the same filegroup.

    Refer to the content below for detailed information.

    Designing and Administrating Partitions in SQL Server 2012

    A popular method of better managing large and active tables and indexes is the use of partitioning. Partitioning is a feature for segregating I/O workload within SQL Server database so that I/O can be better balanced against available I/O subsystems while providing better user response time, lower I/O latency, and faster backups and recovery. By partitioning tables and indexes across multiple filegroups, data retrieval and management is much quicker because only subsets of the data are used, meanwhile ensuring that the integrity of the database as a whole remains intact.

    Tip

    Partitioning is typically used for administrative or certain I/O performance scenarios. However, partitioning can also speed up some queries by enabling lock escalation to a single partition, rather than to an entire table. You must allow lock escalation to move up to the partition level by setting it with either the Lock Escalation option of Database Options page in SSMS or by using the LOCK_ESCALATION option of the ALTER TABLE statement.
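    The tip above can be sketched in T-SQL; the table name is illustrative:

```sql
-- Allow lock escalation to stop at the partition level instead of the whole table.
ALTER TABLE dbo.TransactionHistory SET (LOCK_ESCALATION = AUTO);
```

    With AUTO, SQL Server escalates to the partition when the table is partitioned and to the table otherwise.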

    After a table or index is partitioned, data is stored horizontally across multiple filegroups, so groups of data are mapped to individual partitions. Typical scenarios for partitioning include large tables that become very difficult to manage, tables that are suffering performance degradation because of excessive I/O or blocking locks, table-centric maintenance processes that exceed the available time for maintenance, and moving historical data from the active portion of a table to a partition with less activity.

    Partitioning tables and indexes warrants a bit of planning before putting them into production. The usual approach to partitioning a table or index follows these steps:

    1. Create the filegroup(s) and file(s) used to hold the partitions defined by the partitioning scheme.

    2. Create a partition function to map the rows of the table or index to specific partitions based on the values in a specified column. A very common partitioning function is based on the creation date of the record.

    3. Create a partitioning scheme to map the partitions of the partitioned table to the specified filegroup(s) and, thereby, to specific locations on the Windows file system.

    4. Create the table or index (or ALTER an existing table or index) by specifying the partition scheme as the storage location for the partitioned object.

    Although Transact-SQL commands are available to perform every step described earlier, the Create Partition Wizard makes the entire process quick and easy through an intuitive point-and-click interface. The next section provides an overview of using the Create Partition Wizard in SQL Server 2012, and an example later in this section shows the Transact-SQL commands.
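    The four steps above can be sketched in Transact-SQL; the filegroup, function, scheme, and table names here are illustrative assumptions, not objects from this thread:

```sql
-- 1. Filegroups/files are created first via ALTER DATABASE ... ADD FILEGROUP / ADD FILE.

-- 2. Partition function: map rows to partitions by creation date (RIGHT range).
CREATE PARTITION FUNCTION pf_ByYear (datetime)
AS RANGE RIGHT FOR VALUES ('20120101', '20130101', '20140101');

-- 3. Partition scheme: map those partitions to filegroups.
CREATE PARTITION SCHEME ps_ByYear
AS PARTITION pf_ByYear ALL TO ([PRIMARY]);

-- 4. Create the table on the scheme, using the date column as the partition key.
CREATE TABLE dbo.SalesFact
(
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL,
    Amount   money    NOT NULL
) ON ps_ByYear (SaleDate);
```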

    Leveraging the Create Partition Wizard to Create Table and Index Partitions

    The Create Partition Wizard can be used to divide data in large tables across multiple filegroups to increase performance and can be invoked by right-clicking any table or index, selecting Storage, and then selecting Create Partition. The first step is to identify which columns to partition by reviewing all the columns available in the Available Partitioning Columns section located on the Select a Partitioning Column dialog box, as displayed in Figure 3.13.

    Figure 3.13. Selecting a partitioning column.

    The next screen is called Select a Partition Function. This page is used for specifying the partition function by which the data will be partitioned; the options are to use an existing partition function or to create a new one. The subsequent page is called New Partition Scheme. Here a DBA maps the rows of the tables being partitioned to the desired filegroups; again, either an existing partition scheme can be used or a new one created. The final screen, the Map Partitions page, is used for the actual mapping: specify the filegroup to be used for each partition and then enter a range for the values of each partition.

    Note

    By opening the Set Boundary Values dialog box, a DBA can set boundary values based on dates (for example, partition everything in a column after a specific date). The data types are based on dates.

    Designing table and index partitions is a DBA task that typically requires a joint effort with the database development team. The DBA must have a strong understanding of the database, tables, and columns to make the correct choices for partitioning. For more information on partitioning, review Books Online.

    Enhancements to Partitioning in SQL Server 2012

    SQL Server 2012 now supports as many as 15,000 partitions. When using more than 1,000 partitions, Microsoft recommends that the instance of SQL Server have at least 16 GB of available memory. This recommendation particularly applies to partitioned indexes, especially those that are not aligned with the base table or with the clustered index of the table. Other Data Manipulation Language (DML) and Data Definition Language (DDL) statements may also run short of memory when processing a large number of partitions.

    Certain DBCC commands may take longer to execute when processing a large number of partitions. On the other hand, a few DBCC commands can be scoped to the partition level and, if so, can be used to perform their function on a subset of data in the partitioned table.

    Queries may also benefit from a query engine enhancement called partition elimination, which SQL Server applies automatically when it is available. Here’s how it works. Assume a table has four partitions, with all the data for customers whose names begin with R, S, or T in the third partition. If a query’s WHERE clause filters on customer name looking for ‘System%’, the query engine knows that it needs only the third partition to answer the request, which can greatly reduce I/O for that query. On the other hand, some queries might take longer if there are more than 1,000 partitions and the query cannot perform partition elimination.

    Finally, SQL Server 2012 introduces some changes and improvements to the algorithms used to calculate partitioned index statistics. Primarily, SQL Server 2012 samples rows in a partitioned index when it is created or rebuilt, rather than scanning all available rows. This may sometimes result in somewhat different query behavior compared to the same queries running on earlier versions of SQL Server.

    Administrating Data Using Partition Switching

    Partitioning is useful to access and manage a subset of data while losing none of the integrity of the entire data set. There is one limitation, though. When a partition is created on an existing table, new data is added to a specific partition or to the default partition if none is specified. That means the default partition might grow unwieldy if it is left unmanaged. (This concept is similar to how a clustered index needs to be rebuilt from time to time to reestablish its fill factor setting.)

    Switching partitions is a fast operation because no physical movement of data takes place. Instead, only the metadata pointers to the physical data are altered.

    You can alter partitions using SQL Server Management Studio or with the ALTER TABLE...SWITCH Transact-SQL statement. Both options enable you to ensure partitions are well maintained. For example, you can transfer subsets of data between partitions, move tables between partitions, or combine partitions together. Because the ALTER TABLE...SWITCH statement does not actually move the data, a few prerequisites must be in place:

    • Partitions must use the same column when switching between two partitions.

    • The source and target table must exist prior to the switch and must be on the same filegroup, along with their corresponding indexes, index partitions, and indexed view partitions.

    • The target partition must exist prior to the switch, and it must be empty, whether adding a table to an existing partitioned table or moving a partition from one table to another. The same holds true when moving a partitioned table to a nonpartitioned table structure.

    • The source and target tables must have the same columns in identical order with the same names, data types, and data type attributes (length, precision, scale, and nullability). Computed columns must have identical syntax, as well as primary key constraints. The tables must also have the same settings for ANSI_NULLS and QUOTED_IDENTIFIER properties. Clustered and nonclustered indexes must be identical. ROWGUID properties and XML schemas must match. Finally, settings for in-row data storage must also be the same.

    • The source and target tables must have matching nullability on the partitioning column. Although both NULL and NOT NULL are supported, NOT NULL is strongly recommended.

    Likewise, the ALTER TABLE...SWITCH statement will not work under certain circumstances:

    • Full-text indexes, XML indexes, and old-fashioned SQL Server rules are not allowed (though CHECK constraints are allowed).

    • Tables in a merge replication scheme are not allowed. Tables in a transactional replication scheme are allowed with special caveats. Triggers are allowed on tables but must not fire during the switch.

    • Indexes on the source and target table must reside on the same partition as the tables themselves.

    • Indexed views make partition switching difficult and have a lot of extra rules about how and when they can be switched. Refer to the SQL Server Books Online if you want to perform partition switching on tables containing indexed views.

    • Referential integrity can impact the use of partition switching. First, foreign keys on other tables cannot reference the source table. If the source table holds the primary key, it cannot have a primary or foreign key relationship with the target table. If the target table holds the foreign key, it cannot have a primary or foreign key relationship with the source table.

    In summary, simple tables can easily accommodate partition switching. The more complexity a source or target table exhibits, the more likely that careful planning and extra work will be required to even make partition switching possible, let alone efficient.

    Here’s an example where we create a partitioned table using a previously created partition scheme, called Date_Range_PartScheme1. We then create a new, nonpartitioned table identical to the partitioned table residing on the same filegroup. We finish up switching the data from the partitioned table into the nonpartitioned table:

    CREATE TABLE TransactionHistory_Partn1 (Xn_Hst_ID int, Xn_Type char(10))
    ON Date_Range_PartScheme1 (Xn_Hst_ID);
    GO
    CREATE TABLE TransactionHistory_No_Partn (Xn_Hst_ID int, Xn_Type char(10))
    ON main_filegroup;
    GO
    ALTER TABLE TransactionHistory_Partn1
    SWITCH PARTITION 1 TO TransactionHistory_No_Partn;
    GO

    The next section shows how to use a more sophisticated, but very popular, approach to partition switching called a sliding window partition.

    Example and Best Practices for Managing Sliding Window Partitions

    Assume that our AdventureWorks business is booming. The sales staff, and by extension the AdventureWorks2012 database, is very busy. We noticed over time that the TransactionHistory table is very active as sales transactions are first entered and are still very active over their first month in the database. But the older the transactions are, the less activity they see. Consequently, we’d like to automatically group transactions into four partitions per year, basically containing one quarter of the year’s data each, in a rolling partitioning. Any transaction older than one year will be purged or archived.

    The answer to a scenario like the preceding one is called a sliding window partition because we are constantly loading new data in and sliding old data over, eventually to be purged or archived. Before you begin, you must choose either a LEFT partition function window or a RIGHT partition function window:

    1. How data is handled varies according to the choice of LEFT or RIGHT partition function window:

    • With a LEFT strategy, partition1 holds the oldest data (Q4 data), partition2 holds data that is 6- to 9-months old (Q3), partition3 holds data that is 3- to 6-months old (Q2), and partition4 holds recent data less than 3-months old.

    • With a RIGHT strategy, partition4 holds the oldest data (Q4), partition3 holds Q3 data, partition2 holds Q2 data, and partition1 holds recent data.

    • Following the best practice, make sure there are empty partitions on both the leading edge (partition0) and trailing edge (partition5) of the partition.

    • RIGHT range functions usually make more sense to most people because it is natural to start ranges at their lowest value and work upward from there.

    2. Assuming that a RIGHT partition function window is used, we first use the SPLIT subclause of the ALTER PARTITION FUNCTION statement to split empty partition5 into two empty partitions, 5 and 6.

    3. We use the SWITCH subclause of ALTER TABLE to switch out partition4 to a staging table for archiving or simply to drop and purge the data. Partition4 is now empty.

    4. We can then use MERGE to combine the empty partitions 4 and 5, so that we’re back to the same number of partitions as when we started. This way, partition3 becomes the new partition4, partition2 becomes the new partition3, and partition1 becomes the new partition2.

    5. We can use SWITCH to push the new quarter’s data into the spot of partition1.

    Tip

    Use the $PARTITION system function to determine where a partition function places values within a range of partitions.
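    Steps 2 through 5 above can be sketched in T-SQL; the function, scheme, filegroup, and table names below are illustrative assumptions, not from this thread:

```sql
-- 2. Split the empty trailing partition (5) into two empty partitions, 5 and 6.
ALTER PARTITION SCHEME ps_Quarterly NEXT USED fg_Quarterly;
ALTER PARTITION FUNCTION pf_Quarterly() SPLIT RANGE ('20150101');

-- 3. Switch the oldest populated partition out to an empty staging table.
ALTER TABLE dbo.TransactionHistory
SWITCH PARTITION 4 TO dbo.TransactionHistory_Out;

-- 4. Merge the two now-empty partitions back into one.
ALTER PARTITION FUNCTION pf_Quarterly() MERGE RANGE ('20140101');

-- 5. Switch the newly loaded quarter in from its staging table.
ALTER TABLE dbo.TransactionHistory_In
SWITCH TO dbo.TransactionHistory PARTITION $PARTITION.pf_Quarterly('20141001');
```

    The $PARTITION call in the last statement resolves the target partition number from a boundary value, per the tip above.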

    Some best practices to consider for using a slide window partition include the following:

    • Load newest data into a heap, and then add indexes after the load is finished. Delete oldest data or, when working with very large data sets, drop the partition with the oldest data.

    • Keep an empty staging partition at the leftmost and rightmost ends of the partition range to ensure that partition splits (when loading new data) and merges (after unloading old data) do not cause data movement.

    • Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.

    • Create the load staging table in the same filegroup as the partition you are loading.

    • Create the unload staging table in the same filegroup as the partition you are deleting.

    • Don’t load a partition until its range boundary is met. For example, don’t create and load a partition meant to hold data that is one to two months older before the current data has aged one month. Instead, continue to allow the latest partition to accumulate data until the data is ready for a new, full partition.

    • Unload one partition at a time.

    • The ALTER TABLE...SWITCH statement issues a schema lock on the entire table. Keep this in mind if regular transactional activity is still going on while a table is being partitioned.

     

     


    Thanks Shiven:) If Answer is Helpful, Please Vote


    Wednesday, June 25, 2014 3:12 AM
  • My team is implementing new data-warehouse. I would like to know that when  should we plan to do partition of fact and dimension table, before data comes in or after?

    Hi Tejaskumar,

    In a SQL Server Analysis Services cube, partitions divide a table into logical parts based on the data, and each partition can then be processed (refreshed) independently of the other partitions. Here are some links with basic information about partitions for your reference.
    http://msdn.microsoft.com/en-us/library/hh230976.aspx
    http://www.mssqltips.com/sqlservertip/1549/how-to-define-measure-group-partitions-in-sql-server-analysis-services-ssas/

    If you have any questions, please feel free to ask.

    Regards,


    Charlie Liao
    TechNet Community Support

    • Marked as answer by Charlie Liao Wednesday, July 2, 2014 9:58 AM
    Wednesday, June 25, 2014 3:13 AM
1.1       What is a Partition?

    When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes. The sheer size of the table causes such operations to take much longer, and even the data that must be loaded or removed can be very sizable, making INSERT and DELETE operations on the table impractical. Microsoft SQL Server 2005 and later provide table partitioning to make such operations more manageable.

     

    Partitioning a large table divides the table and its indexes into smaller partitions, so that maintenance operations can be applied on a partition-by-partition basis rather than on the entire table. In addition, the SQL Server optimizer can direct properly filtered queries to the appropriate partitions rather than the entire table.

    1.2       Benefits of Partitioning in SQL Server

    1. Manageability – Partitioned tables/indexes are easier to manage because you can rebuild/reorganize the indexes of each partition separately. You can manage each partition independently; for example, you can back up only the filegroups that contain partitions holding volatile data.
    2. Query Performance – The query optimizer uses techniques to optimize and improve query performance. For example:
      • Partition elimination – a technique the query optimizer uses to skip partitions that don’t contain data requested by the query. For example, if a query requests data for only the years 2010 and 2011, only those two partitions are considered during query optimization and execution, unlike a single large table where the optimizer considers the whole dataset; the other partitions (2008, 2009, and 2012) are simply ignored.
      • Parallel Processing – the query optimizer can process partitions in parallel, or multiple CPU cores can work together on a single partition, so that modern hardware resources are used efficiently. For example, if a query requests data for only the years 2010 and 2011, only those two partitions are considered, and on an 8-core machine all 8 cores can work together to produce the result from the two identified partitions.
    3. Indexes – You can have different settings (FILLFACTOR) or different numbers of indexes for each partition of a table. For example, the most recent year’s partition holds volatile, read- and write-intensive data used by OLTP applications, so it should carry the minimum number of indexes, whereas older partitions hold mostly read-only data used by analytical applications, so you can create more indexes there to make analytical queries run faster.
    4. Compression – Compression, introduced with SQL Server 2008, minimizes storage space at the cost of additional CPU cycles whenever data is read or written. Again, the most recent year’s partition holds volatile, frequently accessed data, so ideally you should not compress it, whereas the older partitions are accessed infrequently and can be compressed to minimize storage space.
    5. Minimized time for Backup/Restore – For a large table, normally only the latest few partitions are volatile, so you can take regular backups of the read-write filegroup that contains the volatile data and only occasional backups of the read-only filegroups that contain non-volatile data. This minimizes the downtime window and reduces backup and restore time.
    6. Fast data loading – Loading data into a partitioned table takes only seconds, instead of the minutes or hours it takes for a non-partitioned table, using a technique called SWITCH IN.
    7. Fast data archival – Archiving data from a partitioned table likewise takes only seconds, instead of the minutes or hours it takes for a non-partitioned table, using a technique called SWITCH OUT.
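    Points 1, 3, and 4 can be sketched in T-SQL; the table name, index name, and partition numbers are illustrative assumptions:

```sql
-- Rebuild only the oldest partition with page compression,
-- leaving the current (volatile) partition uncompressed.
ALTER TABLE dbo.SalesFact
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = PAGE);

-- Reorganize the index of a single partition instead of the whole table.
ALTER INDEX IX_SalesFact_Date ON dbo.SalesFact
REORGANIZE PARTITION = 1;
```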

1.3       Points to consider

    1. Partition Key can’t be Modify or Alter once Table/Indexes are partitioned.

     

    For Example: A table has been partitioned on column DATE_KEY (data type INT, NOT NULL). If you try to alter the data type, length, or check constraint of column DATE_KEY, it will not be allowed.

     

    2. Make sure the partition key is a NOT NULL column and a NOT NULL check constraint is defined for the partition key while creating the table.

    The reason: if you create a clustered index on any column(s) of a partitioned table, it will by default include the partition key. If the partition key is defined as nullable, you may then be unable to create that clustered index; in particular, a clustered primary key cannot include a column that allows NULL.

     

    For Example:

    For table [ODS].[FINE_MASTER], we pull data based on [LAST_UPDATE]. This column is defined as NULL in table [ODS].[FINE_MASTER]. I partitioned table [ODS].[FINE_MASTER] on key [LAST_UPDATE]. We have one clustered index (PK_SERIAL_NO_EMP_ID) on this table on columns [SERIAL_NO] & [EMP_ID]. After partitioning the table I tried to create this clustered index but it was not allowed, because by default it includes column [LAST_UPDATE], and we can’t create the clustered index over a NULL-defined column.
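    A minimal sketch of the working pattern, with illustrative column types and a hypothetical partition scheme ps_ByDate (the real [ODS].[FINE_MASTER] definition is not shown in this thread):

```sql
-- Partition key declared NOT NULL so that an aligned clustered primary key
-- including it can be created.
CREATE TABLE ODS.FINE_MASTER_DEMO
(
    SERIAL_NO   int      NOT NULL,
    EMP_ID      int      NOT NULL,
    LAST_UPDATE datetime NOT NULL,
    CONSTRAINT PK_FINE_MASTER_DEMO
        PRIMARY KEY CLUSTERED (SERIAL_NO, EMP_ID, LAST_UPDATE)
) ON ps_ByDate (LAST_UPDATE);
```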

     

    3. MERGE & SPLIT on a partition function/scheme will lock all the tables that reside on that partition function/scheme.

     

    For Example: We have table(s) (T1...) on a partition scheme (PS) and we try to MERGE the last 2 partitions, both of which contain data; this MERGE operation takes 10 minutes. For those 10 minutes, the table(s) cannot be accessed.

     

    4. A MERGE operation is fast when at least one of the two partitions being merged is empty.

     

    For Example: After a SWITCH OUT, your last partition will be empty and the 2nd-to-last partition will have data. If you merge the last (empty) and 2nd-to-last (non-empty) partitions, it will be done in a fraction of a second.

     

    5. A SPLIT operation is fast when the split carves an empty partition off a non-empty one (no rows fall on the new side of the boundary).

     

    For Example: Your latest partition has data and you SPLIT it to create an empty partition to hold today’s/the next day’s data. This operation is done in a fraction of a second.

     

    6. SWITCH operations (SWITCH IN & SWITCH OUT) work only when all 3 tables (T1_IN, T1, and T1_OUT) have the same structure and indexes, reside on the same filegroup(s), and the destination partition is empty.

    If for any reason you alter your main table (T1), the same alteration has to be applied to tables T1_IN & T1_OUT.

     

    For Example: If you SWITCH OUT your last partition from main table T1 to table T1_OUT, then for a successful SWITCH OUT, table T1_OUT has to be empty.

     

    Switch operations can be implemented in 2 ways:

     

    Method-1: This method allows you to load more than one day’s data into your T1_IN table at one time and switch it in partition by partition using a loop. In the same way, you can archive more than a day’s data to T1_OUT and then truncate T1_OUT. The disadvantage is that any lock at the function or scheme level impacts all 3 tables (T1_IN, T1, and T1_OUT).

     

    Partition all 3 tables (T1_IN, T1, and T1_OUT), so that they have the same structure and indexes and reside on the same partition scheme & partition function.

    • SWITCH OUT old data to the T1_OUT table, pull the data from T1_OUT into the archive DB & table, truncate T1_OUT, and then merge the oldest empty partition into the 2nd-oldest partition. Don’t merge before truncating T1_OUT, or it will try to merge 2 partitions that both contain data and will lock all 3 tables during the merge. With this method, you can switch out many partitions to the T1_OUT table in a loop, and this switching can be done weekly.
    • In the same way, partition the T1_IN table. It is important to SPLIT the latest partition to hold new data before loading the data into T1_IN, then load the new data into T1_IN, and then SWITCH IN. If you don’t split before loading and your latest partition ends up holding more than one day’s data, a later SPLIT of that non-empty partition takes more time, and during that split operation it locks all 3 tables.

     

    Method-2: This method allows you to load one day’s data into your T1_IN table at a time and switch it in. In the same way, you can archive a day’s data to T1_OUT. The disadvantage is that if your T1_IN table receives data spanning more than 1 boundary value and you then try to switch in, it will fail. But any locks that happen during the load into T1_IN will not impact table T1.

     

    Partition only table T1; tables T1_IN and T1_OUT are non-partitioned. But these tables must have the same structure and indexes and reside on the same filegroup(s).

    • SWITCH OUT one old day’s data to the T1_OUT (non-partitioned) table, pull the data from T1_OUT into the archive DB & table, truncate T1_OUT, and then switch out another old day’s data. At the end, you can merge all the empty partitions in table T1.
    • Load only one day’s data into the T1_IN (non-partitioned) table and then switch it in to T1 (make sure that before switching the data, you have SPLIT the latest partition to hold the new data).
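    Method-2’s daily cycle can be sketched in T-SQL; the function, scheme, filegroup, boundary values, and column names are illustrative assumptions:

```sql
-- Split the latest partition first, so an empty slot exists for the new day.
ALTER PARTITION SCHEME PS NEXT USED fg_Data;
ALTER PARTITION FUNCTION PF() SPLIT RANGE ('20140625');

-- Load the day's increment into the empty, like-structured staging table.
INSERT INTO dbo.T1_IN (TxnDate, Amount)
SELECT TxnDate, Amount FROM stg.DailyFeed;

-- Switch the staged day into the matching (empty) partition of T1.
ALTER TABLE dbo.T1_IN
SWITCH TO dbo.T1 PARTITION $PARTITION.PF('20140624');
```

    Note that for the switch from a non-partitioned T1_IN to succeed, T1_IN also needs a CHECK constraint on the partition key matching the target partition’s range.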

     

    7. You need a strategy to move data from table T1_OUT (production DB) to the archive DB the same day, because the next day the data in T1_OUT will be deleted (to make the partition in T1_OUT empty for a successful SWITCH OUT) before the next SWITCH OUT operation.

     

    Note: SWITCH will not work between the production DB & archive DB, because they are on different servers, schemas & filegroups. So we need a different strategy to archive the data (move it from T1_OUT in the production DB to the archive DB).

     

    8. LEFT PARTITION:

    CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000);

     

    The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

    Partition        Values
    1                col1 <= 1
    2                col1 > 1 AND col1 <= 100
    3                col1 > 100 AND col1 <= 1000
    4                col1 > 1000

    When creating a LEFT partition function on a DATETIME column, consider the TIME part as well as the DATE part when defining partition boundaries; this ensures that one day’s data resides in one partition.

     

    For Example:

    (
    '2013-01-31 23:59:59.997',
    '2013-02-01 23:59:59.997',
    '2013-02-02 23:59:59.997',
    ...
    '2013-12-31 23:59:59.997'
    )

    OR

    (
    '2013-01-31 23:59:59.999999',
    '2013-02-01 23:59:59.999999',
    '2013-02-02 23:59:59.999999',
    ...
    '2013-12-31 23:59:59.999999'
    )

     

    RIGHT PARTITION:

    CREATE PARTITION FUNCTION myRangePF2 (int) AS RANGE RIGHT FOR VALUES (1, 100, 1000);

     

    The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

    Partition        Values
    1                col1 < 1
    2                col1 >= 1 AND col1 < 100
    3                col1 >= 100 AND col1 < 1000
    4                col1 >= 1000
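    The difference between the two boundary styles can be checked with the $PARTITION system function against the two functions defined above:

```sql
-- A boundary value belongs to the partition on its named side.
SELECT $PARTITION.myRangePF1 (100);  -- LEFT:  returns 2 (100 falls in partition 2)
SELECT $PARTITION.myRangePF2 (100);  -- RIGHT: returns 3 (100 falls in partition 3)
```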

    RIGHT partitioning is useful when you partition DIMENSION tables, where SWITCH OUT (archiving) is not required.

     

    For example:

    (
    '2012-01-01',
    '2012-01-02',
    ...
    '2013-12-31'
    )

     

    You have created 365*2 daily partitions (to hold 2 years of data) on the dimension table. Suppose you have more than 2 years of data in your dimension table; then your last partition will hold all the data earlier than 2012-01-01. Since we want to maintain only 365*2 partitions without archiving, we need to merge the last 2 partitions and then split the latest one to hold the next day’s data.

     

    Merging the last 2 partitions when both contain data:

    Suppose the last partition P1 has 1 million records and the 2nd-to-last partition P2 has 200,000 records. If you fire a MERGE statement to merge P1 & P2, then with a RIGHT partition function the data moves from P2 to P1, which takes less time because P2 has less data than P1.

     

    But it will take more time with LEFT partitions, because the merge will try to move the data from P1 to P2.

     

    If your dimension is an SCD (Slowly Changing Dimension) and updates are required, it is better to delete the matching records and insert the latest day’s data instead of updating.

     

    For Example:

    We have updates on our table [ODS].[FINE_MASTER] based on columns [SERIAL_NO] & [EMP_ID].

    Suppose 1 million records got loaded into the T1_IN table for a day. Out of that 1 million, 700,000 records are old and require an update, and only 300,000 records are new and need to be inserted into the T1 ([ODS].[FINE_MASTER]) table. Instead of updating, I INNER JOIN the T1_IN table to the T1 table on columns [SERIAL_NO] & [EMP_ID], delete the matching data from T1, and then do a SWITCH IN from T1_IN to T1.
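    The delete-matching step can be sketched in T-SQL; the staging table name T1_IN is the thread’s placeholder, and the join is a sketch of the described approach rather than a complete procedure:

```sql
-- Delete rows in the main table that will be replaced by the staged day's data.
DELETE T1
FROM ODS.FINE_MASTER AS T1
INNER JOIN dbo.T1_IN AS S
    ON  T1.SERIAL_NO = S.SERIAL_NO
    AND T1.EMP_ID    = S.EMP_ID;
```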

     

    9. ETL changes are required for all partitioned tables:
      • ETLs need to be modified to change the destination table from T1 to T1_IN, so that the ETL loads the data into the T1_IN table (same structure as T1).
      • The task that deletes data from the destination table during a re-load (re-run of the ETL) needs to be modified: the delete script must cover both tables, T1 & T1_IN. When a re-load (re-run of the ETL for old data) is required, the SWITCH OUT → MERGE → SPLIT operations are not required; only the SWITCH IN operation is needed. This has been handled in the script below.
      • Another Execute SQL Task needs to be included in the ETL to handle (automate) the SWITCH OUT → MERGE → SPLIT → SWITCH IN operations.


    Thanks Shiven:) If Answer is Helpful, Please Vote

    • Marked as answer by Charlie Liao Wednesday, July 2, 2014 9:58 AM
    Wednesday, June 25, 2014 3:26 AM
  • Thanks a lot all! It's really helpful.
    Thursday, June 26, 2014 7:20 PM