none
How to generate aggregated value as attribute? RRS feed

  • Question

  • Hi Experts,

    This is my Customer dimention:ID  FirstName lastName .......

    Now, I want to create a new attribute: SalesRange. The value should be: 0-1000, 1000-2000, 2000+. For example, if the sales value of CustomerA is 500, then his SalesRange should be 0-1000. The sample could be:

    ID  Name  SalesRange

    1    A      2000+

    2    B      0-1000

    3    C      0-1000

    4            ….

    How to populate this dimension? I tried to use MergeJoin and , but the value in FactTable is not available (Process Dim and then Fact). So  I'm curious to konw how did you achieve such requirement?

    Saturday, December 18, 2010 12:36 PM

Answers

  • Well, supposing that Customer A already has a row in the DimCust table, and already has a TotalSales field value of $5,000. After the next incremental FACT table load, have a T-SQL statement like the following:

    UPDATE dc

    set dc.TotalSales = dc.TotalSales + f.AdditionalSales

    FROM dbo.DimCust AS dc

    INNER JOIN (SELECT CustomerID, SUM(Sales) AS AdditionalSales FROM dbo.FactSales

    WHERE Load_Date >= {Some date range here representing the most recent incremental load}

    GROUP BY CustomerID) AS f

    ON dc.CustomerID = f.CustomerID

     

    Now, if you don't have a Load_Date field in your FACT table, or cannot discern via any other method what the most recently loaded rows are, then you may need to do something in the SSIS Package itself. To do that, put in a Multi-Cast that splits off the FACT rows, then send them to an Aggregate Transform that Groups by CustomerID and gets the SUM of Sales. If you anticipate a small number of rows, then you could send that to an OLE DB SQL Command transform that does the update statement one row at a time. If you anticipate a large number of rows, then send it to a staging table (that gets truncated first) and do the T-SQL UPDATE statement above, but source the AdditionalSales from the staging table instead of the FACT table.

    Hope all this makes sense.


    Todd C - MSCTS SQL Server 2005 - Please mark posts as answered where appropriate.
    Sunday, December 19, 2010 1:43 PM

All replies

  • Since you can't know a Customer's total until AFTER the FACT talble has been populated, and since the Sales figure may change for a Customer after it does, you may need to edit your Data Source View in the Analysis Server Project (we assume this is for a cube of some sort?).

    Replace the DIM Customer table with a query, and for that query do a LEFT OUTER JOIN to the FACT Sales table. I'm thinking something like this:

    WITH CustomerSales AS (SELECT CustomerID, SUM(Sales) AS Total_Sales FROM dbo.FACT_Sales  GROUP BY CustomerID)

    SELECT <Fields from DIM_Customer>, CustomerSales.Total_Sales

    FROM DIM_Customer LEFT OUTER JOIN CustomerSales ON CustomerID = CustomerID

    Now that only gives you the Total_Sales figure, but does not give you the 'bucketing' that you desire. For that, you could either replace the query above and put in an elaborate CASE statement to inspect the Total_Sales figure, or you could let Analysis Server do the bucketing for you based on the spread. If you need help with that, go to the Analysis Server Forum.


    Todd C - MSCTS SQL Server 2005 - Please mark posts as answered where appropriate.
    Saturday, December 18, 2010 1:57 PM
  • Thanks for your response. Yes, I also have a SSAS cube. But now, I just want to populate this dimention in DW. Suppose, only one row be insterted into DimCus, and then I need to update the entire table (just for one row) to generate SalesRange after loading. And the dimCus is a huge table (more than 20 million). So how can I use my current incremental ETL to load this table with better performance? Thanks.

    Sunday, December 19, 2010 12:21 PM
  • Well, supposing that Customer A already has a row in the DimCust table, and already has a TotalSales field value of $5,000. After the next incremental FACT table load, have a T-SQL statement like the following:

    UPDATE dc

    set dc.TotalSales = dc.TotalSales + f.AdditionalSales

    FROM dbo.DimCust AS dc

    INNER JOIN (SELECT CustomerID, SUM(Sales) AS AdditionalSales FROM dbo.FactSales

    WHERE Load_Date >= {Some date range here representing the most recent incremental load}

    GROUP BY CustomerID) AS f

    ON dc.CustomerID = f.CustomerID

     

    Now, if you don't have a Load_Date field in your FACT table, or cannot discern via any other method what the most recently loaded rows are, then you may need to do something in the SSIS Package itself. To do that, put in a Multi-Cast that splits off the FACT rows, then send them to an Aggregate Transform that Groups by CustomerID and gets the SUM of Sales. If you anticipate a small number of rows, then you could send that to an OLE DB SQL Command transform that does the update statement one row at a time. If you anticipate a large number of rows, then send it to a staging table (that gets truncated first) and do the T-SQL UPDATE statement above, but source the AdditionalSales from the staging table instead of the FACT table.

    Hope all this makes sense.


    Todd C - MSCTS SQL Server 2005 - Please mark posts as answered where appropriate.
    Sunday, December 19, 2010 1:43 PM