Best Approach to write the query

Answered Best Approach to write the query

  • Wednesday, September 12, 2012 1:01 PM
     
      Has Code

    In a given month each member is considered a certain lifecycle type. This will change on monthly basis based on their previous & current month's activity, i.e.

    New = New P2P in the calendar month

    Existing i.e. P2P in a prior calendar month can be either:

    Retained = Active in the prior calendar month and the current calendar month

    Unretained = Active in the prior calendar month, but not active in the current calendar month

    Reactivated = No activity in the prior calendar month, but active in the current calendar month

    Lapsed = No activity in the prior calendar month or the current calendar month

    P2P stands for “Pay to Play”, i.e. the Member has placed at least one real money wager since joining.

    Active means the member has made a minimum of one real money wager in the month.

    Source Tables

    The “REVENUE_ANALYSIS” table provides a summary of each member’s activity on a given date, listing the total amounts wagered and won for each game played that day. If a member does not play on a given date there will be no entries in the table. You can assume this is a very large table ~100M rows.

    There is also a CALENDAR table that provides a base “date” dimension, one row per day from 2000 to 2020.

    REVENUE_ANALYSIS

    ACTIVITY_DATE DATE NOT NULL,  Date wager was made

    MEMBER_ID INTEGER NOT NULL,  Unique Player identifier

    GAME_ID SMALLINT NOT NULL,  Unique Game identifier

    WAGER_AMOUNT REAL NOT NULL,  Total amount wagered on the game

    NUMBER_OF_WAGERS INTEGER NOT NULL,  Number of wagers on the game

    WIN_AMOUNT REAL NOT NULL,  Total amount won on the game

    ACTIVITY_YEAR_MONTH INTEGER NOT NULL, YYYYMM

    BANK_TYPE_ID SMALLINT DEFAULT 0 NOT NULL  0=Real money, 1=Bonus money

    CALENDAR

    CALENDAR_DATE DATE NOT NULL, Base date (YYYY-MM-DD)

    CALENDAR_YEAR INTEGER NOT NULL, 2010, 2011 etc

    CALENDAR_MONTH_NUMBER INTEGER NOT NULL, 1-12

    CALENDAR_MONTH_NAME VARCHAR(100), January, February etc

    CALENDAR_DAY_OF_MONTH INTEGER NOT NULL, 1-31

    Friday, 25 February 2011 In Confidence Page 2 of 2

    CALENDAR_DAY_OF_WEEK INTEGER NOT NULL, 1-7

    CALENDAR_DAY_NAME VARCHAR(100), Monday, Tuesday etc

    CALENDAR_YEAR_MONTH INTEGER NOT NULL, 201011, 201012, 201101 etc

    The required solution is a view with the following columns:

    MEMBER_ID,

    CALENDAR_YEAR_MONTH,

    MEMBER_LIFECYCLE_STATUS,

    LAPSED_MONTHS

    Listing one row per Member per month, starting from the month in which they first place a real money wager, giving their lifecycle status for that month, also if the member has lapsed we need a rolling count of the number of months since they were last active.

    I think the case statement should be fine but would like to know the thoughts from other people.


    Mustafa

All Replies

  • Wednesday, September 12, 2012 2:26 PM
     
     
    The best way would be to learn SQL and at least attempt to write the query yourself.
  • Wednesday, September 12, 2012 2:29 PM
     
     
    I know the query. just wanted to know what is the best approach? 

    MH

  • Wednesday, September 12, 2012 2:30 PM
    Moderator
     
     
    Then please give us at least what you have so far.  Also, you might want to provide some sample data and desired output.
  • Wednesday, September 12, 2012 2:37 PM
     
     

    Hi Kent,

    Thanks for your reply. I am working on it and put my query shortly.


    MH

  • Thursday, September 13, 2012 9:17 PM
     
      Has Code

    Hi All,

    Desired output

    Member_ID, Calendar_year_month, Member_Life_Cycle_Status, Lapsed_Months

    Business Rules For Member_life_cycle_status is

    New = New P2P in the calendar month

    Retained = Active in the prior calendar month and the current calendar month

    Unretained = Active in the prior calendar month, but not active in the current calendar month

    Reactivated = No activity in the prior calendar month, but active in the current calendar month

    Lapsed = No activity in the prior calendar month or the current calendar month

    <o:p></o:p>

    I have this Query 1

    Select DISTINCT 
    	 A.activity_date as current_activity
    	 ,a.member_id
    	 ,a.activity_year_month as current_month
    	,B.PREVIOUS_MONTH
    INTO #FINAL_SAMPLE
    From dbo.Revenue_Analysis AS A
    LEFT JOIN (SELECT DISTINCT 
    				  MEMBER_ID
    				 ,ACTIVITY_DATE AS PREVIOUS_ACTIVITY 
    				 ,ACTIVITY_YEAR_MONTH AS PREVIOUS_MONTH	
    		   FROM dbo.Revenue_Analysis) AS B
    ON A.MEMBER_ID = B.MEMBER_ID
    AND ((YEAR(A.ACTIVITY_DATE) - YEAR(GETDATE())) * 12)  + MONTH(A.ACTIVITY_DATE) - MONTH(GETDATE())
    = ((YEAR(B.PREVIOUS_ACTIVITY) - YEAR(GETDATE())) * 12) + (MONTH(B.PREVIOUS_ACTIVITY) - MONTH(GETDATE())) + 1
    where A.activity_date >= '2012-01-01'


    Query 2

    SELECT 
    	 MEMBER_ID
    	,current_month as ACTIVITY_YEAR_MONTH
    	,CASE WHEN PREVIOUS_MONTH IS NULL THEN 'New'
    		  WHEN PREVIOUS_MONTH IS NOT NULL or current_month is not null THEN 'Retained'	
    		  else 'Other'end  as MEMBER_LIFECYCLE_STATUS
    	,cast(null as int) as LAPSED_MONTHS	
    FROM #FINAL_SAMPLE

    I am struggling to get conditions for otherlife_cycle_status(Unretained,Reactivated,Lapsedand lapsed_months  Any suggestions would be much appreciated??

    Many Thanks.

    I have date dimension and revenue analysis sample data in excel sheet for year 2012. For sample data please click here Sample Data


    MH




    • Edited by MustafaH Thursday, September 13, 2012 9:38 PM
    •  
  • Thursday, September 13, 2012 9:21 PM
     
      Has Code

    Script For creating calendar table

    CREATE TABLE #CALENDAR
    (
    	CALENDAR_DATE DATEtime NOT NULL, --Base date (YYYY-MM-DD)
    	CALENDAR_YEAR INT NOT NULL, --2010, 2011 etc
    	CALENDAR_MONTH_NUMBER INT NOT NULL, --1-12
    	CALENDAR_MONTH_NAME VARCHAR(100), --January, February etc
    	CALENDAR_DAY_OF_MONTH INT NOT NULL, --1-31
    	CALENDAR_DAY_OF_WEEK INT NOT NULL, --1-7
    	CALENDAR_DAY_NAME VARCHAR(100), --Monday, Tuesday etc
    	CALENDAR_YEAR_MONTH VARCHAR(20) NOT NULL, --201011, 201012, 201101 etc
    )

    Revenus_Analysis table script

    CREATE TABLE #REVENUE_ANALYSIS
    (
    	ACTIVITY_DATE DATETIME NOT NULL,  --Date wager was made
    	MEMBER_ID INT  NOT NULL,  --Unique Player identifier
    	GAME_ID INT  NOT NULL,  --Unique Game identifier
    	WAGER_AMOUNT FLOAT  NULL,  --Total amount wagered on the game
    	NUMBER_OF_WAGERS INT  NULL,  --Number of wagers on the game
    	WIN_AMOUNT FLOAT  NULL,  --Total amount won on the game
    	ACTIVITY_YEAR_MONTH VARCHAR(20) NOT NULL, --YYYYMM
    	BANK_TYPE_ID SMALLINT DEFAULT 0 NOT NULL   --0=Real money, 1=Bonus money
    )


    MH

  • Tuesday, September 25, 2012 8:19 AM
     
     

    Hi All, 

    I have posted it the way i think. is there any best way to do it?

    Thanks in advance.


    MH

  • Wednesday, September 26, 2012 2:57 AM
     
     Answered

    Your schema is wrong. No keys, improper data types and no constraints. Did you know that using FLOAT for money is illegal and you have to use a fixed number of decimal places by law? You might want to learn the ISO-11179 rules for data element names. All of those data parts can be done with a computed column. How do I wager NULL dollars? How do I win NULL dollars? There is no such thing as a “type_id” in RDBMS; there is a “<something>_id” or a “<something>_type” but not a hybrid. 

    CREATE TABLE Revenue_Analysis
    (activity_date DATE NOT NULL, 
     member_id INTEGER NOT NULL
      REFERENCES Players(player_id),
     game_id INTEGER NOT NULL
      REFERENCES Games(game_id),

    PRIMARY KEY (activity_date, member_id ,  game_id),

    wager_amt DECIMAL(18,5) NOT NULL,
     wager_cnt INTEGER NOT NULL
       CHECK (wager_cnt > 0), 
     win_amt DECIMAL(18,5)  NOT NULL
       CHECK win_amt >= 0.00), 
     bank_type SMALLINT DEFAULT 0 NOT NULL 
       CHECK (bank_type IN (0,1));

    A useful idiom is a report period calendar. It gives a name to a range of dates.

    CREATE TABLE Report_Periods
    (report_name VARCHAR(30) NOT NULL PRIMARY KEY,
     report_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
     report_end_date DATE NOT NULL,
     CONSTRAINT Date_Ordering
      CHECK (report_start_date <= report_end_date),
    etc);

    These report periods can overlap or have gaps. I like the MySQL convention of using double zeroes for months and years, That is 'yyyy-mm-00' for a month within a year and 'yyyy-00-00' for the whole year. The advantage is that it will sort with the ISO-8601 data format required by Standard SQL.

    Then you need to read http://www.simple-talk.com/sql/t-sql-programming/state-transition-constraints/ to get valid data about the behavior of the players. 

    Since you are not paying me, I will you do the rest of this :) Think look up tables and not temporal math.  Much faster and safer. 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

  • Wednesday, September 26, 2012 9:09 AM
     
     

    Thank you so much CELKO. I will try this again the approach you have mentioned above and update the results here. Can I ask question if i find any difficulty? 

    Many thanks for the help. Did you look sample data? Is that ok to practice this exercise?


    MH