Задайте вопросЗадайте вопрос
 

Вопросhelp me to interpret and tune up predictions (exceptions)

  • 3 июля 2009 г. 1:53Guennadiy Vanine Медали пользователяМедали пользователяМедали пользователяМедали пользователяМедали пользователя
     

    Hello!
    I am prototyping exception detection employing MS Clustering Algoritm DM.
    I created the test table [1] 

    The columns are :
    TActnID - transaction ID (PK, int) 
    AccntType - Boolean (bit), Savings or Checking Account 
    AccntAmnt - account (well, transaction really) amount
    Accnt# - account number, char(9), 'VgvChckng' or 'VgvSvngs' 
    TAactnDate - transaction date (date)

    Rows 01-20 and 21-40 are 2 distinct groups with repetitions of the same data, say, 2 very precise patterns.

    I created and processed a DM model using MS Clustering algorithm, cf. details [2].

    And I execute DMX query over it [3.1] repeating the input data from rows 1-20

    Though the input of the
    queried account number is 'VgvChckng' the query gives 'VgvSvngs' with somewhat spurrious probability 0.48 for 'VgvSvngs
    (taking into account  prearranged data I could tell 0.5 without any modeling)!.

    But I expected 1.0 for 'VgvChckng' since the queried data coincide exactly with 1-20 rows case
    How can I remedy, i.e. receive 'VgvSvngs' but not 'VgvChckng'?


    Then,  why  the predicted date is
    1/6/2009 11:59:59 PM ??
    How to tune ip up back to pregiven 1/7/2009? 

    BTW boolean AccntType (False or True meaning Checking or Savings account type, resp.)
    are rigidly correlated in presented data (i.e. redundant):
    False  (account type) <---> 'VgvChckng' (Accnt#)
    True   (account type) <---> 'VgvSvngs' (Accnt#)
    but the boolean type gives 1.0 probability and predicts correctly but Accnt# is predicted wrongly and has predicted probability 0.48.
    How the discrete Accnt# should be be tuned up according to prestaged data with probability 1.0?

    Now, if to change  input account number Accnt# from pre-staged pattern value 'VgvChckng'
    (with 50% oe 20 rows of Accnt#)
    to nonexistent 'VgvChicken' in DMX query...
    One would expect exception, i.e. PredictCaseLikelihood() to be low but the query gives it much higher value than for existing patterns!!!
    Cf. results [3.3]

    And how to understand the predicted probability of non-existent date value being equal to 1.0?

    If to change the queried (predicting)  values [3.3, 3.3a]:
    - account amount AccntAmnt  from 10. to outrageously suspicious 10 000 000.00
    and
    - the date TAactnDate from "patterned" 2009-01-07 to non-existent antiquated 1950-01-07
    the value of PredictCaseLikelihood() is not changed at all in comparison to [31, 3.1a]!

    What am I doing wrong and how to tune up predictions?  

    [1]
    -------------------------------------------------------------------------------

    TActnID     AccntType AccntAmnt      Accnt#    TAactnDate

    ----------- --------- -------------- --------- ----------

    1           0         10.00          VgvChckng 2009-01-07

    2           0         10.00          VgvChckng 2009-01-07

    3           0         10.00          VgvChckng 2009-01-07

    4           0         10.00          VgvChckng 2009-01-07

    5           0         10.00          VgvChckng 2009-01-07

    6           0         10.00          VgvChckng 2009-01-07

    7           0         10.00          VgvChckng 2009-01-07

    8           0         10.00          VgvChckng 2009-01-07

    9           0         10.00          VgvChckng 2009-01-07

    10          0         10.00          VgvChckng 2009-01-07

    11          0         10.00          VgvChckng 2009-01-07

    12          0         10.00          VgvChckng 2009-01-07

    13          0         10.00          VgvChckng 2009-01-07

    14          0         10.00          VgvChckng 2009-01-07

    15          0         10.00          VgvChckng 2009-01-07

    16          0         10.00          VgvChckng 2009-01-07

    17          0         10.00          VgvChckng 2009-01-07

    18          0         10.00          VgvChckng 2009-01-07

    19          0         10.00          VgvChckng 2009-01-07

    20          0         10.00          VgvChckng 2009-01-07

    1001        1         9990.00        VgvSvngs  1990-01-01

    1002        1         9990.00        VgvSvngs  1990-01-01

    1003        1         9990.00        VgvSvngs  1990-01-01

    1004        1         9990.00        VgvSvngs  1990-01-01

    1005        1         9990.00        VgvSvngs  1990-01-01

    1006        1         9990.00        VgvSvngs  1990-01-01

    1007        1         9990.00        VgvSvngs  1990-01-01

    1008        1         9990.00        VgvSvngs  1990-01-01

    1009        1         9990.00        VgvSvngs  1990-01-01

    1010        1         9990.00        VgvSvngs  1990-01-01

    1011        1         9990.00        VgvSvngs  1990-01-01

    1012        1         9990.00        VgvSvngs  1990-01-01

    1013        1         9990.00        VgvSvngs  1990-01-01

    1014        1         9990.00        VgvSvngs  1990-01-01

    1015        1         9990.00        VgvSvngs  1990-01-01

    1016        1         9990.00        VgvSvngs  1990-01-01

    1017        1         9990.00        VgvSvngs  1990-01-01

    1018        1         9990.00        VgvSvngs  1990-01-01

    1019        1         9990.00        VgvSvngs  1990-01-01

    1020        1         9990.00        VgvSvngs  1990-01-01
    --------------------------------------------------------------------------------
    [2]
    Model configuration
    All attributes except Key are Input and Predictable

    --------------------------------------------------------------------------------
    Columns            Content Type            Data type
    --------------------------------------------------------------------------------
    Account#            Discrete                     Text
    Account Type      Discrete                     Boolean 
    Accnt Amt             Continuus                Double
    T Aactn Date             Continuus            Double
    T Actn ID             Key                          Long
    --------------------------------------------------------------------------------

    [3.1] 

    SELECT

     PredictCaseLikelihood() As PredCaseLikelihoo


    ,[Accnt Type] AS AccType -- False True

    , Predict([Accnt Type]) As TypePred

    , PredictProbability([Accnt Type], False) As TypeProb -- discrete

     

    ,[Accnt Amnt] As AmntInput--continuous

    , Predict([Accnt Amnt]) As AmntPred

    , PredictProbability([Accnt Amnt]) As AmntProb

     

    ,[Accnt#] As [#Input]-- 'VgvChckng' or 'VgvSvngs' 

    , Predict([Accnt#]) As [Pred#]

    , PredictProbability([Accnt#], 'VgvChckng') As [#Prob] --, PredictProbability([Accnt#], 'VgvChicken') As [#Prob] 


    ,[T Aactn Date] As [Date] --continuous

    , PredictProbability([T Aactn Date]) As DateProb

    , Predict([T Aactn Date]) As DatePred

     

    FROM

       [BankTrnsActnsDemoModel]

    NATURAL PREDICTION JOIN

    (SELECT

        False As AccntType,

        10.00 AS [AccntAmnt],

        'VgvChckng' AS [Accnt#],

    --  'VgvChicken' AS [Accnt#],

         'VgvSvngs' AS [Accnt#],

    --   '01/01/1900' AS [TAactnDate]

          2009-01-07 AS [TAactnDate]

       ) AS t


    [3.1a]
    The results of executing [3.1] (query inputs the data of row 1-20)
    -----------------------------------------------------------------------------------------------------------------------------
    PredCaseLikelihoo    AccType          TypePred         TypeProb        AmntInput  AmntPred  AmntProb 
    ---------------------------------------------------------------------------------------------------------------------------
    0.472566570025669  False                 False                  1                    10               10            1              
    ----------------------------------------------------------------------------------------------------------------------------


    -----------------------------------------------------------------------------------------------------------------------------
    #Input        Pred#        #Prob     Date                           DateProb         DatePred
    -----------------------------------------------------------------------------------------------------------------------
    VgvSvngs  VgvSvngs   0.48        1/6/2009 11:59:59 PM   1                     1/6/2009 11:59:59 PM
    -----------------------------------------------------------------------------------------------------------------------


    [3.2a]
    The results of executing DMX query
    Same as [3.1]  but the 'VgvChckng' is changed to non-existent 'VgvChicken'
    Same as Case1 but the input Account Number [Accnt#] is changed from 'VgvChckng' to nonexistent 'VgvChicken'
    -----------------------------------------------------------------------------------------------------------------------------
    PredCaseLikelihoo    AccType          TypePred         TypeProb        AmntInput  AmntPred  AmntProb 
    ---------------------------------------------------------------------------------------------------------------------------
    0.4999900001          False                   False                  1                   5199.6       5199.6      1  
    --------------------------------------------------------------------------------------------------------------------------

    #Input        Pred#        #Prob     Date                           DateProb         DatePred
    -----------------------------------------------------------------------------------------------------------------------
     VgvSvngs   VgvSvngs    0.         2/17/1999 1:55:12 AM  1                    2/17/1999 1:55:12 AM
    -----------------------------------------------------------------------------------------------------------------------

    [3.3]
    The same as [3.1] but account amount AccntAmnt is queried for nonixistent earlier and outrageously suspicious 100 000 000.00 and for non-existent antiquated transaction date 1950-01-07

    But the value determined by PredictCaseLikelihood() is exactly the same as for existing pattern case [3.1, 3.1a]!


    SELECT

     PredictCaseLikelihood() As PredCaseLikelihoo

    ,[Accnt Type] AS AccType -- False True

    , Predict([Accnt Type]) As TypePred

    , PredictProbability([Accnt Type], False) As TypeProb -- discrete

     

    ,[Accnt Amnt] As AmntInput--continuous

    , Predict([Accnt Amnt]) As AmntPred

    , PredictProbability([Accnt Amnt]) As AmntProb

     

    ,[Accnt#] As [#Input]-- 'VgvChckng' or 'VgvSvngs' 

    , Predict([Accnt#]) As [Pred#]

     , PredictProbability([Accnt#], 'VgvChckng') As [#Prob] --Discrete

     

    ,[T Aactn Date] As [Date] --continuous

    , PredictProbability([T Aactn Date]) As DateProb

    , Predict([T Aactn Date]) As DatePred

     

    FROM

       [BankTrnsActnsDemoModel]

    NATURAL PREDICTION JOIN

    (SELECT

        False As AccntType,

        100000000.00 AS [AccntAmnt],

        'VgvChckng' AS [Accnt#],

        '1950-01-07' AS [TAactnDate]

       ) AS t

    [3.3a]
    ---------------------------------------------------------------------------- 

    PredCaseLikelihoo      AccType          TypePred         TypeProb        AmntInput        AmntPred

    ----------------------------------------------------------------------------
    0.472566570025669  False                 False                1                      10                    10                         

    ----------------------------------------------------------------------------
    AmntProb        #Input             Pred#              #Prob   Date                               DateProb         DatePred
    ----------------------------------------------------------------------------
    1                      VgvSvngs        VgvSvngs        0.48     1/6/2009 11:59:59 PM   1                     1/6/2009 11:59:59 PM

    ----------------------------------------------------------------------------
     

     


    Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин