help me to interpret and tune up predictions (exceptions)
Hello!
I am prototyping exception detection employing MS Clustering Algoritm DM.
I created the test table [1]
The columns are :
TActnID - transaction ID (PK, int)
AccntType - Boolean (bit), Savings or Checking Account
AccntAmnt - account (well, transaction really) amount
Accnt# - account number, char(9), 'VgvChckng' or 'VgvSvngs'
TAactnDate - transaction date (date)Rows 01-20 and 21-40 are 2 distinct groups with repetitions of the same data, say, 2 very precise patterns.
I created and processed a DM model using MS Clustering algorithm, cf. details [2].
And I execute DMX query over it [3.1] repeating the input data from rows 1-20
Though the input of the
queried account number is 'VgvChckng' the query gives 'VgvSvngs' with somewhat spurrious probability 0.48 for 'VgvSvngs
(taking into account prearranged data I could tell 0.5 without any modeling)!.
But I expected 1.0 for 'VgvChckng' since the queried data coincide exactly with 1-20 rows case
How can I remedy, i.e. receive 'VgvSvngs' but not 'VgvChckng'?
Then, why the predicted date is
1/6/2009 11:59:59 PM ??
How to tune ip up back to pregiven 1/7/2009?
BTW boolean AccntType (False or True meaning Checking or Savings account type, resp.)
are rigidly correlated in presented data (i.e. redundant):
False (account type) <---> 'VgvChckng' (Accnt#)
True (account type) <---> 'VgvSvngs' (Accnt#)
but the boolean type gives 1.0 probability and predicts correctly but Accnt# is predicted wrongly and has predicted probability 0.48.
How the discrete Accnt# should be be tuned up according to prestaged data with probability 1.0?
Now, if to change input account number Accnt# from pre-staged pattern value 'VgvChckng'
(with 50% oe 20 rows of Accnt#)
to nonexistent 'VgvChicken' in DMX query...
One would expect exception, i.e. PredictCaseLikelihood() to be low but the query gives it much higher value than for existing patterns!!!
Cf. results [3.3]
And how to understand the predicted probability of non-existent date value being equal to 1.0?
If to change the queried (predicting) values [3.3, 3.3a]:
- account amount AccntAmnt from 10. to outrageously suspicious 10 000 000.00
and
- the date TAactnDate from "patterned" 2009-01-07 to non-existent antiquated 1950-01-07
the value of PredictCaseLikelihood() is not changed at all in comparison to [31, 3.1a]!
What am I doing wrong and how to tune up predictions?
[1]
-------------------------------------------------------------------------------TActnID AccntType AccntAmnt Accnt# TAactnDate
----------- --------- -------------- --------- ----------
1 0 10.00 VgvChckng 2009-01-07
2 0 10.00 VgvChckng 2009-01-07
3 0 10.00 VgvChckng 2009-01-07
4 0 10.00 VgvChckng 2009-01-07
5 0 10.00 VgvChckng 2009-01-07
6 0 10.00 VgvChckng 2009-01-07
7 0 10.00 VgvChckng 2009-01-07
8 0 10.00 VgvChckng 2009-01-07
9 0 10.00 VgvChckng 2009-01-07
10 0 10.00 VgvChckng 2009-01-07
11 0 10.00 VgvChckng 2009-01-07
12 0 10.00 VgvChckng 2009-01-07
13 0 10.00 VgvChckng 2009-01-07
14 0 10.00 VgvChckng 2009-01-07
15 0 10.00 VgvChckng 2009-01-07
16 0 10.00 VgvChckng 2009-01-07
17 0 10.00 VgvChckng 2009-01-07
18 0 10.00 VgvChckng 2009-01-07
19 0 10.00 VgvChckng 2009-01-07
20 0 10.00 VgvChckng 2009-01-07
1001 1 9990.00 VgvSvngs 1990-01-01
1002 1 9990.00 VgvSvngs 1990-01-01
1003 1 9990.00 VgvSvngs 1990-01-01
1004 1 9990.00 VgvSvngs 1990-01-01
1005 1 9990.00 VgvSvngs 1990-01-01
1006 1 9990.00 VgvSvngs 1990-01-01
1007 1 9990.00 VgvSvngs 1990-01-01
1008 1 9990.00 VgvSvngs 1990-01-01
1009 1 9990.00 VgvSvngs 1990-01-01
1010 1 9990.00 VgvSvngs 1990-01-01
1011 1 9990.00 VgvSvngs 1990-01-01
1012 1 9990.00 VgvSvngs 1990-01-01
1013 1 9990.00 VgvSvngs 1990-01-01
1014 1 9990.00 VgvSvngs 1990-01-01
1015 1 9990.00 VgvSvngs 1990-01-01
1016 1 9990.00 VgvSvngs 1990-01-01
1017 1 9990.00 VgvSvngs 1990-01-01
1018 1 9990.00 VgvSvngs 1990-01-01
1019 1 9990.00 VgvSvngs 1990-01-01
1020 1 9990.00 VgvSvngs 1990-01-01
--------------------------------------------------------------------------------
[2]
Model configuration
All attributes except Key are Input and Predictable
--------------------------------------------------------------------------------
Columns Content Type Data type
--------------------------------------------------------------------------------
Account# Discrete Text
Account Type Discrete Boolean
Accnt Amt Continuus Double
T Aactn Date Continuus Double
T Actn ID Key Long
--------------------------------------------------------------------------------
[3.1]SELECT
PredictCaseLikelihood() As PredCaseLikelihoo
,[Accnt Type] AS AccType -- False True, Predict([Accnt Type]) As TypePred
, PredictProbability([Accnt Type], False) As TypeProb -- discrete
,[Accnt Amnt] As AmntInput--continuous
, Predict([Accnt Amnt]) As AmntPred
, PredictProbability([Accnt Amnt]) As AmntProb
,[Accnt#] As [#Input]-- 'VgvChckng' or 'VgvSvngs'
, Predict([Accnt#]) As [Pred#]
, PredictProbability([Accnt#], 'VgvChckng') As [#Prob] --, PredictProbability([Accnt#], 'VgvChicken') As [#Prob]
,[T Aactn Date] As [Date] --continuous, PredictProbability([T Aactn Date]) As DateProb
, Predict([T Aactn Date]) As DatePred
FROM
[BankTrnsActnsDemoModel]
NATURAL PREDICTION JOIN
(SELECT
False As AccntType,
10.00 AS [AccntAmnt],
'VgvChckng' AS [Accnt#],
-- 'VgvChicken' AS [Accnt#],
'VgvSvngs' AS [Accnt#],
-- '01/01/1900' AS [TAactnDate]
2009-01-07 AS [TAactnDate]
) AS t
[3.1a]
The results of executing [3.1] (query inputs the data of row 1-20)
-----------------------------------------------------------------------------------------------------------------------------
PredCaseLikelihoo AccType TypePred TypeProb AmntInput AmntPred AmntProb
---------------------------------------------------------------------------------------------------------------------------
0.472566570025669 False False 1 10 10 1
----------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------
#Input Pred# #Prob Date DateProb DatePred
-----------------------------------------------------------------------------------------------------------------------
VgvSvngs VgvSvngs 0.48 1/6/2009 11:59:59 PM 1 1/6/2009 11:59:59 PM
-----------------------------------------------------------------------------------------------------------------------
[3.2a]
The results of executing DMX query
Same as [3.1] but the 'VgvChckng' is changed to non-existent 'VgvChicken'
Same as Case1 but the input Account Number [Accnt#] is changed from 'VgvChckng' to nonexistent 'VgvChicken'
-----------------------------------------------------------------------------------------------------------------------------
PredCaseLikelihoo AccType TypePred TypeProb AmntInput AmntPred AmntProb
---------------------------------------------------------------------------------------------------------------------------
0.4999900001 False False 1 5199.6 5199.6 1
--------------------------------------------------------------------------------------------------------------------------
#Input Pred# #Prob Date DateProb DatePred
-----------------------------------------------------------------------------------------------------------------------
VgvSvngs VgvSvngs 0. 2/17/1999 1:55:12 AM 1 2/17/1999 1:55:12 AM
-----------------------------------------------------------------------------------------------------------------------
[3.3]
The same as [3.1] but account amount AccntAmnt is queried for nonixistent earlier and outrageously suspicious 100 000 000.00 and for non-existent antiquated transaction date 1950-01-07
But the value determined by PredictCaseLikelihood() is exactly the same as for existing pattern case [3.1, 3.1a]!
SELECTPredictCaseLikelihood() As PredCaseLikelihoo
,[Accnt Type] AS AccType -- False True
, Predict([Accnt Type]) As TypePred
, PredictProbability([Accnt Type], False) As TypeProb -- discrete
,[Accnt Amnt] As AmntInput--continuous
, Predict([Accnt Amnt]) As AmntPred
, PredictProbability([Accnt Amnt]) As AmntProb
,[Accnt#] As [#Input]-- 'VgvChckng' or 'VgvSvngs'
, Predict([Accnt#]) As [Pred#]
, PredictProbability([Accnt#], 'VgvChckng') As [#Prob] --Discrete
,[T Aactn Date] As [Date] --continuous
, PredictProbability([T Aactn Date]) As DateProb
, Predict([T Aactn Date]) As DatePred
FROM
[BankTrnsActnsDemoModel]
NATURAL PREDICTION JOIN
(SELECT
False As AccntType,
100000000.00 AS [AccntAmnt],
'VgvChckng' AS [Accnt#],
'1950-01-07' AS [TAactnDate]
) AS t
[3.3a]
----------------------------------------------------------------------------PredCaseLikelihoo AccType TypePred TypeProb AmntInput AmntPred
----------------------------------------------------------------------------
0.472566570025669 False False 1 10 10
----------------------------------------------------------------------------
AmntProb #Input Pred# #Prob Date DateProb DatePred
----------------------------------------------------------------------------
1 VgvSvngs VgvSvngs 0.48 1/6/2009 11:59:59 PM 1 1/6/2009 11:59:59 PM----------------------------------------------------------------------------
Guennadi Vanine -- Gennady Vanin -- Геннадий Ванин- 已编辑Guennadiy Vanine 2009年7月3日 3:58
- 已编辑Guennadiy Vanine 2009年7月3日 4:16
- 已编辑Guennadiy Vanine 2009年7月3日 3:38

