Sunday, February 05, 2012 2:09 PM
I'm totally new at data mining, so I just started some analysis of existing data sets, namely the Wisconsin Breast Cancer Dataset (http://www.sqlserverdatamining.com/ssdm/Home/Downloads/tabid/60/Default.aspx ). The dataset contains attributes of tumors such as radius, texture, area, etc., and the predictable attribute is Diagnosis (benign/malignant).
So anyway, my question concerns the neural network viewer. The viewer tells me, for example, that the AV pair Worst Radius 19 - 30 has a score of 80, a probability1 of 94, and a Lift Value1 of 1,61. I do understand that a high score and a high probability make this AV pair a strong indicator of whether the tumor is benign or malignant, but I don't really know what score and lift mean, or how the score, probability, and lift are calculated. It would be really great if anyone could help me out with these problems.
Many thanks in advance,
Tuesday, February 07, 2012 1:46 PM
How the probability and the other values are calculated depends on the algorithms working in the back end; something like comparison algorithms is used.
For a better understanding of these concepts, go through the Channel 9 videos by Peter Myers.
Thursday, February 09, 2012 11:35 PM Moderator
The neural network (and logistic regression) viewers basically offer a perspective over the odds ratios associated with different inputs, a measure of the impact of various input attribute/value pairs on the predicted probability of certain output states. In your example, the viewer computes the probability of 'benign' (PB) and of 'malignant' (PM) as yielded by two inputs, one containing each of the following:
- a value of, say, 25 (middle of the bucket) for the [Worst Radius] attribute
- no value (Missing) for the same attribute
A DMX query to produce these results is below:
SELECT PredictProbability(Diagnosis, 'benign') AS PB, PredictProbability(Diagnosis, 'malignant') AS PM
FROM <Model> NATURAL PREDICTION JOIN
(SELECT 25 AS [Worst Radius] UNION
SELECT NULL AS [Worst Radius]) AS T
Such a query returns PB and PM for each of the 2 inputs: PB1 (and PM1), conditioned by a specific [Worst Radius] value, and PB2 (and PM2), computed without taking into account any value of [Worst Radius] (NULL or Missing).
Now, for the viewer line associated with [Worst Radius] in (19-30), Probability of Value 1 (Benign) will be PB1 (the probability of 'benign' conditioned by the input) and Probability of Value 2 (Malignant) will be PM1 (the probability of 'malignant' conditioned by the input).
The lift is the ratio between the probability conditioned by the input and the probability assuming the input is not known (therefore PB1/PB2 for Value 1, and PM1/PM2 for Value 2).
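To make the ratio concrete, here is a minimal Python sketch of the lift arithmetic described above. It is not the viewer's actual code. The function names are hypothetical, and it assumes that the "94" in the question is a percentage (0.94) and that "1,61" is 1.61 written with a decimal comma. Under those assumptions, the baseline probability (with [Worst Radius] missing) can be recovered by dividing the conditioned probability by the lift.

```python
def lift(p_conditioned: float, p_baseline: float) -> float:
    """Lift = probability given the input / probability with the input missing."""
    if p_baseline <= 0:
        raise ValueError("baseline probability must be positive")
    return p_conditioned / p_baseline


def implied_baseline(p_conditioned: float, lift_value: float) -> float:
    """Invert the lift formula to recover the baseline probability."""
    if lift_value <= 0:
        raise ValueError("lift must be positive")
    return p_conditioned / lift_value


# Figures quoted in the question (assumed interpretation: 94% and 1.61).
p1 = 0.94             # probability conditioned on Worst Radius in 19-30
reported_lift = 1.61  # lift shown by the viewer for the same row

p2 = implied_baseline(p1, reported_lift)
print(round(p2, 3))            # 0.584
print(round(lift(p1, p2), 2))  # 1.61
```

Read this way, a lift of 1.61 would mean that knowing the radius falls in that bucket makes the predicted state about 1.6 times as likely as it is when the radius is unknown.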
Hope this helps
bogdan crivat / http://www.bogdancrivat.net/dm