Data mining with a small valid sample, please help!

I have a dataset of around 10K records. All of them have detailed info, but only 400 of them have a response. I want to predict the behavior of the rest of the data (10K - 400) based on the 400 records that have one. What kind of model should I use? How should I do the data cleansing? Will the results be very unreliable when the valid records used for training are only 4% of the data? I'd really appreciate any response! I've struggled for a week and still can't figure the problem out.
Question
Answers

What are you trying to predict? Is it a discrete column (e.g. Yes or No)? If not, what is it?
If it is discrete, then try the following algorithms: decision trees, logistic regression, neural networks.
It is impossible to say whether the results will be reliable without knowing your data and trying to create mining models.
I would train multiple models on 70% of the 400 rows and measure accuracy on the remaining 30% of the 400 rows.
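To make the 70/30 idea concrete, here is a minimal sketch in Python with scikit-learn. This is an assumption for illustration only: the thread does not say which tool is being used, and the data below is synthetic (`make_classification`), not the real 400 rows. The model settings (`max_depth`, `hidden_layer_sizes`, and so on) are arbitrary placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 400 labeled rows (NOT the poster's data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# 70% for training, 30% held out for testing.
# stratify=y keeps the Yes/No ratio roughly the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# The three algorithm families suggested above; settings are placeholders.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network": MLPClassifier(
        hidden_layer_sizes=(16,), max_iter=2000, random_state=0
    ),
}

# Fit each model on the training part, score it on the held-out part.
holdout_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    holdout_accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in holdout_accuracy.items():
    print(f"{name}: {acc:.3f}")
```

With 400 rows the held-out part is only 120 rows, so differences of a few percentage points between models may just be noise from the particular split.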
Tatyana Yakushev [PredixionSoftware.com]
Marked as answer by Tatyana Yakushev, Editor, Tuesday, January 31, 2012 12:03 AM
All replies


Thank you so much for your input! It is a discrete prediction column. What I did was put all 400 records (Y) and some of the non-response records from the other 10K (N) into the model (training and testing). That way some extra info for the N type would be discovered. But it seems I shouldn't grab the 'N' records, since those are the ones I want to predict based on the model. I'll post the result with your method here later. Thanks again!
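For what it's worth, the point about keeping the unlabeled rows out of training could be sketched like this in Python with scikit-learn. Everything here is an assumption: the data is synthetic, the names (`response`, `has_response`) are made up for the example, and it assumes the 400 labeled rows contain both classes. A missing response is treated as "unknown", not as N.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the full 10,000-row table (NOT the poster's data).
X_all, y_true = make_classification(
    n_samples=10_000, n_features=10, n_informative=5, random_state=0
)

# Pretend only 400 rows have a recorded response; NaN marks "unknown".
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(X_all), size=400, replace=False)
response = np.full(len(X_all), np.nan)
response[labeled_idx] = y_true[labeled_idx]

# Boolean mask: True where a response was actually recorded.
has_response = ~np.isnan(response)

# Train ONLY on rows with a known response.
model = LogisticRegression(max_iter=1000)
model.fit(X_all[has_response], response[has_response].astype(int))

# Score the remaining rows; they were never seen during training.
X_unlabeled = X_all[~has_response]
predicted = model.predict(X_unlabeled)
probability_yes = model.predict_proba(X_unlabeled)[:, 1]

print(predicted.shape, probability_yes[:5])
```

Logistic regression is just one of the suggested algorithms; in practice you would plug in whichever model scored best on the held-out 30%.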