Data Mining Optimum Datasets

  • Question

  • Hi, I'm interested in data mining, but I'd like to know how big the datasets need to be to be meaningful. Say I have an e-commerce system: how many transactions need to be in the database before I can segment the users for an e-marketing campaign, or predict that "if a user buys this, they are likely to buy that"? I appreciate it's probably not an exact science. Cheers, Chris.


    • Moved by Darren Gosbell (MVP), Monday, June 18, 2012 11:20 PM. This is a data mining question (From: SQL Server Analysis Services)
    Monday, June 18, 2012 7:46 PM


  • The amount of data you need depends on:

    1. the algorithm you are trying to use

    2. the data itself (how much noise it has, etc.)

    For forecasting algorithms, it is recommended that you have at least three times as much history as the period you are trying to predict. So, to predict 1 year, you need at least 3 years of data.
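    As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names and the `ratio` parameter are made up for this example; the factor of 3 is the guideline quoted above, not a hard limit of any particular algorithm.

```python
def has_enough_history(history_periods: int, horizon_periods: int, ratio: int = 3) -> bool:
    """True when the history is at least `ratio` times the forecast horizon.

    Both arguments must use the same unit (days, months, ...).
    """
    if history_periods < 0 or horizon_periods <= 0 or ratio <= 0:
        raise ValueError("history must be >= 0; horizon and ratio must be > 0")
    return history_periods >= ratio * horizon_periods


def max_forecast_horizon(history_periods: int, ratio: int = 3) -> int:
    """Longest horizon the rule of thumb allows for a given amount of history."""
    if history_periods < 0 or ratio <= 0:
        raise ValueError("history must be >= 0 and ratio must be > 0")
    return history_periods // ratio


# 36 months of sales history versus a 12-month forecast
print(has_enough_history(36, 12))   # True
print(has_enough_history(24, 12))   # False
print(max_forecast_horizon(36))     # 12
```

    So with two years of monthly data, the guideline would suggest forecasting no more than about 8 months ahead.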

    For classification and estimation models, the amount of data you need depends on many factors (the number of input columns, the number of states of the column you are trying to predict, and the data itself). Usually people build a model on the data they have and then check its accuracy on a validation set and a test set. If accuracy on the validation and test sets is about the same, adding more rows of data is unlikely to change model accuracy much.
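    A related check, and a slightly different technique from the validation/test comparison described above, is a learning curve: train on growing subsets of your rows and watch accuracy on a fixed held-out set. Once the curve flattens, more rows are unlikely to help. The sketch below uses only the standard library, synthetic data, and a toy one-column threshold classifier, all of which are stand-ins chosen for illustration; with real data you would plug in your own model and rows.

```python
import random


def make_rows(n, seed):
    """Synthetic (feature, label) rows: label is 1 when feature + noise > 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        label = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        rows.append((x, label))
    return rows


def fit_threshold(rows):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:  # only one class seen: fall back to 0.0
        return 0.0
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, rows):
    """Fraction of rows whose label matches the rule 'predict 1 if x > threshold'."""
    hits = sum(1 for x, y in rows if (1 if x > threshold else 0) == y)
    return hits / len(rows)


def learning_curve(train_rows, holdout_rows, sizes):
    """Held-out accuracy after training on the first n rows, for each n in sizes."""
    return [(n, accuracy(fit_threshold(train_rows[:n]), holdout_rows)) for n in sizes]


def has_plateaued(curve, tolerance=0.01):
    """True when the last increase in training size moved accuracy by < tolerance."""
    if len(curve) < 2:
        return False
    return abs(curve[-1][1] - curve[-2][1]) < tolerance


train = make_rows(2000, seed=1)
holdout = make_rows(1000, seed=2)
curve = learning_curve(train, holdout, sizes=[10, 50, 200, 1000, 2000])
for n, acc in curve:
    print(f"{n:>5} rows -> held-out accuracy {acc:.3f}")
print("plateaued:", has_plateaued(curve))
```

    The 0.01 tolerance is an arbitrary choice for the example; pick a value that reflects how much accuracy actually matters for your campaign.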

    There is no specific data requirement for segmentation.

    The same goes for association (shopping basket analysis): you can use whatever amount of data you have.
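    To make the "if a user buys this, they are likely to buy that" idea concrete, here is a small Python sketch that computes the usual support and confidence figures for a single rule. The baskets and item names are invented for illustration, and `rule_stats` is a hypothetical helper, not part of any data mining product. It also returns the raw counts, which show how much evidence a rule rests on.

```python
def rule_stats(baskets, antecedent, consequent):
    """Support, confidence and raw counts for the rule antecedent -> consequent.

    support    = share of all baskets containing both items
    confidence = share of baskets with the antecedent that also contain the consequent
    """
    total = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    return {
        "support": with_both / total if total else 0.0,
        "confidence": with_both / with_a if with_a else 0.0,
        "n_antecedent": with_a,
        "n_both": with_both,
    }


# Made-up transactions, one set of items per order
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

stats = rule_stats(baskets, "bread", "butter")
print(stats)  # support 0.6, confidence 0.75, based on 4 baskets containing bread
```

    The rule can be computed from five baskets, but with only 4 of them containing bread, one extra order would shift the confidence noticeably. That is why a minimum support threshold is commonly applied before acting on a rule.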

    Tatyana Yakushev [PredixionSoftware.com]

    Tuesday, June 19, 2012 12:46 AM