locked
New to the Data Mining business RRS feed

  • Question

  • Hello,

    I am new to this data-mining business.
    I would like for someone to point me in the right direction, plese.

    In my business - I will have a database with several tables (let's assume MS SQL Server).

    For the sake of example, I have a 'person' table and an 'purchases' table (each person can make several different purchases).

    Here is an example of the 'person' table:

    - ID, First Name, Last Name, Address, Date of Birth

    Here is an example of the 'purchase' table:

    - Date of Purchase, Time of Purchase, Store Location, Number of Items Purchased, Total Purchase Amount

     

    What I would like to do is:
    I would like to perform 'data-mining' in order to find out the "categories", or "groups" of charecaristics, that define each 'purchase amount".

    For example - I would like to know what is common for all the people who bought between 2,000$ and 2,500$ ?

    Was it that they all came from the same neighbourhood ?

    Maybe all of them were young & came from the same neighbourhood?

    Maybe there are 2 distinct different groups that fit the '2000$-2500$' spending group ?

     

    From what I understand, this falls under the 'data-mining' region.

    Is this true ? and if so, what are the tools I must use in order to find out these set of parameters that define the different groups for each pricing category?

     

    Thank you very much !

    John

    Friday, December 9, 2011 3:28 PM

Answers

  • To get you started;

    You can create a Mining model using Decision Tree algorithm using SSAS BIDS. Your case table will be Person's and nested table will be Purchase.

    -----

    THIS IS THE LIST OF 'GROUPED COLUMNS' WE FOUND THAT RESULT IN SPENDING OF 2,000-2,500 DOLLARS:

         - All the males in the age of 20-22, who purchase in the morning

         - All the people (male & female) in the age of 65-70, who purchase in the night more than 20 items

         - All the females that purchase more than 41 items

    After you create this model, you can use DMX content queries to examine your model to get the answers you're looking for. Furthermore, you can use DMX prediction query to make predictions.

    http://msdn.microsoft.com/en-us/library/cc645868.aspx

    Please let us know how it goes, and post back if you have any questions.

    hth,

    Rok

     


    please remember to mark as answered if the post helped resolve the issue.
    • Proposed as answer by Bogdan Crivat Tuesday, December 13, 2011 8:25 AM
    • Marked as answer by jsminer Tuesday, December 13, 2011 8:36 AM
    Monday, December 12, 2011 5:17 PM
  • I agree with rok1. You can get this information by creating decision tree model. If you are using Excel add-in, this is done by clicking "Classify" button in the "data mining" ribbon.

    You don't have to send DMX content queries to examine the model. You can see all of this information in the data mining model viewer. Click "Browse" button in the add-in.


    Tatyana Yakushev [PredixionSoftware.com]
    • Marked as answer by jsminer Tuesday, December 13, 2011 8:36 AM
    Monday, December 12, 2011 6:23 PM
    Answerer

All replies

  • Hi John,

    this is a common data mining problem.

    Microsoft data mining tools are available in BI Dev Studio and as an Excel add-in.

    Excel add-in is a lot easier to use, so I recommend starting with Excel.

    First step is to prepare your data. I would use all fields from customer table and add calculated column with max purchase amount. After that I would run "Analyse Key Influencers" task to see what are the most important characteristics of customers with different spending limits.


    Tatyana Yakushev [PredixionSoftware.com]
    • Proposed as answer by Horizon_Net Saturday, December 10, 2011 4:45 PM
    Friday, December 9, 2011 8:14 PM
    Answerer
  • Hi John,

    this is a common data mining problem.

    Microsoft data mining tools are available in BI Dev Studio and as an Excel add-in.

    Excel add-in is a lot easier to use, so I recommend starting with Excel.

    First step is to prepare your data. I would use all fields from customer table and add calculated column with max purchase amount. After that I would run "Analyse Key Influencers" task to see what are the most important characteristics of customers with different spending limits.


    Tatyana Yakushev [PredixionSoftware.com]

    Hi,

    I have read what you have written.

    'Analyse Key Influencers' does seem very interesting, but it appears to give me each influencer by itself,

    meaning - how much each 'field' influences the result.

    What I am looking for - is for the different 'category of influencers' that together influence a certain outcome in the 'spending' column.

    Meaning - I would like the results to be something like:

     

         THIS IS THE LIST OF 'GROUPED COLUMNS' WE FOUND THAT RESULT IN SPENDING OF 2,000-2,500 DOLLARS:

         - All the males in the age of 20-22, who purchase in the morning

         - All the people (male & female) in the age of 65-70, who purchase in the night more than 20 items

         - All the females that purchase more than 41 items

     

    I hope I have clarified what I am trying to figure out ...

    Thank you !

    Sunday, December 11, 2011 12:27 AM
  • Tatyana ? Anyone ?
    Monday, December 12, 2011 10:27 AM
  • To get you started;

    You can create a Mining model using Decision Tree algorithm using SSAS BIDS. Your case table will be Person's and nested table will be Purchase.

    -----

    THIS IS THE LIST OF 'GROUPED COLUMNS' WE FOUND THAT RESULT IN SPENDING OF 2,000-2,500 DOLLARS:

         - All the males in the age of 20-22, who purchase in the morning

         - All the people (male & female) in the age of 65-70, who purchase in the night more than 20 items

         - All the females that purchase more than 41 items

    After you create this model, you can use DMX content queries to examine your model to get the answers you're looking for. Furthermore, you can use DMX prediction query to make predictions.

    http://msdn.microsoft.com/en-us/library/cc645868.aspx

    Please let us know how it goes, and post back if you have any questions.

    hth,

    Rok

     


    please remember to mark as answered if the post helped resolve the issue.
    • Proposed as answer by Bogdan Crivat Tuesday, December 13, 2011 8:25 AM
    • Marked as answer by jsminer Tuesday, December 13, 2011 8:36 AM
    Monday, December 12, 2011 5:17 PM
  • I agree with rok1. You can get this information by creating decision tree model. If you are using Excel add-in, this is done by clicking "Classify" button in the "data mining" ribbon.

    You don't have to send DMX content queries to examine the model. You can see all of this information in the data mining model viewer. Click "Browse" button in the add-in.


    Tatyana Yakushev [PredixionSoftware.com]
    • Marked as answer by jsminer Tuesday, December 13, 2011 8:36 AM
    Monday, December 12, 2011 6:23 PM
    Answerer
  • Thanks both of you !

    Sorry for the newbie questions :

    a. How do I get the 'data mining' ribbon installed on my Excel ? is it an add-in ? cause I don't have it

    b. You said "if you are using excel". Is there another method ? (I would like to know as much as possible about this 'Decision Tree Model').

    Thank you very much again !

    Tuesday, December 13, 2011 8:36 AM
  • a) You need to install free add-in from http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=7294 

    (Add-in needs to be connected to SQL Server Analysis Services. SQL Server is not free).

    If I remember correctly, you need to check a checkbox during setup to get "Data Mining" ribbon, it is not installed by default.

    b) You can create data mining models in BIDS (BI Development Studio, which is basically Visual Studio). It is more difficult to use than Excel.


    Tatyana Yakushev [PredixionSoftware.com]
    Tuesday, December 13, 2011 11:24 PM
    Answerer