Market Basket Analysis novice question, since result set is being limited or incorrect. Thank you

  • Monday, October 19, 2009 8:00 PM
     
     

    I have the following table layout for a Basket Analysis (Association Model)

    Customer:

                    Customer_id

                    Zip_code

                   

    Movies

                    Customer_id

                    Title

                    Price_purchase

     

    Structure Definition

     

    CREATE MINING STRUCTURE Movie_Market_basket_30_test
    (
        customer_id TEXT KEY,
        [Movies] TABLE (
            [Title] TEXT KEY
        )
    ) WITH HOLDOUT (30 PERCENT) REPEATABLE(5000)

     

    Model Definitions

     

    ALTER MINING STRUCTURE Movie_Market_basket_30_test
    ADD MINING MODEL [Default Association_30_test]
    (
        customer_id,
        [Movies] PREDICT (
            [Title]
        )
    )
    USING Microsoft_Association_Rules

     

     

    ALTER MINING STRUCTURE Movie_Market_basket_30_test
    ADD MINING MODEL [Modified Assocation_30_test]
    (
        customer_id,
        [Movies] PREDICT (
            [Title]
        )
    )
    USING Microsoft_Association_Rules (Minimum_Probability = 0.1)

     

    Issues:

    Here are the questions I am trying to answer:

    1)      What are the top 10 movies by price?

    2)      How do I query the remaining 70 percent of the data?

    3)      How do you change the max_item_set? The model only uses the default max_item_set of 3.

     

    Furthermore, the price of a movie changes based on the type of media (DVD, stream, cable), and because of that I think the Basket Analysis might be incorrect.

     

    Thank you for any help


    Tomas

All replies

  • Tuesday, May 29, 2012 4:57 PM
     
     

    Hi Tomas

    I'm also learning about basket analysis with the Microsoft Association Rules algorithm. I have many things to learn, but I'll try to help.

    1) I'm not sure what you mean by "top 10 movies by price". What the algorithm gives you is a list of frequent bundles (itemsets). How many items a bundle may contain, and how many supporting purchases a bundle needs to be considered "frequent", is up to you (you can modify the default algorithm parameters). Once you train the model you get your list of frequent bundles, and you can then order them by support, probability or importance.

    2) You can use DMX case queries to access your data. It would go something like SELECT * FROM MINING STRUCTURE <structure name>.CASES WHERE IsTrainingCase(). This query returns the 70% of cases that were used for training; IsTestCase() returns the 30% that were held out.

    3) About the max_item_set: I'm not that good with DMX, so I don't use it to create mining structures or models (at least not directly). I do that in SQL Server Business Intelligence Development Studio, where this parameter is simple to edit. You just right-click the mining model, select Set Algorithm Parameters and then specify the value. The parameter is called MAXIMUM_ITEMSET_SIZE, and its default is 3.
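
    The three points above can be sketched in DMX. This is a rough illustration, not a tested script. It reuses the structure and model names from the question; the model name [Itemset5 Association_30_test] and the value 5 are made up for the example. The .CASES queries assume the structure still caches its cases and that you have drillthrough permission on it. Run each statement on its own.

    ```dmx
    -- 1) Frequent itemsets found by an existing model, highest support first.
    --    NODE_TYPE = 7 selects itemsets; 8 would select rules.
    SELECT TOP 10 NODE_CAPTION, NODE_SUPPORT, NODE_PROBABILITY
    FROM [Default Association_30_test].CONTENT
    WHERE NODE_TYPE = 7
    ORDER BY NODE_SUPPORT DESC

    -- 2a) The 70% of cases that were used for training.
    SELECT * FROM MINING STRUCTURE [Movie_Market_basket_30_test].CASES
    WHERE IsTrainingCase()

    -- 2b) The 30% of cases that were held out for testing.
    SELECT * FROM MINING STRUCTURE [Movie_Market_basket_30_test].CASES
    WHERE IsTestCase()

    -- 3) A further model on the same structure that allows larger itemsets.
    --    MAXIMUM_ITEMSET_SIZE is the algorithm parameter referred to above
    --    as max_item_set; the model name and the value 5 are hypothetical.
    ALTER MINING STRUCTURE [Movie_Market_basket_30_test]
    ADD MINING MODEL [Itemset5 Association_30_test]
    (
        customer_id,
        [Movies] PREDICT (
            [Title]
        )
    )
    USING Microsoft_Association_Rules (MAXIMUM_ITEMSET_SIZE = 5, MINIMUM_PROBABILITY = 0.1)
    ```

    A model added this way still has to be processed before a content query like the one in 1) returns anything for it.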

    As for the price not being the same for each movie, I don't think it affects your results at all: the model you defined only uses the titles, not the prices. In fact, an interesting question (one that I'm trying to answer myself, without success so far) is what the average value per sale is for each bundle. I haven't been able to do this yet, but I've seen that this kind of analysis is frequently done in basket analysis, so I don't think different prices will affect the results.

    That is all I know so far. I hope it helps somehow.

    Saludos!


    • Edited by amilkar0417 Tuesday, May 29, 2012 4:59 PM bad spelling