Score Matchbox Recommender -- Item Recommendation

  • Question

  • Hello,

    I notice that when training a Matchbox Recommender and using the Score Matchbox Recommender module to predict the top items for each test user, the recommender does not give predictions for some of the users. For instance, the Movie Recommendation example gives Movie recommendations for 2955 users even though there are in fact 3830 users in the test dataset. Also, the number of users for which recommendations are given decreases as the parameter for minimum recommendations is increased.

    Why do I not receive predictions for all of the users? Also, this is in "From Rated Items" mode.

    Thanks for the help!

    Thursday, November 19, 2015 5:29 AM

Answers

  • Evaluation in recommendation problems can be confusing, because unlike in classification or regression, we don't have the ground truth. That is, we don't have data of the type "this user said that they want these 5 items to be recommended to them". Therefore, we have to generate such data from the rating data. And we can only recommend items that are in the test set, so that these recommendations can be evaluated. If we recommend an item that the user hasn't rated, how can we evaluate this prediction? That's why when you score in evaluation mode, the input to the scorer is user-item pairs, not only users. 

    I agree that the meaning of the "pool" parameter isn't clear (and I'll open a task to add this to the documentation). What it means is that if the number of possible items to recommend to a user is less than the value of this parameter, then the user is skipped. Therefore, setting the value of this parameter to less than 2 doesn't really make sense: if you only have 1 possible item to recommend to a user, then every recommender in the world will make the same prediction, and so the evaluation is meaningless. The more items you have in the pool to recommend from, the more the predictions will differ from one recommender system to another, and hence the more reliable the evaluation numbers are.
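    As a rough sketch of that skipping rule (an illustration only, not the module's actual code; the data layout and the min_pool_size name here are just made up), the logic is something like:

    ```python
    from collections import defaultdict

    def users_with_enough_candidates(test_pairs, min_pool_size):
        """Keep only users whose candidate pool (the items available to
        recommend to them in the test set) has at least min_pool_size items.
        Illustrative sketch, not the Score Matchbox Recommender's code."""
        pools = defaultdict(set)
        for user, item in test_pairs:
            pools[user].add(item)
        return {user: items for user, items in pools.items()
                if len(items) >= min_pool_size}

    # Example: with min_pool_size=2, user "b" is skipped because only one
    # candidate item is available for them.
    test_pairs = [("a", "m1"), ("a", "m2"), ("a", "m3"), ("b", "m1")]
    print(users_with_enough_candidates(test_pairs, min_pool_size=2))
    ```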

    If your users haven't rated enough items, then this data is difficult to evaluate, and the 20 users that end up going to the evaluator are all you can get. But that's probably not the entire story. You can use the Split Data module in Recommender Split mode to tune the fraction of user ratings that end up in your training set versus your test set. Note that setting the training fraction to a larger number will probably give better evaluation results (because you trained on more instances), but these results will be less reliable (because of the smaller item recommendation pools).
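    For intuition, here is a minimal sketch of what a per-user ratings split does (an illustration only, not the Split Data module's implementation; the train_fraction parameter and the (user, item, rating) tuple layout are assumptions):

    ```python
    import random

    def recommender_split(ratings, train_fraction=0.75, seed=0):
        """Split each user's ratings so that roughly train_fraction of them
        go to the training set and the rest to the test set.
        Sketch only; the real Split Data module offers more options."""
        rng = random.Random(seed)
        by_user = {}
        for user, item, rating in ratings:
            by_user.setdefault(user, []).append((user, item, rating))
        train, test = [], []
        for rows in by_user.values():
            rng.shuffle(rows)
            cut = max(1, int(len(rows) * train_fraction))
            train.extend(rows[:cut])
            test.extend(rows[cut:])
        return train, test
    ```

    Raising the training fraction leaves each user with fewer test ratings, so their recommendation pools shrink and more users fall below the minimum pool size, which is exactly the trade-off described above.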

    -Y-

    Friday, November 20, 2015 2:43 PM

All replies

  • Typically, if you want to score your full dataset, you need to use the "From All Items" mode. "From Rated Items" only draws on items that have been rated, which would explain why you do not get recommendations for all users.
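    To make the difference concrete, here is a small sketch of the candidate sets the two modes draw from (the names are made up for illustration, not the module's internals):

    ```python
    def candidate_items(user, mode, catalog, test_items_by_user):
        """Candidate set for top-N recommendation under the two modes.
        Sketch of the idea only; argument names are illustrative."""
        if mode == "from_all_items":
            # recommend from the whole item catalog
            return set(catalog)
        if mode == "from_rated_items":
            # restrict to items this user has rated in the test set, so every
            # recommendation can be checked against a known rating
            return set(test_items_by_user.get(user, ()))
        raise ValueError(f"unknown mode: {mode}")
    ```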

    Could you also describe the whole setup of your Score Matchbox Recommender module? The other parameters control how many recommendations are requested and how many are actually returned by the recommender.

    Regards


    • Edited by Moncef Thursday, November 19, 2015 11:36 AM
    Thursday, November 19, 2015 9:49 AM
  • As explained in the documentation, "From Rated Items" should only be used if you want to evaluate the predictions. This mode imposes certain restrictions on the possible recommendations and is not intended for a production setup. You can use it in conjunction with the evaluator to tune hyper-parameters like the number of traits or iterations. Change this mode to "From All Items" and let me know if you still don't see predictions for every user.
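    As a hypothetical sketch of that tuning loop (train_fn and eval_fn stand in for the Train Matchbox Recommender and the Score/Evaluate Recommender steps; none of these names come from the Studio modules themselves):

    ```python
    def tune_num_traits(train_fn, eval_fn, trait_values=(2, 4, 8, 16)):
        """Hypothetical sweep over the 'number of traits' hyper-parameter.
        train_fn(num_traits=...) should return a trained model and
        eval_fn(model) an evaluation metric (e.g. NDCG) on the test set."""
        best_traits, best_score = None, float("-inf")
        for traits in trait_values:
            model = train_fn(num_traits=traits)
            score = eval_fn(model)
            if score > best_score:
                best_traits, best_score = traits, score
        return best_traits, best_score
    ```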

    -Y-

    Thursday, November 19, 2015 11:12 AM
  • When I change the mode to "from all items," I do see predictions for every user. However, my concern is that in my personal project (not the Movie Recommender sample), the Score Recommender module in "From Rated Items" mode returns items for only 20 out of 389 users, which are then passed along to the Evaluate Recommender module. While this may partially be a product of my dataset, I am concerned that it is not using nearly enough of the users in the test dataset to evaluate the performance of the recommender. Currently the Score Recommender module is configured to return at least two items for each user.

    Thursday, November 19, 2015 9:29 PM
  • It is doing so because of the minimum recommendation pool size parameter for a user in the Score Matchbox Recommender. If you set it to 1, then all users will be scored.

    This parameter means the minimum number of scored items required for each user to be considered.

    Regards


    • Edited by Moncef Thursday, November 19, 2015 10:41 PM
    • Marked as answer by VSAnimator Thursday, November 19, 2015 11:10 PM
    • Unmarked as answer by VSAnimator Thursday, November 19, 2015 11:12 PM
    Thursday, November 19, 2015 10:40 PM
  • I believe that parameter is actually the minimum number of items returned for each user, not how many must be scored for a user to be considered. That doesn't explain why so many users are being left out.
    • Edited by VSAnimator Thursday, November 19, 2015 11:14 PM
    Thursday, November 19, 2015 11:10 PM
  • When I change that parameter to 1, it scores all my users. From what I see, this parameter is used to configure which users should be considered for training the model: only users with a minimal number of scores are considered, in order to strengthen the evaluation of the scored matrix.

    And that parameter does not exist when you are using "From All Items"; this is because all users are scored.
    • Edited by Moncef Thursday, November 19, 2015 11:26 PM
    • Proposed as answer by Moncef Friday, November 20, 2015 2:48 PM
    Thursday, November 19, 2015 11:19 PM
  • Of course, it is agreed that setting the pool parameter to 1 is not the best way to get a good training set.

    However, it explains why not all users are included in the training set if you use a value other than 1.


    • Edited by Moncef Friday, November 20, 2015 2:50 PM
    Friday, November 20, 2015 2:50 PM
  • Okay, thanks, I understand now!
    Friday, November 20, 2015 8:16 PM