Query similar items based on tags/categories RRS feed

  • Question

  • Hi,

    I am trying to query items which are similar to a chosen item. To further explain..

    My table structure (simplified for the sake of this question) is like this...

    -itemid (PK)

    -mapid (PK)

    -categoryid (PK)

    There is a relationship between the ItemTable -to- CategoryMapTable -to- CategoryTable

    Now, let's say I have five items in my ItemTable called; car1, car2, car3, car4, car5. These items are mapped to categories in the CategoryTable. The CategoryMapTable keeps track of this.

    What I would like is that if I choose car1, a SQL or Linq query finds the most similar matches to this item. So for this example, the result may be car4 , car2, car5, car3 or in other words 'most similar' to 'least similar'.

    I'm looking for help in the form of a SQL query, Linq query, your theories, or links to other sites where I might be able to locate the answer. I'm not afraid to do some research on this issue!

    One thing I have considered is querying every item in the ItemTable and giving it a 'score' based on how many categories match up. However, I don't believe this is the most efficient way it could be done.

    I appreciate your time and help on this matter.


    J.R. Brown
    jrbrown3 [at]
    Wednesday, August 12, 2009 2:57 PM


All replies

  • How would car4 be "more similar" than car3 if they both have a link to the cars entry via the CategoryMapTable?

    Wednesday, August 12, 2009 3:14 PM
  • car4 would have more categories in common with car1 than car3.

    For example, if the table data was like this:

    -itemid, name
    -102, car3
    -103, car4

    -mapid, itemid, categoryid
    -23, 101, 237 (car1 - v6)
    -24, 101, 236 (car1 - leather)
    -25, 101, 234 (car1 - red)
    -26, 102, 238 (car3 - v8)
    -27, 102, 234 (car3 - red)
    -28, 103, 237 (car4 - v6)
    -29, 103, 236 (car4 - leather)
    -30, 103, 235 (car4 - silver)

    -categoryid, name
    -234, red
    -235, silver
    -236, leather
    -237, v6
    -238, v8
    -239, alloy

    Using these mappings, car1 has three categories assigned to it. car3 has one category in common with car1. car4 has two categories in common with car1. Therefore, the result should be that car4 is more similar to car1 than car3.

    Hope that makes sense!


    Wednesday, August 12, 2009 3:29 PM
  • Hi J.R.,

    The relevance criteria based upon no. of categories matches is nice. Have a look at this TSQL challenge and the submissions to get an idea on how to implement this.

    Hope that helps,

    Syed Mehroz Alam
    My Blog | My Articles
    • Proposed as answer by Syed Mehroz Alam Monday, August 17, 2009 5:51 AM
    • Marked as answer by Yichun_Feng Wednesday, August 19, 2009 1:36 AM
    Thursday, August 13, 2009 6:00 AM
  • Thanks Syed, I think this will help get me where I'm trying to go! Take care.

    - J.R.
    Wednesday, August 19, 2009 6:10 PM