How to avoid False Positives for News Queries via the Bing Search API

  • Thursday, February 21, 2013 1:05 PM
     
     

    Hi.

    At http://squirro.com we're using the Bing Search API, and we've discovered that Bing returns a lot of irrelevant results for News queries.

    It seems that in many cases Bing doesn't do a good job of stripping UI elements (related stories, social media widgets, ads, etc.) from news articles before indexing them.

    Here's a great example:

    I search for 'playstation' and sort by date.

    https://api.datamarket.azure.com/Bing/Search/v1/Composite?Sources=%27news%27&Query=%27playstation%27&Market=%27en-IN%27&NewsSortBy=%27Date%27

    The second document returned by this query is this:

    id: 451bcac5-cc3e-48a9-a712-99cb0428906e
    title: Kellogg recalls cereal because of glass fragments
    description: Investigators say Joshua Herrin, 25, attacked the man during a road rage incident NEW YORK (AP) - Sony unveiled its next-generation gaming system, the PlayStation 4, and promised social and remote capabilities. Wednesday's announcement gives the struggling ...
    url: http://www.13abc.com/story/21294301/kelloggs-recalls-cereal-because-of-glass-fragments

    Clearly Bing indexed too much here. If you look at the 13abc.com page, you can see that it matched the term 'playstation' in the top stories block: http://www.screencast.com/t/0HfKC6bfunq0

    The text in the description about the road rage incident is no longer visible on the 13abc.com page.

    Technically we've already 'fixed' the issue on our side: we no longer trust Bing, so we download the pages ourselves, remove all the clutter and then validate that our query terms are still present.

    But of course this has a big impact on the cost side. We have to pay Microsoft for each 15-result page received through the API, and after a few days of measuring we had to throw away between 15% and 20% of the results delivered.
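
    The client-side check described above could be sketched roughly as follows. This is only an illustration, not Squirro's actual code: the list of "clutter" tags, the class name and the helper names are all assumptions, and real news pages usually need site-specific rules on top of this.

```python
from html.parser import HTMLParser
import re

# Assumption: tags treated as page clutter. Real sites often put related
# stories and widgets in plain <div>s, which would need extra rules.
CLUTTER_TAGS = {"script", "style", "nav", "aside", "footer", "header"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any clutter tag."""

    def __init__(self):
        super().__init__()
        self.clutter_depth = 0  # how many clutter tags are currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER_TAGS:
            self.clutter_depth += 1

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self.clutter_depth > 0:
            self.clutter_depth -= 1

    def handle_data(self, data):
        if self.clutter_depth == 0:
            self.parts.append(data)


def extract_main_text(html):
    """Return the page text with clutter removed and whitespace collapsed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def mentions_all_terms(html, query_terms):
    """True if every query term still appears as a whole word in the main text."""
    text = extract_main_text(html).lower()
    return all(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
        for term in query_terms
    )
```

    A result whose page fails `mentions_all_terms(page_html, ["playstation"])` would be discarded, which is what happens to the Kellogg story once its top stories block is stripped.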

    So my questions are:
    - Is there a way to avoid these "false" positives?
    - Is Microsoft aware of this issue? If so, is there a plan to improve the quality? If not, how can I escalate this with Microsoft?

    Best regards,
    Toni
    squirro.com

All Replies

  • Friday, February 22, 2013 3:22 AM
    Moderator
     
     Answered

    Hi,

    First of all, sorry about that. I suggest you move this thread to the Bing Webmaster forum, or open a new one there:

    http://www.bing.com/community/webmaster/f/12248.aspx

    They can help you with this kind of false-positive issue.

    Thanks,


    QinDian Tang

  • Wednesday, March 20, 2013 12:02 AM
     
     

    This won't resolve the issue of false positives, but it does reduce the number returned.

    In my code I implemented a negative word filter that is appended to each query.  For example, searching for Seattle Mariners MLB will return a ton of gambling and other "news" articles that I don't want.  I add -bet -bookmaking -gambling -gamble (and the list goes on).  This means that the results I do get are pretty clean.  I also added filters that remove complete spam domains, which generate daily entries for sports that have NOTHING to do with them.

    Not ideal, but I thought I would share because it made a huge difference for me in my baseball fan apps.
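
    For anyone who wants to try the same approach, here is a rough sketch of the two filters. The function names, the blocked domain and the assumption that each result is a dict with a "url" key are made up for illustration; only the four negative terms come from the post above.

```python
from urllib.parse import urlparse

# Negative terms from the post; extend the list as needed.
NEGATIVE_TERMS = ["bet", "bookmaking", "gambling", "gamble"]

# Hypothetical placeholder domain; replace with the spam domains you observe.
BLOCKED_DOMAINS = {"spam-sports.example"}


def with_negative_terms(query, negative_terms=NEGATIVE_TERMS):
    """Append '-term' exclusions to the query text before sending it."""
    exclusions = " ".join("-" + term for term in negative_terms)
    return f"{query} {exclusions}".strip()


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def drop_blocked(results):
    """Keep only results whose 'url' (assumed key) is not on the blocklist."""
    return [r for r in results if not is_blocked(r["url"])]
```

    The string returned by `with_negative_terms("Seattle Mariners MLB")` is what would go into the Query parameter, and `drop_blocked` runs on whatever comes back.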


    Jason Short