Hi.
At http://squirro.com we're using the Bing Search API, and we've discovered that Bing is returning lots of irrelevant results for News queries.
It seems that in many cases Bing doesn't do a good job of stripping UI elements such as related stories, social media widgets and ads from news articles before indexing them.
Here's a great example:
I search for 'playstation' and sort by date.
https://api.datamarket.azure.com/Bing/Search/v1/Composite?Sources=%27news%27&Query=%27playstation%27&Market=%27en-IN%27&NewsSortBy=%27Date%27
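For anyone who wants to reproduce the request, here is a minimal sketch of how that URL can be assembled in Python. The function name `build_news_query_url` and its defaults are just illustrative, and authentication (which the API requires) is not shown.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.datamarket.azure.com/Bing/Search/v1/Composite"


def build_news_query_url(query, market="en-IN", sort_by="Date"):
    """Build the news query URL (illustrative helper, not part of any SDK)."""
    # String parameter values are wrapped in single quotes, as in the URL above;
    # urlencode percent-encodes each quote as %27.
    params = {
        "Sources": "'news'",
        "Query": f"'{query}'",
        "Market": f"'{market}'",
        "NewsSortBy": f"'{sort_by}'",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_news_query_url("playstation")
print(url)
```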
The second document returned by this query is this:
id: 451bcac5-cc3e-48a9-a712-99cb0428906e
title: Kellogg recalls cereal because of glass fragments
description: Investigators say Joshua Herrin, 25, attacked the man during a road rage incident NEW YORK (AP) - Sony unveiled its next-generation gaming system, the PlayStation 4, and promised social and remote capabilities. Wednesday's announcement gives the struggling ...
url: http://www.13abc.com/story/21294301/kelloggs-recalls-cereal-because-of-glass-fragments
Clearly Bing indexed too much here. If you look at the 13abc.com page, you can see that it matched the term 'playstation' in the top stories block: http://www.screencast.com/t/0HfKC6bfunq0
The text in the description about the road rage incident is no longer visible on the 13abc.com page.
Technically we've already 'fixed' the issue on our side by no longer trusting Bing: we download the pages ourselves, remove all the clutter and then validate whether our query terms are still present.
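To make the workaround concrete, here is a rough sketch of that kind of check in Python, using only the standard library. It is not our production code: the list of "clutter" tags, the helper names and the sample HTML are assumptions for illustration, and real news pages usually need site-specific rules (for example, clutter marked up as plain div elements).

```python
from html.parser import HTMLParser

# Assumed set of elements treated as clutter; purely illustrative.
CLUTTER_TAGS = {"script", "style", "nav", "aside", "footer", "header"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any clutter element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a clutter element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def main_text(html):
    """Return the page text with clutter removed and whitespace collapsed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def still_matches(html, query_terms):
    """True if every query term still appears in the cleaned text."""
    text = main_text(html).lower()
    return all(term.lower() in text for term in query_terms)


# Made-up page: the query term only occurs in a related-stories block.
sample = """<html><body>
<article><h1>Kellogg recalls cereal because of glass fragments</h1></article>
<aside><a href="#">Sony unveiled the PlayStation 4</a></aside>
</body></html>"""

print(still_matches(sample, ["playstation"]))
print(still_matches(sample, ["kellogg"]))
```

A result is kept only if `still_matches` returns True for the downloaded page; everything else is thrown away.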
But of course this has a big impact on cost. We have to pay Microsoft for each 15-result page received through the API, and after a few days of measuring we found that we had to throw away between 15% and 20% of the results delivered.
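As a back-of-the-envelope illustration of what that discard rate means: if a fraction d of the paid results is unusable, the price per usable result goes up by a factor of 1 / (1 - d), which is roughly 1.18x at 15% and 1.25x at 20%. The small sketch below just does that arithmetic; the function names are made up, and it ignores the extra cost of downloading and cleaning the pages ourselves.

```python
RESULTS_PER_PAGE = 15  # results per paid page, as stated above


def usable_results_per_page(discard_rate):
    """Average number of results per paid page that survive filtering."""
    return RESULTS_PER_PAGE * (1.0 - discard_rate)


def cost_multiplier(discard_rate):
    """Factor by which the price per usable result increases."""
    return 1.0 / (1.0 - discard_rate)


for rate in (0.15, 0.20):
    print(
        f"discard {rate:.0%}: "
        f"{usable_results_per_page(rate):.2f} usable results per page, "
        f"cost per usable result x{cost_multiplier(rate):.2f}"
    )
```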
So my questions are:
- Is there a way to avoid these "false" positives?
- Is Microsoft aware of this issue? If so, is there a plan to improve the quality? If not, how can I escalate this with Microsoft?
Best regards,
Toni
squirro.com