News search does not honor query parameters

    Question

  • I'm trying to programmatically find news articles for my application and am using the Bing Search API to accomplish this. However, I've noticed several problems which I'd like to point out to any future users and/or developers working on this.

    1. The API (and the Bing News website itself) does not always honor the NewsSortBy=Date parameter after the first page. For example, search for 'Domo' on the Bing News website or through the API, and select 'Most Recent' (on the website) or add NewsSortBy=Date (to the API call). The first 2 pages are correct; however, the 3rd page shows results that are more recent than those displayed on the 2nd page. The same goes for each subsequent page: there are results that are more recent than those shown on the previous page. Also, some results returned on page 2 are shown again on page 3, even if they are more recent than those already shown on page 2.

    2. The '$top' parameter can only lower the number of results returned, and the default count is seemingly arbitrary (NOT always 15, as the documentation says). For example, a search for Domo returns 14 results. See call (A) below. A subsequent search using '$skip=15' (call B) shows more results, so we know there are enough results to return 15 from the original call. Setting the $top parameter to anything >14 does not increase the number of results returned. Another example is searching for ‘Branches’. In that case only 13 results are returned on the first page, even though the 2nd page ($skip=15) returns the expected 15 results.

    3. API does not always honor the '$skip' parameter. A search for ‘Scaligent’ returns only 2 results (see call C). Adding the skip parameter to the call, regardless of value, will return the same 2 results.

    Because of these issues, I don’t understand how anyone can use this API in the way documented on the site. When a call returns back 10 results even though $top is set to 15, did that return back 10 because the first call only returns a max of 10 (in which case I should make another call), or because there are no more results? If I assume I’ll make another call because there COULD be more results, I could get the exact same 10 results again, even though I’ve set $skip to 15.

    Please let me know how to solve these problems or if I’m doing something wrong. I’ll happily provide more examples as necessary.

    Calls:

    A. https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Domo%27&$format=json

    B. https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Domo%27&$format=json&$skip=15

    C. https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Scaligent%27&$format=json
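
For anyone reproducing these calls, here is a minimal Python sketch that builds URLs of the same shape. The helper name `build_news_url` and its arguments are hypothetical. Quoting the NewsSortBy value the same way as the Query value is an assumption, and `$format` is placed last, as a later reply in this thread suggests.

```python
from urllib.parse import quote

BASE_URL = "https://api.datamarket.azure.com/Bing/Search/v1/News"


def build_news_url(term, skip=None, top=None, sort_by_date=False):
    """Build a News query URL shaped like calls A-C (hypothetical helper)."""
    # The search term is wrapped in single quotes, which encode to %27.
    params = ["Query=" + quote("'" + term + "'", safe="")]
    if sort_by_date:
        # Assumption: the sort value is quoted the same way as the query.
        params.append("NewsSortBy=" + quote("'Date'", safe=""))
    if top is not None:
        params.append("$top=%d" % top)
    if skip is not None:
        params.append("$skip=%d" % skip)
    params.append("$format=json")
    return BASE_URL + "?" + "&".join(params)
```

`build_news_url("Domo")` reproduces call A, and `build_news_url("Domo", skip=15)` yields call B with `$format` moved to the end.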

    Wednesday, November 06, 2013 10:48 PM

All replies

  • Question 1.

    I tested it on the Bing API website and found the date order to be correct.

    Question 2.

    Question 3.

    I have used skip(100) to move to the next page.

    I think every page shows 15 records; we can use skip(100) to move to the next page, or click the page number.

    If you want to use top, please try a value greater than 100, such as top(200).

    If I missed something, please let me know.

    Thursday, November 07, 2013 9:21 AM
  • I didn't mean to imply EVERY query was wrong, just that some queries were wrong. Specifically, those that I listed above. It's great that the date order was right for 'Demo', but it is wrong for 'Domo'. Likewise for the $top and $skip parameters. I know there are plenty of queries that return the correct number of results. However, there are also queries that do not follow the documentation (again, see my examples). If you'd like me to find more INCORRECT examples, I'd be happy to list more. I'm not concerned about the examples that are correct, but about the more obscure searches that are not. It means the behavior is non-deterministic, which is a problem for me. See http://en.wikipedia.org/wiki/Counterexample

    Tuesday, November 12, 2013 4:14 PM
  • I think you might have an issue with the order of the parameters. I can't reproduce what you describe when I place format at the end (where it belongs):

    https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Domo%27&$skip=15&$format=json

    Can you check this again?

    Thanks,

    Max

    Tuesday, November 12, 2013 7:10 PM
  • Thanks for looking into this issue. I realize now that I should have submitted 3 issues, instead of one, because I'm not sure which problem you can't reproduce. :) For right now, I'll assume you're talking about the $top parameter not working as expected. Here's another example of this:

    https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Domo%27&$top=20&$format=json

    In this example, I would expect 20 results (hence the $top=20). Actual result only contains 14 search results. 14 would make sense if there were only a TOTAL of 14 articles that matched that search term, but as you can see, there are plenty more. Adding a $skip=15 shows another 14 stories (Again we get the magical number 14, which doesn't make any sense to me).

    Even without specifying the $top parameter, the search result still only shows 14 results per call, when the documentation clearly states there should be 15. I have seen calls that only return 14, 13, and 10 per call, even if there are plenty more on the next page.

    Count the number of results you get back from that call. If your results are like mine, you'll only see 14 results.

    If I misunderstood which problem you can't reproduce, let me know. I have more examples of the other problems as well :)


    • Edited by DevFan1 Tuesday, November 12, 2013 7:48 PM
    Tuesday, November 12, 2013 7:47 PM
  • For an example of the $skip parameter not working, try this call:

    https://api.datamarket.azure.com/Bing/Search/v1/News?Query=%27Scaligent%27&$format=json

    (Listed as call 'C' in the original problem)

    Tuesday, November 12, 2013 7:51 PM
  • I am still missing something :)

    This query returns only 2 results for me right now. So $skip indeed shouldn't work.

    Thanks,

    Max

    Wednesday, November 13, 2013 12:21 AM
  • News pages are limited to 15 results, and hence the first response to a $top request is also limited to 15 results. Adding $skip on top of this moves to the next 15 results.

    There is an issue of getting only 14 results on page one, which may actually be a bug on our side. Let me check and get back to you.

    Thanks,

    Max

    Wednesday, November 13, 2013 12:31 AM
  • We went through some of the internals, and indeed, for some specific terms and for requests going to some specific datacenters, the number of results returned for News can vary.

    This is pretty intermittent. If you use other keywords such as "xbox", you will see the API behave more predictably.

    Thanks,

    Max

    Friday, November 15, 2013 6:54 PM
  • So I've ended up using 10 as a magic number in my logic.

    If the query returns fewer than 10 results (assuming you asked for 15 results), assume there are no more entries and don't bother doing a skip for the next set of results. This avoids the problem with $skip returning the same results over and over again. If there were 10 or more, do another query, but set $skip to 15 (again assuming you're asking for 15 results per query). This avoids the problem where $top doesn't actually return the correct number of results in some cases.

    I would love to use the much cleaner algorithm described below: 

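    RESULTS_PER_CALL = 15;
    allHits = [];
    skip = 0;
    while (allHits.length < numRequested) {
        queryResults = callService(skip);
        allHits = allHits.concat(queryResults);
        if (queryResults.length < RESULTS_PER_CALL) {
            break;  // a short page means there are no more results
        }
        skip += RESULTS_PER_CALL;  // advance to the next page
    }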

    When using 10 as the magic number, I still have to make an extra call even if the result count is between 10 and 14, since I don't know if the results are limited due to an API bug, or because there's simply not that many results.
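
The magic-number workaround could look roughly like the Python sketch below. `call_service(skip)` is a hypothetical callable that performs one request and returns a list of result dicts. The `"Url"` key, the de-duplication, and the stop-when-nothing-new guard are my own assumptions, added to cope with repeated results and with $skip being ignored.

```python
RESULTS_PER_CALL = 15  # page size the documentation promises
MIN_FULL_PAGE = 10     # magic number: fewer results than this means stop


def fetch_all(call_service, num_requested):
    """Collect up to num_requested unique results.

    call_service(skip) is a hypothetical callable that makes one API
    request and returns a list of dicts, each with a "Url" key (assumed).
    """
    all_hits = []
    seen_urls = set()
    skip = 0
    while len(all_hits) < num_requested:
        page = call_service(skip)
        added = 0
        for result in page:
            # Skip results already seen on an earlier page.
            if result["Url"] not in seen_urls:
                seen_urls.add(result["Url"])
                all_hits.append(result)
                added += 1
        # Stop on a short page, or when the call returned nothing new
        # (for example when $skip was ignored).
        if len(page) < MIN_FULL_PAGE or added == 0:
            break
        skip += RESULTS_PER_CALL
    return all_hits[:num_requested]
```

A page of 10 to 14 results still triggers one more call, which is the extra request mentioned above.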

    Also, if you're interested in sorting by date, make sure you do that in your code. This avoids the problem where the NewsSortBy parameter doesn't always work.
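
A client-side sort might look like the sketch below. It assumes each result is a dict with a `"Date"` field holding a UTC timestamp such as `2013-11-06T22:48:00Z`; that field name and format are assumptions, so adjust them to whatever your responses actually contain.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format


def sort_newest_first(results):
    """Return a new list of results ordered by "Date", newest first."""
    def parse_date(result):
        return datetime.strptime(result["Date"], DATE_FORMAT)

    return sorted(results, key=parse_date, reverse=True)
```

Sorting after all pages have been merged gives a consistent order regardless of how each individual page was ordered.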

    Thursday, November 21, 2013 7:02 PM