SP2010 Server Search: Common words omitted from phrase search
-
April 16, 2012 12:02
Symptoms
We recently became aware of a phrase search issue with SharePoint 2010 Server Search (i.e. not FAST Search for SharePoint 2010). A search for the phrase "social media" only returns hits containing the word "media", i.e. "social" is filtered out. When I perform a phrase search, I expect to find the exact phrase, i.e. I expect the noise/common word filter to be turned off.
If I just search for "social", I get the following message:
"Your query included only common words and / or characters, which were removed. No results are available. Try to add query terms."
We have verified that the word "social" is not listed explicitly as a noise word, but somehow the search engine treats it as a common word. It seems as if this is happening dynamically/automatically in the background.
Questions
- How does the SharePoint 2010 search engine determine if a word is common or not?
- Does it depend on the frequency of the word in the index, i.e. raise a common word flag if a word appears more than X times in the total amount of Y documents?
- If so, can this feature be turned off?
- Has anyone else experienced the same issue?
Reference
Similar thread: "The exact phrase in Advanced Search acting like All of these words"
http://social.technet.microsoft.com/Forums/en-US/sharepointsearch/thread/3a03ab39-67a9-4270-aad6-dd9b376c6d65
- Edited by SitarPhone, April 17, 2012 10:55: Updated title
All replies
-
April 16, 2012 12:19
The message you mention is exactly the one that is displayed when the query term is a noise word. I've never heard of "hidden" noise words that are added automatically beyond your control, but I do know that it can be extremely tricky to find the correct noise word list. Check out http://www.loisandclark.eu/Pages/noisewords.aspx and http://www.loisandclark.eu/Pages/noisewords_upd.aspx. They were written for 2007, but the techniques still apply.
Kind regards,
Margriet Bruggeman
Lois & Clark IT Services
web site: http://www.loisandclark.eu
blog: http://www.sharepointdragons.com
- Edited by Margriet Bruggeman, April 16, 2012 12:20
-
April 17, 2012 7:16 Moderator
The official documentation on noise words (now called stop words):
http://technet.microsoft.com/en-us/library/dd361733(v=office.14).aspx
Noise words are also treated differently in the CONTAINS predicate and the FREETEXT predicate:
http://msdn.microsoft.com/en-us/library/ms492554.aspx
http://msdn.microsoft.com/en-us/library/ms497440.aspx
You can play with query syntax using http://blogs.technet.com/b/speschka/archive/2010/08/15/free-developer-search-tool-for-sharepoint-2010-search-and-fast-search-for-sharepoint.aspx
-
April 17, 2012 11:06
Thanks for your replies. "Hidden" noise/stop words don't sound very plausible.
@Margriet: "I do know that it can be extremely tricky to find the correct noise word list" may actually be the case in our situation. We will take a closer look and see if we can find the correct files.
@GuYuming: Is it correct to assume that all phrase searches are performed using the CONTAINS predicate?
-
April 18, 2012 3:42 Moderator
First, did you find your noise word definition in %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID\Config?
Out of the box, the search box uses the Keyword Query syntax, which is the recommended syntax (http://msdn.microsoft.com/en-us/library/ee558911.aspx), instead of the search SQL syntax.
The KeywordQuery class has a property to control whether search queries containing only noise words will be executed (http://msdn.microsoft.com/en-us/library/microsoft.office.server.search.query.query.ignoreallnoisequery.aspx).
- Marked as answer by GuYuming (Microsoft Contingent Staff, Moderator), April 27, 2012 8:48
-
April 22, 2012 10:52
Thank you for the additional links!
When taking a closer look at the %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID\Config folder, we actually found the indicated noise words in the noiseeng.txt file. That means that we have identified the underlying reason.
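For anyone who wants to repeat this check, here is a minimal Python sketch of it. It assumes the noise word file holds one word per line and is readable as UTF-8 (the actual encoding on your server may differ, e.g. UTF-16, so adjust the `encoding` argument). The function names `load_noise_words` and `split_query` are made up for this illustration and are not part of SharePoint; the matching is a simple case-insensitive lookup, not the engine's real word breaking.

```python
from pathlib import Path
import tempfile


def load_noise_words(path, encoding="utf-8-sig"):
    """Read a noise word file, assuming one word per line (assumption)."""
    text = Path(path).read_text(encoding=encoding)
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def split_query(query, noise_words):
    """Split a query into (kept, dropped) terms using the noise word set.

    Quotes are ignored and terms are split on whitespace; this only
    approximates what the search engine does.
    """
    terms = query.replace('"', " ").split()
    kept = [t for t in terms if t.lower() not in noise_words]
    dropped = [t for t in terms if t.lower() in noise_words]
    return kept, dropped


if __name__ == "__main__":
    # Stand-in file for the demo; point this at your own noiseeng.txt instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo_file = Path(tmp) / "noiseeng.txt"
        demo_file.write_text("social\nthe\n", encoding="utf-8")
        words = load_noise_words(demo_file)

    for query in ['"social media"', "social"]:
        kept, dropped = split_query(query, words)
        print(query, "kept:", kept, "dropped:", dropped)
```

If a term shows up in the dropped list, the file explains why it disappears from the query; an empty kept list corresponds to the "only common words" message quoted in the first post.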
It turns out that these noise words were intended for one specific search scope only, but they are being applied to all searches. Our follow-up question is therefore: Can noise word lists be tied to specific search scopes only, or do they always apply to all search scopes?