How to make different queries return the same result with a token filter

  • Question

  • Hi,

    We've configured the analyzers for our index as follows:

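    Analyzers = new Analyzer[]
    {
        // Index-time analyzer: also emits 3..20 character prefixes
        new CustomAnalyzer("proIndex", TokenizerName.Standard,
            new List<TokenFilterName>() { "generateNumberParts", "prefixes", TokenFilterName.Lowercase },
            new List<CharFilterName>() { "removeDash" }),
        // Query-time analyzer: same chain without the prefix filter
        new CustomAnalyzer("proSearch", TokenizerName.Standard,
            new List<TokenFilterName>() { "generateNumberParts", TokenFilterName.Lowercase },
            new List<CharFilterName>() { "removeDash" }),
    },
    TokenFilters = new TokenFilter[]
    {
        // Positional arguments after the name: generateWordParts = false,
        // generateNumberParts = true, catenateWords/catenateNumbers/catenateAll = null,
        // splitOnCaseChange = false, preserveOriginal = true, splitOnNumerics = true,
        // stemEnglishPossessive = false, protectedWords = null
        new WordDelimiterTokenFilter("generateNumberParts", false, true, null, null, null, false, true, true, false, null),
        new EdgeNGramTokenFilterV2("prefixes", 3, 20, null),
    },
    CharFilters = new[]
    {
        // Map "-" to nothing, so "WebOP-3100" becomes "WebOP3100" before tokenizing
        new MappingCharFilter("removeDash", new List<string>() { "-=>" })
    }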
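    To make the chain above easier to reason about, here is a simplified Python model of the two analyzers; it is not the service's implementation. It assumes the standard tokenizer splits on anything that is not a letter or digit, and that the word delimiter filter's positional arguments mean generateWordParts = false, generateNumberParts = true, splitOnNumerics = true and preserveOriginal = true.

```python
import re

def remove_dash(text):
    # "removeDash" MappingCharFilter: the mapping "-=>" deletes every "-"
    return text.replace("-", "")

def standard_tokenize(text):
    # Simplified stand-in for the standard tokenizer: runs of letters and digits
    return re.findall(r"[A-Za-z0-9]+", text)

def word_delimiter(token):
    # Assumed "generateNumberParts" settings: splitOnNumerics=true,
    # preserveOriginal=true, generateNumberParts=true, generateWordParts=false
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if len(parts) <= 1:
        return [token]  # nothing to split: the token passes through unchanged
    return [token] + [p for p in parts if p.isdigit()]  # original plus number parts

def edge_ngrams(token, min_gram=3, max_gram=20):
    # "prefixes" EdgeNGramTokenFilterV2: prefixes of length 3..20; shorter tokens yield nothing
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

def analyze(text, analyzer):
    tokens = []
    for tok in standard_tokenize(remove_dash(text)):
        for part in word_delimiter(tok):
            if analyzer == "proIndex":
                tokens.extend(edge_ngrams(part))  # index time: emit prefixes
            else:
                tokens.append(part)  # "proSearch" has no prefix filter
    return [t.lower() for t in tokens]  # the Lowercase filter runs last

print(analyze("WebOP-3100", "proIndex"))
# ['web', 'webo', 'webop', 'webop3', 'webop31', 'webop310', 'webop3100', '310', '3100']
print(analyze("wop3100", "proSearch"))   # ['wop3100', '3100']
print(analyze("wop 3100", "proSearch"))  # ['wop', '3100']
```

    Under these assumptions, "WebOP-3100" is indexed with the prefix "webop" and the number part "3100", while the query wop3100 only yields wop3100 and 3100.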
    

    And the synonym map is set up as follows:

    var synonymMap = new SynonymMap()
                    {
                        Name = "product-name-synonymmap",
                        Format = "solr",
                        Synonyms = "webop, wop => webop\n"
                    };
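    A Solr explicit-mapping rule such as 'webop, wop => webop' rewrites a query term only when the whole term equals one of the left-hand terms. A minimal Python sketch of that behaviour, assuming the query has already been tokenized (the token lists below are written by hand, not produced by the service):

```python
def parse_explicit_rule(rule):
    # Solr explicit mapping "a, b => c": each left-hand term is replaced by the right-hand terms
    lhs, rhs = rule.strip().split("=>", 1)
    targets = [t.strip() for t in rhs.split(",") if t.strip()]
    return {t.strip(): targets for t in lhs.split(",") if t.strip()}

def apply_synonyms(tokens, mapping):
    # Simplified model: a token is rewritten only if it equals a rule term exactly
    return [mapping.get(tok, [tok]) for tok in tokens]

mapping = parse_explicit_rule("webop, wop => webop\n")
print(apply_synonyms(["wop", "3100"], mapping))      # [['webop'], ['3100']]
print(apply_synonyms(["wop3100", "3100"], mapping))  # [['wop3100'], ['3100']]
```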
    

    We'd like all of the following queries to find "WebOP-3100":

    webop3100, webop 3100, webop-3100, wop3100, wop 3100, wop-3100

    But we cannot get the same result rank from these 2 queries:

    1. wop 3100
    2. wop3100

    Could you give us some clues?

    Thank you.

    Monday, August 14, 2017 9:04 AM

Answers

  • Hi, 

    Thanks for the detailed question. If you want to find "WebOP-3100" using any of

    webop3100, webop 3100, webop-3100, wop3100, wop 3100, wop-3100

    Why not set the synonym rule accordingly? 

    var synonymMap = new SynonymMap()
    {
        Name = "product-name-synonymmap",
        Format = "solr",
        // One Solr equivalence rule: all six spellings are treated as the same term
        Synonyms = "webop3100, webop 3100, webop-3100, wop3100, wop 3100, wop-3100\n"
    };
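    In Solr format, a line without "=>" is an equivalence rule: each listed term stands for the whole group. The sketch below is a simplified model of that together with the point about multi-word synonyms; it assumes a quoted phrase is looked up as one unit while unquoted words are looked up one by one. It is an illustration, not the service's query parser.

```python
def parse_equivalence_rule(rule):
    # Solr equivalence rule (no "=>"): every listed term maps to the whole group
    terms = [t.strip() for t in rule.strip().split(",") if t.strip()]
    return {t: terms for t in terms}

def expand(query, groups):
    # Assumed behaviour: a quoted phrase is looked up as one unit,
    # while unquoted text is looked up word by word
    if len(query) > 1 and query.startswith('"') and query.endswith('"'):
        units = [query[1:-1]]
    else:
        units = query.split()
    return [groups.get(u, [u]) for u in units]

groups = parse_equivalence_rule(
    "webop3100, webop 3100, webop-3100, wop3100, wop 3100, wop-3100\n")
print(expand('"wop 3100"', groups))  # one unit, expanded to all six spellings
print(expand("wop 3100", groups))    # [['wop'], ['3100']]: the single words match nothing
```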

    To expand multi-word synonyms, use the phrase search operator (double quotes), for example "wop 3100".

    Could you elaborate on what you mean by "cannot get the same result rank from these 2 queries"? Do you mean that the ordering of documents in the responses for "wop 3100" and "wop3100" differs from the rest?

    Could you send the query and the doc ID that the query is expected to match? Thanks-

    Nate

    Monday, August 14, 2017 11:18 PM

All replies

  • Hi Nate,

    Since WebOP is a series of products, we have hundreds of product names prefixed with "WebOP-". So we decided to use a general synonym setting instead.

    The order of results is indeed different for wop3100 and wop 3100:

    • When we query "wop 3100" -> we got WebOP-3100 in the first record.
    • But when we query "wop3100" -> we got products that only contain "3100" as the first results (e.g. AAA-3100, BBB-3100).

    We have also just sent our service name and doc ID to your email.

    Thank you so much.

    Sam

    Tuesday, August 15, 2017 1:55 AM
  • Thanks, Sam.

    > When we query "wop 3100" -> we got WebOP-3100 in the first record.

    In this case, your query is a phrase query containing two terms. One of them, 'wop', matches your synonym rule 'webop, wop => webop', so the query is rewritten to 'webop 3100', and documents that contain both terms are ranked higher.
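    A rough sketch of why the rewritten query favours WebOP-3100. It uses a simplified model of the two analyzers and counts matched terms as a crude stand-in for the service's relevance scoring (an assumption; the real scoring is more involved). The query terms are written by hand as they would look after the synonym rewrite.

```python
import re

def search_tokens(text):
    # Simplified "proSearch": drop "-", lowercase, split on whitespace, and for a
    # word mixing letters and digits keep the whole word plus its number parts
    out = []
    for word in text.replace("-", "").lower().split():
        parts = re.findall(r"[a-z]+|[0-9]+", word)
        out += [word] + [p for p in parts if p.isdigit()] if len(parts) > 1 else [word]
    return out

def index_tokens(text):
    # Simplified "proIndex": the same tokens, expanded to 3..20 character prefixes
    return {t[:n] for t in search_tokens(text) for n in range(3, min(20, len(t)) + 1)}

def matched_terms(query_terms, doc):
    # Crude stand-in for relevance scoring: how many query terms the document contains
    tokens = index_tokens(doc)
    return sum(1 for t in query_terms if t in tokens)

docs = ["AAA-3100", "BBB-3100", "WebOP-3100"]
query = ["webop", "3100"]  # "wop 3100" after the rule 'webop, wop => webop'
ranked = sorted(docs, key=lambda d: matched_terms(query, d), reverse=True)
print(ranked)  # ['WebOP-3100', 'AAA-3100', 'BBB-3100']
```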

    > But when we query "wop3100" -> we got products that only contain "3100" as the first results (e.g. AAA-3100, BBB-3100).

    The query 'wop3100' does not match the synonym rule 'webop, wop => webop', so it is not expanded. Using the Analyze API, you can see that the analyzer 'proSearch' turns the query 'wop3100' into two tokens, <wop3100> and <3100>, so it is effectively parsed as "wop3100 OR 3100". The search therefore only looks for products that contain the text wop3100 or 3100.
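    The same kind of simplified model for the unexpanded query: the tokens of 'wop3100' are computed with a rough imitation of 'proSearch' and checked against the prefixes each product would be indexed with. The analyzer model is an assumption, not the service's implementation; AAA-3100 and BBB-3100 are the example names from the thread.

```python
import re

def search_tokens(text):
    # Simplified "proSearch": drop "-", lowercase, keep each word plus its number parts
    out = []
    for word in text.replace("-", "").lower().split():
        parts = re.findall(r"[a-z]+|[0-9]+", word)
        out += [word] + [p for p in parts if p.isdigit()] if len(parts) > 1 else [word]
    return out

def index_tokens(text):
    # Simplified "proIndex": the same tokens, expanded to 3..20 character prefixes
    return {t[:n] for t in search_tokens(text) for n in range(3, min(20, len(t)) + 1)}

docs = ["WebOP-3100", "AAA-3100", "BBB-3100"]
query = search_tokens("wop3100")  # ['wop3100', '3100']; no synonym rule matches 'wop3100'
results = {doc: [t for t in query if t in index_tokens(doc)] for doc in docs}
print(results)  # every product matches only '3100'
```

    Because the only term that can match is 3100, WebOP-3100 gets no advantage over the other products, which is consistent with the ordering reported above.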

    Hope this makes sense. Please take a look at the Analyze API to see how your text is analyzed. You have my contact details, so please feel free to set up a quick conference call with me if you need help.
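    The Analyze API is called with a POST to the index's analyze endpoint, passing the text and the analyzer name. A minimal sketch with Python's standard library that builds (but does not send) such a request; the service name, index name, key and api-version below are placeholders and assumptions, not values from this thread.

```python
import json
import urllib.request

SERVICE = "my-service"       # hypothetical service name
INDEX = "products"           # hypothetical index name
API_KEY = "<admin-api-key>"  # placeholder admin key
API_VERSION = "2020-06-30"   # assumption: any API version that supports the Analyze API

def build_analyze_request(text, analyzer):
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
           f"?api-version={API_VERSION}")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "api-key": API_KEY})

req = build_analyze_request("wop3100", "proSearch")
# To actually send it (needs a real service and key):
# with urllib.request.urlopen(req) as resp:
#     print([t["token"] for t in json.load(resp)["tokens"]])
```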

    Nate 


    • Edited by Nate Ko Wednesday, August 16, 2017 11:40 PM
    Wednesday, August 16, 2017 11:40 PM
  • Hi Nate,

    Thank you for the detailed explanation. That really, really helps!

    The key point of my misunderstanding was this:

    I thought wop3100 would be split into "wop" and "3100" by the "proSearch" analyzer (because of the Word Delimiter Token Filter settings).
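    Whether "wop" is emitted depends on the word delimiter options. Assuming the SDK's positional order (name, generateWordParts, generateNumberParts, ...), the "generateNumberParts" filter is created with generateWordParts set to false, which matches the <wop3100> <3100> output described above. The simplified Python model below, not the service's implementation, shows how the emitted tokens change when word parts are generated too; whether the synonym rule then applies to the "wop" part at query time is an assumption to check with the Analyze API.

```python
import re

def word_delimiter(token, generate_word_parts, generate_number_parts=True,
                   preserve_original=True):
    # Simplified word delimiter with splitOnNumerics=true: split at letter/digit boundaries
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if len(parts) <= 1:
        return [token]  # nothing to split: the token passes through unchanged
    out = [token] if preserve_original else []
    for p in parts:
        if (p.isdigit() and generate_number_parts) or (p.isalpha() and generate_word_parts):
            out.append(p)
    return out

print(word_delimiter("wop3100", generate_word_parts=False))  # ['wop3100', '3100']
print(word_delimiter("wop3100", generate_word_parts=True))   # ['wop3100', 'wop', '3100']
```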

    Is there any way to achieve this goal?


    My references:

    1. https://docs.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search
    2. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html#analysis-word-delimiter-tokenfilter

    Thursday, August 17, 2017 5:43 AM