locked
Full Text Index Query - SQL Server 2005 RRS feed

  • Question

  • Hi All,

    In SQL Server 2005 i have created Full text index on nvarchar(max) fields. And this field save the HTML markup. Now my question is that how can i disable HTML tags from being indexed? As indexed field save the HTML it is likely possible that if some one search for 'span' it may return the row which has HTML span tag in indexed fields.

    Tuesday, March 22, 2011 1:10 PM

Answers

  • I believe you need to update your noise word list in your SQL server instance.

    Unfortuantely, once you updated your noise word list all other databases use the full text index or other table with the different text will regard the word you added into the noise word as noise word, which some time may not you want.

    TO add your own noise word go to:

    C:\Program Files\Microsoft SQL Server\yourSQLinstance name\MSSQL\FTData\noiseenu.txt. (enu means US english)

    edit with any text editor. save and then don't forget to repopulate the fulltext index.

    IN SQL server 2008 or later this has been changed to Stop word list.

    Tuesday, March 22, 2011 2:05 PM

All replies

  • Can you have another field in your table that will contain the actual text without HTML tags? This seems as a logical solution to me.
    For every expert, there is an equal and opposite expert. - Becker's Law

    Naomi Nosonovsky, Sr. Programmer-Analyst

    My blog
    Tuesday, March 22, 2011 1:51 PM
  • I believe you need to update your noise word list in your SQL server instance.

    Unfortuantely, once you updated your noise word list all other databases use the full text index or other table with the different text will regard the word you added into the noise word as noise word, which some time may not you want.

    TO add your own noise word go to:

    C:\Program Files\Microsoft SQL Server\yourSQLinstance name\MSSQL\FTData\noiseenu.txt. (enu means US english)

    edit with any text editor. save and then don't forget to repopulate the fulltext index.

    IN SQL server 2008 or later this has been changed to Stop word list.

    Tuesday, March 22, 2011 2:05 PM
  • Hi Steven,

    Thanks for your pinpoint. It really helps me.

    But i think use of html ifilter would be better than tweaking noise word list. For use of ifilter i know that i have to convert my nvarchar(max) to varbinary(max). And i also know that changing only data type is not sufficient. So can u me suggest what should i do to use HTML ifilter with my table.

    Wednesday, March 23, 2011 5:08 AM
  • Hi I think you need 3 columns other than the primary key and other columns you need:

    1. the nvarchar(max) for the original HTML content;

    2. create a persisted calculated column for the HTML content as varbinary(Max)

    3. a document type column to store file type is HTML

    Once the indexer knows the type is HTML it will use  iFilter for HTML. (for plain text it is use vanilla ifilter)

    Wednesday, March 23, 2011 5:54 AM
  • Hi I think you need 3 columns other than the primary key and other columns you need:

    1. the nvarchar(max) for the original HTML content;

    2. create a persisted calculated column for the HTML content as varbinary(Max)

    3. a document type column to store file type is HTML

    Once the indexer knows the type is HTML it will use  iFilter for HTML. (for plain text it is use vanilla ifilter)

    Hi Steven,

    Again thanks a lot for your response.

    Can u please explain me what u mean by "create a persisted calculated column for the HTML content as varbinary(Max)" . . . ??

    And once i have HTML already stored in varbinary(Max) then what;s the use of nvarchar(max) fields.

    Wednesday, March 23, 2011 5:59 AM
  • If you already stored as varbinary(Max), you don't have to stored the varchar(Max) content. the important thing is to have a document type as HTML.

    Wednesday, March 23, 2011 6:05 AM