XML word breaker and regular text word breaker

    Question

  • Hello all,

    I understand that the XML word breaker (Xmlfilt.dll) shipped with SQL Server does not index the attributes of inner nodes.

    MSDN gives a workaround: replace Xmlfilt.dll with the XmlFilter.dll shipped with Windows Server 2008.

    My questions are:

    1. Is the XML word breaker required only for indexing columns that store XML data?

    2. If I have a full-text indexed table with two columns, one storing text and one storing XML data, will SQL Server use two different word breakers (Xmlfilt.dll, LANGBRK.dll) in this case?

    3. Does each language have its own Xmlfilt.dll, like the regular word breakers?

    Thanks & Regards

    Samba

    Wednesday, March 07, 2012 2:20 AM

Answers

  • Hi Samba,

    A filter is used to extract textual information from documents stored in varbinary, varbinary(max), image, or xml data type columns for indexing.

    • For columns of the CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, and NTEXT data types the indexing engine applies the text iFilter. You can't override this iFilter.
    • For the columns of the XML data type the indexing engine applies the XML iFilter. You can't override the use of this iFilter.
    • For columns of the IMAGE and VARBINARY data types, the indexing engine applies the iFilter that corresponds to the extension the document would have if it were stored in the file system (e.g. doc for a Word document, xls for an Excel spreadsheet). Please refer to the later section on Indexing BLOBs for more information on this.
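
    The three rules above can be sketched as a small lookup. This is only an illustrative model in Python: SQL Server makes this choice internally, and the function name, its parameters, and the returned labels are hypothetical. It shows that the filter follows from the column's data type, so a table with both a text column and an XML column gets a different filter for each column.

```python
# Illustrative model only; not SQL Server code. All names are hypothetical.
TEXT_TYPES = {"char", "nchar", "varchar", "nvarchar", "text", "ntext"}
BINARY_TYPES = {"image", "varbinary"}


def pick_filter(data_type, document_extension=None):
    """Return a label for the filter applied to a column of this data type."""
    # Drop a length suffix such as "(max)" or "(200)" and normalise case.
    base = data_type.lower().split("(")[0].strip()
    if base in TEXT_TYPES:
        return "text filter"
    if base == "xml":
        return "XML filter"
    if base in BINARY_TYPES:
        # Binary columns rely on the extension the document would have on disk.
        if not document_extension:
            raise ValueError("binary columns need a document extension")
        return "filter for ." + document_extension.lstrip(".").lower()
    raise ValueError("no rule for data type: " + data_type)


print(pick_filter("NVARCHAR(MAX)"))
print(pick_filter("xml"))
print(pick_filter("varbinary(max)", ".DOC"))
```

    Note that this models filter selection only. Which word breaker runs afterwards depends on the language, not on the filter.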


    A word breaker is used for linguistic analysis of all full-text indexed data, and each language has a corresponding word breaker. When an XML data type column contains multiple languages, you can specify the language of each part with the xml:lang attribute. Please see the section "Support for Different Languages in Full-Text Index on XML Column".

    An iFilter can launch a different language word breaker from the one specified as the default in the full-text language setting for your Server, or from the word breaker you set for the column in the table you are full-text indexing.
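
    To illustrate how xml:lang scopes text to a language, here is a minimal Python sketch that pairs each text fragment with the language in effect for it. It is not SQL Server's implementation: the function name, the sample document, and the use of the column language as the fallback are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_by_language(xml_text, column_language):
    """Pair each text fragment with the language in effect for it."""
    pairs = []

    def walk(element, inherited):
        # xml:lang applies to the element and is inherited by its children.
        lang = element.get(XML_LANG, inherited)
        if element.text and element.text.strip():
            pairs.append((lang, element.text.strip()))
        for child in element:
            walk(child, lang)
            # Tail text follows the child but belongs to the parent element.
            if child.tail and child.tail.strip():
                pairs.append((lang, child.tail.strip()))

    walk(ET.fromstring(xml_text), column_language)
    return pairs


# Hypothetical sample document with one German fragment.
doc = (
    "<doc>"
    "<title>Full-text search</title>"
    '<note xml:lang="de">Volltextsuche</note>'
    "</doc>"
)
print(text_by_language(doc, "en"))
```

    In this model each fragment would then be handed to the word breaker for its language, so one XML filter can feed several language-specific word breakers.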

    For more information, please refer to this blog written by Hilary Cotter: SQL Server Full Text Search Language Features.


    Stephanie Lv

    TechNet Community Support


    Friday, March 09, 2012 8:47 AM

All replies

  • Thanks Stephanie for your clarification.

    Regards

    Samba

    Friday, March 09, 2012 11:40 AM