Answered FULL TEXT Search few questions

  • 13 Juli 2012 11:46
     
     

    Hi,

     Few details to understand our requirements that will help to find better/suitable solution:

    Product certification:

    - SQL Server(EE)
          2005    32/64 bit
          2008    32/64 bit
    - OS (Windows server)
          2003 R2  -  32 and 64 bit
          2008 R2  -  32 and 64 bit

    - Language support
          English, chineses,japanese,korean and almost all European languages

    - A  .Net based UI application.

    - Database schema can be Unicode(NCHAR/NVARCHAR) or Non-Unicode(CHAR/VARCHAR) 
    - Database schema has around 200 Tables and each table has around 40-50 text/string columns
    Requirement :
              User can search/enter any string/text value and the result should be, list of all the tables and their columns, wherever that text/string is found.

    Solutions :
        1. We can write simple dynamic query to search a text/string in all tables(in all text columns)
             Comment - This option is ruled out, because SQL performance on a large database will be very - very slow
         
        2. sql server Full text search
            - Any limitations in this method, considering product certication we have above ?
            - Any SQL Script to add multiple table columns to a single Catalog ? (Manually adding them from SSMS, is very tedious and time taking task- around 200 tables, 40-50 columns per table)

    Thanks,
    Akc

Semua Balasan

  • 13 Juli 2012 18:29
     
      Memiliki Kode

    Full text search should be able to handle your requirements within the limits of the word breakers and stemmers.  The full text engine will handle data in as many languages as you have the word breakers and stemmers.  There is even a neutral language that does only simple breaking at punctuation and white spaces, but has no stemmers to find related forms (such as run, runs, running).  In addition, you need to proper IFilters for each type of document you might index, such as Microsoft Word, PDFs, and so forth.    Are you embedding such documents in your database or are you sticking with text columns?

    If you have a large amount of data, as your model suggest would be the case, you want to avoid doing daily full or incremental builds of the full text indexes.  Instead, configure the full text indexes to do Change Tracking = Automatic.  This will cause the full text indexes to be updated shortly after a change, although asynchronously.

    You can easily generate the CREATE FULLTEXT INDEX statement if you know which columns to include.  From the Books Only sample code see:

    CREATE FULLTEXT CATALOG production_catalog;
    GO
    CREATE FULLTEXT INDEX ON Production.ProductReview
     ( 
      ReviewerName  Language 1033,
      EmailAddress  Language 1033,
      Comments      Language 1033     
     ) 
      KEY INDEX PK_ProductReview_ProductReviewID 
          ON production_catalog; 

    If you intend to full text all Unicode and Non-Unicode columns, then you could create the column list code by selected from sys.columns for the table in question and include all columns with the proper data types.  Making sure each column has the right language (if that is a requirement) may require a little more work.

    I do have a few questions about your briefly outlined design and the multi-language requirements.   How are you planning to split up your languages?  Ideally, each individual language could be in a different column or in a different table.   This clear separation of languages will help you avoid false matches across languages.

    Of course, you can tag rows with the language Meta Tags to define the language for that row.  In addition, rich documents that are properly language tagged can be searched according to the language rules.  But this is more complicated to manage.

    So, yes, go ahead with Full Text Indexes.  The upper limit of capability increases with each release of SQL Server. 

    RLF

  • 16 Juli 2012 11:22
     
     

    Thanks for this useful information.

    1. Uni-code database Tables will have multiple languages string/text stored in single column,how to handle this situation? (User can search any language specific string/text in table)

    e.g. - Employee table column "name", can store employees names in English + Japanese + French string/text

    2. Your example shows creation of multi column index in a single catalog(production_catalog), But how to write script, to create full text index that included columns from multiple tables in single catalog (From SSMS, we can add columns from multiple tables into single catalog)?

    Can we create something like -

    CREATE FULLTEXT INDEX ON Production.ProductReview, Production.Products(
      ProductReview.ReviewerName  Language 1033,
      ProductReview.EmailAddress  Language 1033,
      ProductReview.Comments      Language 1033, 

     Products.name)   KEY INDEX PK_ProductReview_ProductReviewID ON production_catalog;

    Regards,

    Akc



  • 16 Juli 2012 15:57
     
     Jawab Memiliki Kode

    Akc,

    1. Regarding multiple languages in a single column, there are some serious challenges to getting this to work correctly.

    See: Choose a Language When Creating a Full-Text Index  http://msdn.microsoft.com/en-us/library/ms142507.aspx

    Since you intend to include non-Western languages, such as Japanese, a Western-language word breaker will not parse out the words as you might hope.  If you read the above link carefully, you will see that the challenges must be thoroughly planned out. 

    • If you have no knowledge of what column has what language, then the word breaker will cause you problems for any languages that do not use spaced to separate words.
    • If a single row of a single column can contain multiple languages such as: Börker, 陳推奨, Le Roy Ladurie, then the challenge is considerable.
    • If a single row of a single column can contain only one language, look into using XML columns, sot that the language tag can be stored inline before the column is indexed.  In this way, the work breaker and IFilter have a chance to discern the language.

    2. A full text catalog can contain many full text indexes.   However a full text index can only be on a table or on an indexed view.  This means that you could create a view like:

    CREATE VIEW ExpandedProductReview
    AS
    SELECT ProductReview.ReviewerName AS ReviewerName,
    	ProductReview.EmailAddress AS EmailAddress,
    	ProductReview.Comments AS Comments,
    	Products.name AS ProductName
    FROM Production.ProductReview AS ProductReview
        JOIN Production.Products AS Product
        ON ProductReview.ProductID = Product.ProductID

    You will need to create an index on this (if the view is correct) which persists the data so that it can be indexed.

    In addition, if a row is of a particular language and you have a language column, you can filter your query to only access rows from the desired language.

    RLF