Need Advice

  • Question

  • Hi,

    I'm developing my own version of full-text search. The search tokens are attributed, and I aim to build a result set ordered by a rank computed from the intersection of the reference IDs of the sources containing those attributes. The parser implements a fairly complex algorithm in the tokenizing phase, since the source may contain typographic mistakes or different abbreviations that need to be handled. The rank evaluation, by contrast, is much simpler.
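
    To illustrate the ranking idea, here is a simplified C++ sketch (the rankByIntersection function, the in-memory postings map, and the RefId type are only illustrative; my real tokenizer and rank formula are more involved):

    #include <algorithm>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    typedef long RefId;   // reference ID of a source record

    // Orders pairs by descending match count, so the best-ranked sources come first.
    struct CompareByCountDesc {
        bool operator()(const std::pair<RefId, int>& a,
                        const std::pair<RefId, int>& b) const {
            return a.second > b.second;
        }
    };

    // Each token symbol maps to a sorted list of RefIds; a source's rank here is
    // simply the number of query tokens whose RefId lists contain it.
    std::vector<std::pair<RefId, int> > rankByIntersection(
        const std::map<std::string, std::vector<RefId> >& postings,
        const std::vector<std::string>& queryTokens)
    {
        std::map<RefId, int> hits;                        // RefId -> matching token count
        for (size_t t = 0; t < queryTokens.size(); ++t) {
            std::map<std::string, std::vector<RefId> >::const_iterator it =
                postings.find(queryTokens[t]);
            if (it == postings.end()) continue;
            for (size_t i = 0; i < it->second.size(); ++i)
                ++hits[it->second[i]];
        }
        std::vector<std::pair<RefId, int> > ranked(hits.begin(), hits.end());
        std::sort(ranked.begin(), ranked.end(), CompareByCountDesc());
        return ranked;                                    // best-matching sources first
    }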

    I may have a huge set of symbol (token) / reference ID (source record) pairs that are used in the intersection phase; this set acts as the full-text index.

    In version 0.1, I used one flat file to store those pairs (sorted) and another flat file that holds each unique symbol together with the offset of its starting position in the first file (an index file). This lets me read a symbol's RefIDs as a block into an internal array, which is eventually used in the intersection phase.
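
    A stripped-down C++ sketch of that lookup (the IndexEntry layout, the file handling, and the function name are placeholders for illustration; it assumes the matching index entry has already been located):

    #include <cstdio>
    #include <vector>

    typedef long RefId;

    // Placeholder layout of one index-file record: where a symbol's RefIds start
    // in the sorted pairs file, and how many of them there are.
    struct IndexEntry {
        long offset;   // byte offset of the first RefId for this symbol
        long count;    // number of RefIds belonging to this symbol
    };

    // Reads one symbol's RefIds from the pairs file in a single block,
    // assuming the index file has already been searched and produced 'entry'.
    std::vector<RefId> readRefIdBlock(const char* pairsFile, const IndexEntry& entry)
    {
        std::vector<RefId> ids;
        if (entry.count <= 0) return ids;
        std::FILE* f = std::fopen(pairsFile, "rb");
        if (!f) return ids;
        ids.resize(entry.count);
        std::fseek(f, entry.offset, SEEK_SET);
        size_t got = std::fread(&ids[0], sizeof(RefId), ids.size(), f);
        ids.resize(got);               // keep only what was actually read
        std::fclose(f);
        return ids;                    // fed straight into the intersection phase
    }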

    Now I have decided to port this structure to a relational database.

    I'm looking for advice on choosing a database.

    I'd like to access the database over TCP/IP, since my server and the database may be hosted on different boxes. An ODBC connection may keep me from using some of the APIs the database provides. (Performance is my major concern.)

    Unfortunately I haven't been able to study and explore MS SQL Server's features yet. Are there any APIs provided for native code? Or is there any way to read large data from the database in blocks of a defined size?

    I found MySQL somewhat suitable for me because of the APIs it provides, but I'd like to know about other databases' features before committing to MySQL.
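
    For example, the lookup over TCP/IP with the MySQL C API looks roughly like this (the ft_index table, its columns, and the connection details are placeholders, not my real schema):

    #include <mysql.h>     // MySQL C API
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    int main()
    {
        MYSQL* conn = mysql_init(NULL);
        // Connect over TCP/IP to a database on another box.
        if (!mysql_real_connect(conn, "db-host", "user", "password",
                                "ftsearch", 3306, NULL, 0)) {
            std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        if (mysql_query(conn,
                "SELECT ref_id FROM ft_index WHERE symbol = 'foo'") != 0) {
            std::fprintf(stderr, "query failed: %s\n", mysql_error(conn));
            mysql_close(conn);
            return 1;
        }

        // mysql_use_result streams rows as they arrive instead of buffering
        // the whole result set on the client.
        std::vector<long> refIds;
        MYSQL_RES* res = mysql_use_result(conn);
        if (res) {
            for (MYSQL_ROW row = mysql_fetch_row(res); row; row = mysql_fetch_row(res))
                refIds.push_back(std::atol(row[0]));
            mysql_free_result(res);
        }
        mysql_close(conn);

        std::fprintf(stderr, "fetched %lu RefIds\n", (unsigned long)refIds.size());
        return 0;
    }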

    I can share brief details of my application's architecture with anyone who is interested.

    Thanks in advance

    MTT 

    Monday, September 25, 2006 10:42 AM

All replies

  • 
    It is possible to write C++ code that runs in the server (called an Extended Stored Procedure), but that feature is deprecated in favor of writing code in a CLR language (which you'll see referred to as SQLCLR integration). I'm not sure what you mean by "read large data from the database in blocks of a defined size". Do you mean that you want to import a file into the database? If so, there are several different ways of handling that: BCP, BULK INSERT, SQL Server Integration Services, the OLE DB bulk API, and the new SqlBulkCopy class in .NET 2.0. If you want to store the data in SQL Server in a binary format (probably not a good idea given what you're doing), you can do that as well...
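
    For the native-code angle specifically, the SQL Server ODBC driver also exposes the BCP functions directly, so you can send rows from program variables without going through a file. A rough sketch (assuming an ANSI build; the ft_index table with a varchar symbol column and an int ref_id column is just an example, and error handling is trimmed):

    #include <windows.h>
    #include <sql.h>
    #include <sqlext.h>
    #include <odbcss.h>    // BCP extensions of the SQL Server ODBC driver
    #include <cstdio>

    int main()
    {
        SQLHENV henv;
        SQLHDBC hdbc;
        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
        SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);

        // Bulk copy has to be enabled on the connection before connecting.
        SQLSetConnectAttr(hdbc, SQL_COPT_SS_BCP, (SQLPOINTER)SQL_BCP_ON, SQL_IS_INTEGER);

        char connStr[] = "DRIVER={SQL Server};SERVER=db-host;DATABASE=ftsearch;"
                         "Trusted_Connection=yes;";
        if (SQLDriverConnect(hdbc, NULL, (SQLCHAR*)connStr, SQL_NTS,
                             NULL, 0, NULL, SQL_DRIVER_NOPROMPT) == SQL_ERROR) {
            std::fprintf(stderr, "connect failed\n");
            return 1;
        }

        char symbol[64];
        long refId = 0;

        // Bind program variables to the two server columns, then send rows in a loop.
        bcp_init(hdbc, "ft_index", NULL, NULL, DB_IN);
        bcp_bind(hdbc, (LPCBYTE)symbol, 0, SQL_VARLEN_DATA, (LPCBYTE)"", 1, SQLCHARACTER, 1);
        bcp_bind(hdbc, (LPCBYTE)&refId, 0, sizeof(refId), NULL, 0, SQLINT4, 2);

        for (int i = 0; i < 1000; ++i) {            // stand-in for the real pair source
            std::sprintf(symbol, "token%d", i % 10);
            refId = i;
            if (bcp_sendrow(hdbc) != SUCCEED) break;
        }
        std::fprintf(stderr, "%ld rows copied\n", (long)bcp_done(hdbc));

        SQLDisconnect(hdbc);
        SQLFreeHandle(SQL_HANDLE_DBC, hdbc);
        SQLFreeHandle(SQL_HANDLE_ENV, henv);
        return 0;
    }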
     

    --
    Adam Machanic
    Pro SQL Server 2005, available now
    http://www.apress.com/book/bookDisplay.html?bID=457
    --
     
     


    Monday, September 25, 2006 2:52 PM