using LINQ to access Lucene ?

  • Question

  • I have several hundred Lucene.NET index databases, one for each day (yyyy-mm-dd folders). I want to implement LINQ over the indexes to do ad-hoc queries, treating the entire set of folders as a single historical database. Is this possible and, if so, where do I start? Are there interfaces I can implement in my own classes to allow LINQ to traverse and query these indexes? Also, is it possible to use LINQ from within a Silverlight application?



    Thursday, October 25, 2007 10:32 AM


  • I recently started blogging and posting to a forum about a LINQ to Lucene provider that I'm developing on CodePlex. No source code is available yet, but here is a link to the CodePlex address as well as some early development details about the project, quoted from the blog.


    Providing a custom LINQ solution for the Lucene Information Retrieval System, commonly referred to as a search-engine.

    I have started working on a 'LINQ to Lucene' custom LINQ provider and hope to release some code samples soon. Lucene is an Information Retrieval System used for full-text searching and scoring results, commonly referred to as a search engine (similar to Google-style searching). LINQ is Microsoft's extensible Language Integrated Query platform that provides querying directly within the managed CLR. LINQ comes out of the box with 'LINQ to Objects', 'LINQ to DataSets', 'LINQ to XML', 'LINQ to SQL' and 'LINQ to Entities'. The goal of 'LINQ to Lucene' is to let developers use full-text searching with a fast, proven search engine within the .NET managed CLR. The plan is to push the 'LINQ to Lucene' code base up to CodePlex so that other developers can give feedback and help improve and enhance it.

    Because Lucene stores its data in an index, uses queries and tokens for searching, and returns documents and hits, the project will be capable of mapping custom business object types. It will be a kind of Object-Index-Mapper (OIM), similar to the typical Object-Relational-Mapper (ORM) that 'LINQ to SQL' provides. I'm using attributes and reflection on the classes to be indexed and their members, similar to the Table and Column decorators for SQL-mapped objects. Typically developers store only a few fields in the index for querying or retrieval, so I'd like a business object to be able to carry both Lucene and SQL attributes. If an instance is created from a Lucene index rather than a SQL table, then perhaps the object could get its remaining fields from SQL using lazy instantiation or something similar. Classes stored in the Lucene index will be decorated with the Index attribute (with its various named properties), and the properties of that class that are indexed will be decorated with the Field attribute (with its various named properties, e.g. Tokenized).
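
    As a rough illustration of the attribute-and-reflection mapping described above, here is a minimal sketch. The attribute classes, their properties, the `Article` type and the `IndexMapping` helper are all hypothetical stand-ins, since no project source is available; only the names Index, Field and Tokenized come from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attributes; the real project's names and properties may differ.
[AttributeUsage(AttributeTargets.Class)]
public sealed class IndexAttribute : Attribute
{
    public string Name { get; set; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldAttribute : Attribute
{
    public bool Tokenized { get; set; }
}

// Hypothetical business object decorated for indexing.
[Index(Name = "articles")]
public class Article
{
    [Field(Tokenized = false)]
    public string Id { get; set; }

    [Field(Tokenized = true)]
    public string Title { get; set; }

    // Not decorated, so not part of the index mapping;
    // it could be loaded lazily from SQL instead.
    public string Body { get; set; }
}

public static class IndexMapping
{
    // Returns one description per property that carries [Field].
    public static List<string> DescribeFields(Type type)
    {
        var result = new List<string>();
        foreach (PropertyInfo property in type.GetProperties())
        {
            object[] attrs = property.GetCustomAttributes(typeof(FieldAttribute), false);
            if (attrs.Length == 0) continue; // unmapped property
            var field = (FieldAttribute)attrs[0];
            result.Add(property.Name + (field.Tokenized ? " (tokenized)" : " (untokenized)"));
        }
        return result;
    }
}
```

    Calling `IndexMapping.DescribeFields(typeof(Article))` yields entries for `Id` and `Title` but not `Body`. A real mapper would use the same reflection pass to build Lucene documents from objects and objects from hits.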

    Similar to the DataContext object, I'm currently thinking there will be a DataIndex object that manages retrieval and mapping of objects to the appropriate index. The initial release will treat the index as read-only, but the DataIndex object will provide a facility for writing to the index in a future version, perhaps using a similar SubmitChanges method. The DataIndex may also act as a Facade-type design pattern, managing how the index is read (in memory, from disk, on multiple threads, etc.).

    Finally, it's worth noting that the implementation is being designed to support the generation of an Expression Tree that can be analyzed, converted into a Lucene request, and ultimately executed against the index on first request. This follows the same pattern that 'LINQ to SQL' uses and allows for rich query generation.
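
    To make the expression-tree step concrete, here is a minimal sketch of turning a predicate into Lucene query-parser syntax. It is not the project's implementation: the `Message` type and `LuceneQueryTranslator` class are hypothetical, only `==`, `&&` and `||` are handled, and values are not escaped for Lucene's special characters.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical document type, used only for this illustration.
public class Message
{
    public string Title { get; set; }
    public string Author { get; set; }
}

public static class LuceneQueryTranslator
{
    // Walks the predicate's expression tree and emits a query string.
    public static string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        return Visit(predicate.Body);
    }

    private static string Visit(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.AndAlso:
                return Combine((BinaryExpression)node, "AND");
            case ExpressionType.OrElse:
                return Combine((BinaryExpression)node, "OR");
            case ExpressionType.Equal:
                return Term((BinaryExpression)node);
            default:
                throw new NotSupportedException("Unsupported expression: " + node.NodeType);
        }
    }

    private static string Combine(BinaryExpression node, string op)
    {
        return "(" + Visit(node.Left) + " " + op + " " + Visit(node.Right) + ")";
    }

    private static string Term(BinaryExpression node)
    {
        // The left side must be a property of the lambda parameter; it names the field.
        var member = node.Left as MemberExpression;
        if (member == null || !(member.Expression is ParameterExpression))
            throw new NotSupportedException("Left side must be a property of the document.");

        // The right side (constant or captured variable) is evaluated locally.
        object value = Expression.Lambda(node.Right).Compile().DynamicInvoke();
        return member.Member.Name + ":\"" + value + "\"";
    }
}
```

    For example, `LuceneQueryTranslator.Translate<Message>(m => m.Title == "lucene" && m.Author == "smith")` returns `(Title:"lucene" AND Author:"smith")`. A full provider would defer this translation until the query is first enumerated, as 'LINQ to SQL' does, and would build Lucene query objects rather than a string.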

    The project is built using Test Driven Development with the Visual Studio 2008 Test Projects.
    The source code is written in C# 3.0 using the Lucene.Net port version 2.0.

    Stay tuned for future code samples, and please provide any feedback and/or recommendations.




    Thursday, October 25, 2007 12:46 PM