Design for a search feature.

  • Question

  • I'm wondering if someone might be able to point me in the right direction on how to design a search feature.

    Currently, I have several hundred XML files that contain data on (for the sake of the discussion, let's say) customers.   The application currently loads and wraps  the XML files with a Customer class which gives access to the data  (first name, last name, address, etc) and calculates a few others (value of all purchases, whatever). 

    I'd like to build a feature that would allow users to search for customers based on certain criteria.  My first 'brute force' approach was simply to have a form with all the relevant controls (textBoxFirstName, checkBoxSearchForFirstName, etc.) and then 'if (checkBoxSearchForFirstName.Checked && textBoxFirstName.Text == customer.FirstName) return true;' and so on.  But this is hardly elegant.  I've also considered wrapping the customers in a 'searchable customer' class to which you could pass the instance of customer and the search criteria, but I can't get my head around how to build it....

    So, assuming I'm not the first to have to do this, I'm wondering if someone might point me in the right direction -- a 'search engine design pattern' or some such :)   This is more about teaching myself than getting it to work, so links to articles/discussions/examples on the topic would be great...

    Oh, and yes: I realise this would make much more sense if the data was in a database instead of a bunch of XML files.  But, long story short, that's not possible :)

    Monday, July 6, 2009 5:29 PM

All replies

  • I agree, your XML being all in files is probably going to be the undoing of any pattern, unless you have a lot of memory, and can hold 100 Xml files in memory.

    I don't think you need the checkbox on the form, you can simply put the text boxes, and pass them through as is, then in your queries return data that matches the text if it is not null.

    Here's a link to some information: Searching Design Pattern

    Really though, a lot of the design depends on what you're searching, how you intend to search, and how you present the results.  You could certainly pass the search an entity that carries its parameters, derived from ICustomer, such as CustomerCriteria; however, that would mean you have no control over advanced searching, such as AND, OR and so on, or that the combination would be implicit.

    One thing that I would possibly do, if you're searching in memory, would be to have a set of searching entities with a denormalised structure (so no cross-referencing of other classes is required; something similar to a DTO).  If you want the searching to be available to Xml or a database, then you might consider a provider (strategy + factory patterns) to return the search structure for the particular source data implementation.  This, however, will only be an exercise in conceptual design; really, if you wanted to do a proper search that returns in a reasonable amount of time, you'd be looking at a database implementation.
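    The provider idea above (strategy plus factory) might be sketched roughly as follows. This is only an illustration under assumptions: the names CustomerSearchRecord, ICustomerSearchSource, XmlFolderSearchSource and SearchSourceFactory are made up for the example, and the XML element names (Id, FirstName, LastName) are guesses, since the thread never shows the file layout.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

// Flat, DTO-like record used only for searching (hypothetical shape).
public class CustomerSearchRecord
{
    public string CustomerId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Strategy: each kind of data source knows how to produce search records.
public interface ICustomerSearchSource
{
    IEnumerable<CustomerSearchRecord> GetRecords();
}

// Reads one record per XML file in a folder. Element names are assumptions.
public class XmlFolderSearchSource : ICustomerSearchSource
{
    private readonly string _folder;

    public XmlFolderSearchSource(string folder)
    {
        _folder = folder;
    }

    public IEnumerable<CustomerSearchRecord> GetRecords()
    {
        foreach (string path in Directory.EnumerateFiles(_folder, "*.xml"))
        {
            XElement root = XDocument.Load(path).Root;
            // Casting a missing element to string yields null rather than throwing.
            yield return new CustomerSearchRecord
            {
                CustomerId = (string)root.Element("Id"),
                FirstName = (string)root.Element("FirstName"),
                LastName = (string)root.Element("LastName")
            };
        }
    }
}

// Factory: callers ask for a source by kind and never see the concrete type.
public static class SearchSourceFactory
{
    public static ICustomerSearchSource Create(string kind, string location)
    {
        switch (kind)
        {
            case "xml":
                return new XmlFolderSearchSource(location);
            default:
                throw new ArgumentException("Unknown source kind: " + kind);
        }
    }
}
```

    A database-backed source would be one more class implementing the same interface plus one more case in the factory; the code that does the searching would not need to change.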

    The key to any efficient search is the schema - if you have to join multiple pieces of information before returning the result, that is where the time is taken, for example recursing through the hierarchical structure of an Xml file.  Once you have this information denormalised, and quite probably cached, the search can be made relatively efficient.  The database is very good at this sort of thing, working on sets of data and so on, and with 100 master records would be really quick.  If you were searching upwards of a few million, you would probably have to start considering writing a trigger, or ETL, to move data from the transactional structure into the search structure to allow results to be returned quickly, and the application to scale.  That said, this could also possibly be a symptom of a poorly structured schema.

    Difficult to comment on Xml searching patterns directly, as the process would have little value in a commercial application, or would not be a very efficient way to work at least.  If you have data in denormalised entities though, you might want to look at all the different sorts of searching algorithms, such as binary searches, and the like.
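    As a rough illustration of the binary-search suggestion, here is a minimal sketch that keeps denormalised entries sorted by last name and looks them up with List<T>.BinarySearch. The NameIndexEntry and LastNameIndex types are hypothetical, invented for this example. Note the limits: this only helps for exact matches on the one key the list is sorted by, not for 'contains' style searches.

```csharp
using System;
using System.Collections.Generic;

// Denormalised index entry: just the search key and the customer it points to.
public class NameIndexEntry
{
    public string LastName { get; set; }
    public int CustomerId { get; set; }
}

// Orders entries by last name, ignoring case.
public class LastNameComparer : IComparer<NameIndexEntry>
{
    public int Compare(NameIndexEntry x, NameIndexEntry y)
    {
        return string.Compare(x.LastName, y.LastName, StringComparison.OrdinalIgnoreCase);
    }
}

public class LastNameIndex
{
    private static readonly LastNameComparer Comparer = new LastNameComparer();
    private readonly List<NameIndexEntry> _entries;

    public LastNameIndex(IEnumerable<NameIndexEntry> entries)
    {
        _entries = new List<NameIndexEntry>(entries);
        _entries.Sort(Comparer); // binary search requires sorted input
    }

    // Returns the ids of all customers with exactly this last name.
    public List<int> Find(string lastName)
    {
        var ids = new List<int>();
        var probe = new NameIndexEntry { LastName = lastName };

        int hit = _entries.BinarySearch(probe, Comparer);
        if (hit < 0)
        {
            return ids; // a negative result means "not found"
        }

        // BinarySearch may land on any one of several equal entries,
        // so walk back to the first one and then collect forwards.
        int first = hit;
        while (first > 0 && Comparer.Compare(_entries[first - 1], probe) == 0)
        {
            first--;
        }
        for (int i = first; i < _entries.Count && Comparer.Compare(_entries[i], probe) == 0; i++)
        {
            ids.Add(_entries[i].CustomerId);
        }
        return ids;
    }
}
```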

    I hope this helps,

    Martin.

    MCSD, MCTS, MCPD. Please mark my post as helpful if you find the information good!
    Monday, July 6, 2009 10:25 PM
  • If you put the customers into a List<Customer> then it would be automatically searchable using Linq to Objects or lambdas.

                List<Customer> custList = new List<Customer>
                        {new Customer() 
                              { CustomerId = 1, 
                                FirstName="Bilbo",
                                LastName = "Baggins",
                                EmailAddress = "bb@hob.me"},
                        new Customer() 
                              { CustomerId = 2, 
                                FirstName="Frodo",
                                LastName = "Baggins",
                                EmailAddress = "fb@hob.me"},
                        new Customer() 
                              { CustomerId = 3, 
                                FirstName="Samwise",
                                LastName = "Gamgee",
                                EmailAddress = "sg@hob.me"},
                        new Customer() 
                              { CustomerId = 4, 
                                FirstName="Rosie",
                                LastName = "Cotton",
                                EmailAddress = "rc@hob.me"}};
    
                // LINQ
                var query = from c in custList
                            where c.CustomerId == 4
                            select c;
                // FirstOrDefault returns null when nothing matches
                Customer foundCustomer = query.FirstOrDefault();
    
                // Lambda
                var foundCustomer2 = custList.FirstOrDefault(c =>
                                       c.CustomerId == 4);
    

    This example just finds by CustomerId. But it could find by any combination of criteria.
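    To show what 'any combination of criteria' could look like, here is a small sketch that adds a Where clause only for the fields the user actually filled in. The CustomerQueries class and its Find method are made up for illustration, and the Customer class is a minimal stand-in with the same properties as in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the Customer class used in the example above.
public class Customer
{
    public int CustomerId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }
}

public static class CustomerQueries
{
    // A null or empty argument means "do not filter on this field".
    public static List<Customer> Find(
        IEnumerable<Customer> customers, string firstName, string lastName)
    {
        IEnumerable<Customer> query = customers;

        // Each Where narrows the previous query; nothing runs until ToList().
        if (!string.IsNullOrEmpty(firstName))
        {
            query = query.Where(c => c.FirstName == firstName);
        }
        if (!string.IsNullOrEmpty(lastName))
        {
            query = query.Where(c => c.LastName == lastName);
        }

        return query.ToList();
    }
}
```

    Chaining Where calls like this always combines the criteria with AND; OR conditions would need a different way of building the predicate.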
    www.insteptech.com ; msmvps.com/blogs/deborahk
    We are volunteers and ask only that if we are able to help you, that you mark our reply as your answer. THANKS!
    Monday, July 6, 2009 10:29 PM
  • Thanks for the replies - It's given me some stuff to think about.

    Loading all the clients into memory at once isn't an option; it's just too much data.   I had envisioned something along the lines of (very roughly):

    SearchCriteria searchCriteria = getSearchCriteria();
    List<SearchResult> results = new List<SearchResult>();

    // Directory.GetFiles returns the file paths as strings
    foreach (string file in Directory.GetFiles(pathToXMLFiles))
    {
        Customer customer = loadCustomer(file);
        results.Add(getSearchResults(customer, searchCriteria));
    }

    displayResults(results);


    But I'm unsure of what SearchCriteria would look like.  How do I represent "If the Name is 'John' and the address contains 'Smith Street' or 'Davie Street' and the Last Transaction is after June 8th"?  And then how would I apply that to a class that contains the data and get back true/false and the matching fields?

    Just to be clear: this isn't a problem in some business application I'm designing that I need a solution to.  It's more of a problem I've run across that I can solve inelegantly, and I'm interested in reading up on the principles that would allow me to solve it 'correctly'.  Searching Google hasn't given me much.  (Funnily enough, searching for 'search' doesn't work well, since most web pages have the word 'search' on them somewhere :-) )  I was, of course, hoping there was an 'accepted' solution to this type of problem that someone would point me towards... although I am also quite willing to accept that it's a travelling-salesman type of problem, where the solution is more complex than one would expect.  In which case I can rest comfortable in the fact that my inelegant solution is good enough for me. :-)
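    One common way to represent a rule like that is a small tree of criteria objects, in the spirit of the Specification and Composite patterns: leaf nodes test a single field, and AND/OR nodes combine other nodes. The sketch below is only one possible shape, not an established API. Every type name in it (ICriterion, FieldCriterion, AndCriterion, OrCriterion, ExampleSearch) is invented for the example, the Customer class is a stand-in, and the grouping "John AND (Smith Street OR Davie Street) AND after June 8th" is an assumed reading of the sentence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Customer wrapper class.
public class Customer
{
    public string FirstName { get; set; }
    public string Address { get; set; }
    public DateTime LastTransaction { get; set; }
}

// A criterion answers "does this customer match?" and reports which fields matched.
public interface ICriterion
{
    bool IsMatch(Customer customer, ICollection<string> matchedFields);
}

// Leaf node: one labelled test against one customer.
public class FieldCriterion : ICriterion
{
    private readonly string _label;
    private readonly Func<Customer, bool> _test;

    public FieldCriterion(string label, Func<Customer, bool> test)
    {
        _label = label;
        _test = test;
    }

    public bool IsMatch(Customer customer, ICollection<string> matchedFields)
    {
        if (!_test(customer)) return false;
        matchedFields.Add(_label);
        return true;
    }
}

// AND node: every part must match; a failed AND reports no fields at all.
public class AndCriterion : ICriterion
{
    private readonly ICriterion[] _parts;
    public AndCriterion(params ICriterion[] parts) { _parts = parts; }

    public bool IsMatch(Customer customer, ICollection<string> matchedFields)
    {
        var scratch = new List<string>();
        foreach (ICriterion part in _parts)
        {
            if (!part.IsMatch(customer, scratch)) return false;
        }
        foreach (string label in scratch) matchedFields.Add(label);
        return true;
    }
}

// OR node: at least one part must match; every matching part is reported.
public class OrCriterion : ICriterion
{
    private readonly ICriterion[] _parts;
    public OrCriterion(params ICriterion[] parts) { _parts = parts; }

    public bool IsMatch(Customer customer, ICollection<string> matchedFields)
    {
        bool any = false;
        foreach (ICriterion part in _parts)
        {
            if (part.IsMatch(customer, matchedFields)) any = true;
        }
        return any;
    }
}

public static class ExampleSearch
{
    private static bool AddressContains(Customer c, string text)
    {
        return c.Address != null
            && c.Address.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // The year is a parameter because the question only says "June 8th".
    public static ICriterion Build(int year)
    {
        return new AndCriterion(
            new FieldCriterion("FirstName", c => c.FirstName == "John"),
            new OrCriterion(
                new FieldCriterion("Address: Smith Street", c => AddressContains(c, "Smith Street")),
                new FieldCriterion("Address: Davie Street", c => AddressContains(c, "Davie Street"))),
            new FieldCriterion("LastTransaction", c => c.LastTransaction.Date > new DateTime(year, 6, 8)));
    }
}
```

    With this shape, adding a new logical condition means adding a node to the tree rather than another branch to a chain of 'if' statements, and a search form could build the tree from whichever controls the user filled in.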
    Tuesday, July 7, 2009 5:46 PM
  • I'm not sure how viable this is for you, but have you considered storing the Xml in SQL Server? I get nervous about storing lots of objects in memory "just" to support searching.
    http://pdkm.spaces.live.com/
    Thursday, July 9, 2009 10:43 AM
    I'm with you, pkr: if a db is an option, use that, and if you need the information back as Xml, use "FOR XML AUTO" or "FOR XML AUTO, ELEMENTS"...

    Also, I like the idea above of using LINQ to Objects; it works very well.  I've done something very similar previously with cached data, sorting and manipulating the data using that method.

    I guess the question is, what do you do with the Xml data once you have searched and found what you are looking for?  Perhaps the answer to this question will help us come up with some useful help for you?  Perhaps you could also elaborate on the use of the application?  If you're worried about IP, say so, and make up a similar scenario (change the names to protect the innocent), and we can then do our best for you.  Without this and some extra feedback, I think it's going to be difficult to do the question justice.

    Thanks,

    Martin.
    Thursday, July 9, 2009 10:54 AM
  • I guess by avoiding giving a concrete example I was hoping to get a general answer to this type of problem instead of an answer for this specific one.

    But, here's basically what I'm dealing with:

    - I have several hundred XML files containing customer records.  They contain information such as name, address, order history, etc.

    - I wrap this XML file in a class that gives access to the data and creates new data through some calculations (eg. Total purchase, last purchase date, distance from our location, etc..)

    - I've been asked to write a small utility that will allow the customer information to be searched by certain criteria.  Since the criteria may be in the calculated data -- and not just the raw data in the XML file -- a text search through the XML files won't work.  I'll need to search the instance of the customer.

    - There are too many XML files to create an instance of every customer in memory and process them all at once.

    - Any customer that matches the search parameter is added to a datagrid with the customer ID and the search parameter that matched.   I'll probably need to expand on this later, but it's all that is really required. (The matching Customer ID, and the field that matched the search criteria)

    - I'm not in a position to suggest they invest in a database for this 'small utility'.
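    Given the constraints listed above (too many files to hold every customer in memory, and results that need only the customer ID plus the parameter that matched), one option is a C# iterator that loads a single file at a time and yields hits as it goes. The following is a minimal sketch under assumptions: CustomerScanner and SearchHit are names invented here, the Customer class is a stand-in, and the loader is passed in as a delegate so the existing file-loading code can be reused unchanged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Customer wrapper, including one calculated value.
public class Customer
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public decimal TotalPurchases { get; set; }
}

// One row for the results grid: which customer matched, and on which parameter.
public class SearchHit
{
    public string CustomerId { get; set; }
    public string MatchedParameter { get; set; }
}

public static class CustomerScanner
{
    // Iterator method: a file is loaded only when the caller asks for the next
    // hit, so only one Customer instance needs to be alive at any time.
    public static IEnumerable<SearchHit> Search(
        IEnumerable<string> filePaths,
        Func<string, Customer> loadCustomer,
        IDictionary<string, Func<Customer, bool>> rules)
    {
        foreach (string path in filePaths)
        {
            Customer customer = loadCustomer(path);

            // Each rule is keyed by the name of the search parameter it tests.
            foreach (KeyValuePair<string, Func<Customer, bool>> rule in rules)
            {
                if (rule.Value(customer))
                {
                    yield return new SearchHit
                    {
                        CustomerId = customer.Id,
                        MatchedParameter = rule.Key
                    };
                }
            }
        }
    }
}
```

    In the real utility, filePaths could come from Directory.EnumerateFiles and loadCustomer would be whatever method already builds a Customer from a file. Because the rules run against the Customer instance, calculated values such as total purchases can be searched just like the raw fields.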

    My inelegant solution is to have a form with a control for each of the possible search parameters and then an ugly list of 'if' statements.    The program then loads up each XML file individually, creates the instance of Customer, runs it through the 'if' statements and adds it to the datagrid if it hits a match.

    This works, the program is functional, and the person I wrote it for is happy.  (I should note: This is just a fun project I'm doing for a friend as a favour.  He's got no expectation that it'll actually work, but if it does it'll make his life a little easier.  I'm doing it for the experience... )

    But I'm not happy with it.  It's ugly.  I don't have the slightest idea how I'd easily (or maybe the better word is 'correctly') add logical conditionals to the search criteria without making my maze of 'if' statements exponentially more complex.    I'm sure there's a better way, but I haven't been able to find documentation that explains how to implement a class that allows it to be 'searched' (or create a wrapper for a class that allows it to be searched, or whatever the solution may be.. )

    I'm quite aware that the answer to these sorts of questions may be outside the scope of a forum.  I'm only hoping for someone to point me in the right direction with a link to a  webpage, wikipedia article or even a book that might get me started.   I'm mostly interested in learning the techniques to solve this sort of problem, since I can imagine it's something that may crop up again.

    Thanks again for everyone's input....



    Friday, July 10, 2009 10:06 PM
  • Just FYI ... SQL Server Express is free!
    www.insteptech.com ; msmvps.com/blogs/deborahk
    Friday, July 10, 2009 10:29 PM
  • Have you also considered using Index Server? I confess I don't have any experience of XML Content Filter but it might do the trick for you.
    http://pdkm.spaces.live.com/
    Saturday, July 11, 2009 8:04 AM