none
Guidance on searching through many large collections.

    Question

  • I have a parser I have built that uses a specific callback to convert input based on a string that accompanies the input object.

    There are several thousand potential strings, I group those in HashSet<string> and then naively loop through them all until I know which callback to apply:
    if (mapHashSetGroupA.Countains(inputString))
    {
        CallBackA(inputObject);
    }
    else if (mapHashSetGroupB.Countains(inputString))
    {
        CallBackB(inputObject);
    }
    else if (mapHashSetGroupC.Countains(inputString))
    {
        CallBackC(inputObject);
    }
    ...
    else
    {
        throw new ...
    }
    Does a faster approach exist than simply checking each collection one by one until a match is found? Possibly a better data structure to group the many sets of possible matches that dictate which calback to use?

    Possibly a single dictionary with all potential input strings as keys and the reference to the callbacks as keys? That incurs only a single hash lookup?

    • Edited by Ritmo2k Wednesday, April 26, 2017 1:30 AM
    Wednesday, April 26, 2017 1:27 AM

Answers

  • The Contains function should be O(1), so it is as fast as it goes (theoretically). Nonetheless, you might really want to switch to one structure to hold all information. There are 2 reasons for that:

    1. it will be slightly faster, since you don't check unnecessary HashSets until you find the right one, so you save a couple of operations (but this is negligible)
    2. it will be easier to maintain - at the moment you have to manually type each check with an if statement, this is not really extensible and prone to errors. Having everything in one structure (dictionary works) which is configured at a central place will be easier to maintain.
    • Marked as answer by Ritmo2k Wednesday, April 26, 2017 8:26 AM
    Wednesday, April 26, 2017 7:01 AM

All replies

  • The Contains function should be O(1), so it is as fast as it goes (theoretically). Nonetheless, you might really want to switch to one structure to hold all information. There are 2 reasons for that:

    1. it will be slightly faster, since you don't check unnecessary HashSets until you find the right one, so you save a couple of operations (but this is negligible)
    2. it will be easier to maintain - at the moment you have to manually type each check with an if statement, this is not really extensible and prone to errors. Having everything in one structure (dictionary works) which is configured at a central place will be easier to maintain.
    • Marked as answer by Ritmo2k Wednesday, April 26, 2017 8:26 AM
    Wednesday, April 26, 2017 7:01 AM
  • Hi Ritmo2k,

    Thank you for posting here.

    For your question, you could try the Switch case.

    I make a simple example for your reference.

       static void Main(string[] args)
            {
                string output = string.Empty;
                string caseSwitch = Console.ReadLine();
                switch (caseSwitch)
                {
                    case "hello":
                    CallBackA(output);
                    break;
                    case "hi":
                    CallBackA(output);
                    break;
                }
            }
            public static string CallBackA(string output)
            {
                return "CallBackA";
            }
            public static string CallBackB(string output)
            {
                return "CallBackB";
            }

    I hope this would be helpful.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, April 26, 2017 7:34 AM
    Moderator
  • Hi,
    The potential cases are many many thousands, I have migrated to a single dictionary and some low tech benchmarking showed this as an improvement.

    Thanks everyone.
    Wednesday, April 26, 2017 8:28 AM