locked
fuzzy lookup maintain index incredibly slow RRS feed

  • Question

  • I am using SQL 2005 SP3. I have an ssis package that uses a fuzzy lookup to match records. The created lookup table is about 750k rows and the FLRef table is about 1.5 million rows. The performance of the lookup operation is pretty good. If I use the same index, I am able to quad thread it and refer all of the lookups to that single table, which seems to perform a lot better than if I make separate indexes.

    The index table itself seems to take about 2-4 minutes to create from scratch. However, the issue is when I do the maintain index. When I insert 20 rows, it takes about 11 seconds. Once a month, I need to do an insert that is about 15-25k rows. I tried to do that last night and it ran for 9 hours and then I stopped it. It seemed to insert all of the rows to the table, but I couldn't tell if the trigger was still trying to run or what.

    Does this sound normal? I know that with a trigger (which I know very little about), it is basically performing the operation one row at a time. It seems to mysteriously create tables that the sp procedure uses to update the eti. Is there anyway to batch run the sys.sp_FuzzyLookupTableMaintenanceInvoke procedure so that it does all of the inserted rows at once? Could this be if I deleted the trigger manually before, and then recreated the index table?

    Thursday, November 11, 2010 2:12 PM

Answers

  • Drop the index (you can also disable it which I'd prefer), insert data, recreate or re-enable the index.
    Arthur My Blog
    By: TwitterButtons.com
    • Marked as answer by mcollins1264 Thursday, November 11, 2010 4:40 PM
    Thursday, November 11, 2010 4:21 PM

All replies

  • Is the issue with slow inserts or long maintenance? What is your goal?
    Arthur My Blog
    By: TwitterButtons.com
    Thursday, November 11, 2010 4:09 PM
  • Slow inserts. This runs as part of a procedure that inserts a months worth of data right before the fuzzy match is run. I only need to insert 15-25k rows at a time. I tried batching it with 200-500 rows but that was slow as well. It seems wierd to me because the 750k table is very fast building the index from scratch. It wouldn't be an issue if I wasnt trying to multithread it, which means that all of the fuzzies should ideally hit the same index (or so I have found), and you can't specify a new index with the existing fuzzies, because even if you just created it, it says that it cant find flref_234085n2 and such.So, maintaining is the idea way because then I don't have to rebuild the index and package every time I want to run it.

    I'm working on a work around now that will parse out the rows from the input and the fuzzy matching by one of the exact match specifications (this might even be faster since i will ultimately shrink the fuzzy match indicies). But, I'd really like the insert / maintenance to work though so I don't have to set up special views every time I want to do this.

    Thursday, November 11, 2010 4:17 PM
  • Drop the index (you can also disable it which I'd prefer), insert data, recreate or re-enable the index.
    Arthur My Blog
    By: TwitterButtons.com
    • Marked as answer by mcollins1264 Thursday, November 11, 2010 4:40 PM
    Thursday, November 11, 2010 4:21 PM
  • Is this the FuzzyLookupTableMaintenanceRefTableCopyIndex that is on the FLRef table?

    Are you saying that if I insert into the table with the trigger (and I am assuming the trigger is disabled at this point) and rebuild the above (or whichever) index, it will update the match index automatically?

    Thursday, November 11, 2010 4:27 PM
  • Or wait, do you mean keep the trigger active while the index is disabled (and it wont rebuild every time), and then enable it after the doing the insert as normal?
    Thursday, November 11, 2010 4:36 PM
  • Exactly this.
    Arthur My Blog
    By: TwitterButtons.com
    • Marked as answer by mcollins1264 Thursday, November 11, 2010 4:40 PM
    • Unmarked as answer by mcollins1264 Thursday, November 11, 2010 4:40 PM
    Thursday, November 11, 2010 4:37 PM
  • Thanks, i will give that a shot--
    Thursday, November 11, 2010 4:40 PM
  • Arthur, are you absolutely sure, this is the fuzzy index used and maintained by the fuzzy lookup transformation , which is not really an index at all but several tables?

    mcollins1264, I think the simplest option would be to not use the maintain index option at all, and just re-create the fuzzy index followng your bulk load. If you really do want to keep the maintain option on for smaller inserts and updates then add a clean-up step to drop the objects before the bulk load. This also assumes that the bulk load can be run during a period of downtime as you won't have the fuzzy capability without the index.


    http://www.sqlis.com | http://www.konesans.com
    Thursday, November 11, 2010 4:48 PM
  • I think he means the FuzzyLookupTableMaintenanceRefTableCopyIndex, which is on one of those tables, or are you saying that is a separate set of tables in itself?


    My proc is in shambles right now, so I wont be able to test this for a little while, but I think he is saying that if that rebuilding index , of which is presumably the slow part of the trigger, will not occur if the index is disabled or gone. Then, if it is re-enabled / rebuilt, it will say "oh here are a bunch of new rows" rather than doing that on every insert.

    I am making a bunch of assumptions, so please correct me if I am wrong. Again I wont be able to test this for some time, but I am definately interested as this will be key on a bunch of these that I built recently.

    Thursday, November 11, 2010 5:01 PM