SQL Server Data Mining: Plug-In Algorithms performance


  • 28 November 2011 14:39
     
     

    Hi,

    I have one question. I have written a plug-in for mining sequential patterns.

    When I start the plug-in algorithm for the first time (after a PC reboot, server restart, ...), the data are loaded very quickly (I mean executing all the ProcessCase calls),

    but when I run the algorithm again, data loading is very slow (10 times slower or worse).

    What could be the problem?

    Thanks for any advice.


    Edit: I am using the managed plug-in version.
    • Edited by tisonet 28 November 2011 14:47

All replies

  • 28 November 2011 18:47
     
     
    If there is any relevant code or query, could you please share it?
    Think more deeply of performance terms
  • 28 November 2011 19:51
     

    That's not so easy, because I would have to post all the source code, and I don't have the rights to share it.

    But I am not sure whether the plug-in can cause this problem.

    I think that SQL Server Analysis Services transfers the data from the database quickly the first time, but slowly on subsequent runs.

    Once the server has transferred all the data, the algorithm itself takes the same time on every mining run.

    Maybe I have to close some resources after mining finishes, but the tutorial for creating plug-ins didn't say anything about that.

    The example below shows how I load the data from the database.

        public override void ProcessCase(long caseID, MiningCase inputCase)
          {
            uint attrsCount = Algorithm.AttributeSet.GetAttributeCount();
            uint attrIndex = 1;
    
            double seqKey = 0.0;
            uint indexValue = 0;
    
            // Items associated with timestamps.
            var itemsets = new Dictionary<double, HashSet<uint>>();
    
            // increment the current value of the trace notification
            Algorithm._trainingProgress.Current++;
    
    
            bool hasItem = inputCase.MoveFirst();
    
            while (hasItem)
            {
              // Gets timestamp of an item.
              if (inputCase.Attribute == Algorithm.SequenceKeyAttribute)
              {
                seqKey = inputCase.DoubleValue;
              }
              // Gets an item id. 
              else if (inputCase.Attribute == Algorithm.NestedKeyAttribute)
              {
                indexValue = inputCase.IndexValue;
              }
    
              // If a new item has started, save the previous item.
              if (attrIndex == attrsCount)
              {
                // Checks if an item is a frequent item.
                if (_frequentItems.Contains(indexValue))
                {
                  attrIndex = 1;
    
                  HashSet<uint> itemset;
                  bool exist = itemsets.TryGetValue(seqKey, out itemset);
    
                  if (!exist)
                  {
                    itemset = new HashSet<uint>();
                    itemsets.Add(seqKey, itemset);
                  }
    
                  itemset.Add(indexValue);
                }
              }
              else
              {
                attrIndex++;
              }
    
              hasItem = inputCase.MoveNext();
            }
    
    
            List<List<uint>> itemsetsList = new List<List<uint>>(itemsets.Count);
            
            // If distinct-items-only is set, this hash set tracks the items already added.
            HashSet<uint> alreadyAddedItems = (_distinctItemsOnly) ? new HashSet<uint>() : null;
    
            foreach (var itemset in itemsets.OrderBy(i=>i.Key))
            {
              List<uint> itemsList = new List<uint>(itemset.Value.Count);
    
              foreach (var item in itemset.Value)
              {
                // Add the item to the itemset iff
                // distinct-items-only is false, or
                // this is the first occurrence of the item in the sequence.
                if(!_distinctItemsOnly || alreadyAddedItems.Add(item))
                {
                  itemsList.Add(item);
                }
              }
    
              if (itemsList.Count > 0)
              {
                itemsetsList.Add(itemsList);
              }
            }
    
            if (itemsetsList.Count > 0)
            {
              _sequences.Add(new Sequence(itemsetsList));
            }
    
            if (caseID % 100 == 0)
            {
              // Fire the trace every 100 cases to avoid a performance impact.
              Algorithm._trainingProgress.Progress();
    
              // Make sure that the processing was not canceled.
              Algorithm.CheckCancelled();
            }
          }
    


  • 30 November 2011 18:29
    Moderator
     
     

    Tisonet, are you releasing resources after execution of your plug-in has completed?

    This could explain why it works fine the first time. I would also check that there are no infinite loops in the code.

     

    HTH, Vlad.

  • 30 November 2011 21:56
    Moderator
     
     

    Hi,

    Can you please verify the memory pressure on the system during training? Ideally, you would check the amount of memory used by the Analysis Services process (msmdsrv.exe) before and after each model training session.

    Regarding resource usage: are you building multiple models in the same mining structure?
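
    One way to take those readings without opening Task Manager is a small console helper. This is only a sketch: the names MemorySnapshot, PrivateBytes and FormatMegabytes are made up for illustration, it assumes the Analysis Services process is visible under the name msmdsrv, and reading another process's counters may need administrative rights.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper, not part of Analysis Services or the plug-in API.
public static class MemorySnapshot
{
    // Returns the private memory size, in bytes, of every running process
    // with the given name (pass the name without the ".exe" suffix).
    public static long[] PrivateBytes(string processName)
    {
        Process[] processes = Process.GetProcessesByName(processName);
        long[] result = new long[processes.Length];
        for (int i = 0; i < processes.Length; i++)
        {
            // Dispose each Process object once its counter has been read.
            using (Process p = processes[i])
            {
                result[i] = p.PrivateMemorySize64;
            }
        }
        return result;
    }

    // Formats a byte count as megabytes with one decimal place.
    public static string FormatMegabytes(long bytes)
    {
        double megabytes = bytes / (1024.0 * 1024.0);
        return megabytes.ToString("F1", CultureInfo.InvariantCulture) + " MB";
    }
}
```

    Calling MemorySnapshot.PrivateBytes("msmdsrv") before and after each training session and printing each value with FormatMegabytes gives comparable numbers from run to run.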

     

     

     


    bogdan crivat / http://www.bogdancrivat.net/dm
  • 1 December 2011 19:05
     
     

    I checked the memory and found nothing special:

    Before first run: 33.5 MB

    After first run: 82 MB

    Running time: 2:49 min

     

    Before second run: 83 MB

    After second run: 86 MB

    Running time: 33:53 min

     

    I did some profiling with Visual Studio and found that 99.99% of the time is spent in clr.dll, in this function:

    Microsoft.SqlServer.DataMining.PluginAlgorithms.InternalCaseProcessor.ExportedProcessCase(int32 modopt(System.Runtime.CompilerServices.IsLong),uint32 modopt(System.Runtime.CompilerServices.IsLong),uint32 modopt(System.Runtime.CompilerServices.IsLong)*); 

     

    I don't open any resources, just a log file, which I close.
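
    The profiler attributes nearly all of the time to ExportedProcessCase, which by its name looks like the entry point the engine calls for every case, so it does not show whether the time goes into the plug-in's own ProcessCase body or into whatever feeds it cases. A rough way to separate the two is to accumulate a Stopwatch around the body. Sketch only: PhaseTimer and its members are hypothetical helper names, not part of the plug-in API, and where StartRun and EndRun get called is an assumption about where training begins and ends in the plug-in.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: splits the wall-clock time of one training run into
// time spent inside an instrumented method and time spent outside it.
public sealed class PhaseTimer
{
    private readonly Stopwatch _inside = new Stopwatch();
    private readonly Stopwatch _total = new Stopwatch();

    // Number of times the instrumented method was entered in this run.
    public long Calls { get; private set; }

    // Call once when a training run begins.
    public void StartRun()
    {
        _inside.Reset();
        _total.Reset();
        Calls = 0;
        _total.Start();
    }

    // Call on the first line of the instrumented method.
    public void Enter()
    {
        Calls++;
        _inside.Start();
    }

    // Call in a finally block at the end of the instrumented method.
    public void Exit()
    {
        _inside.Stop();
    }

    // Call once when the training run ends.
    public void EndRun()
    {
        _total.Stop();
    }

    public TimeSpan Total { get { return _total.Elapsed; } }
    public TimeSpan Inside { get { return _inside.Elapsed; } }
    public TimeSpan Outside { get { return Total - Inside; } }

    public override string ToString()
    {
        return "calls=" + Calls + ", inside=" + Inside + ", outside=" + Outside;
    }
}
```

    With Enter() and Exit() wrapped around the ProcessCase body, comparing Inside and Outside between the first and the second run would show which side of the call got slower. If Inside stays the same and Outside grows, the extra time is being spent before the cases reach the plug-in.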

  • 1 December 2011 20:34
    Moderator
     
     

    Are you using any static variables?

    Memory Management
    Memory objects—such as strings, variants, and arrays—must be allocated and freed using per-algorithm or per-request service provider interfaces passed to plug-in algorithms by the Data Mining Engine. This allows the Data Mining Engine to efficiently manage memory resources and balance them across multiple requests. As a plug-in algorithm developer, this is another complex area that you no longer need to spend development effort on.

    Please check these links:

    http://msdn.microsoft.com/en-US/library/aa964125(v=SQL.90).aspx

    http://msdn.microsoft.com/en-us/magazine/cc163377.aspx

    Are you getting any errors during the second run?

     

  • 2 December 2011 10:13
     
     

    No errors. After every run I get the same results, but a different running time.

    I don't know what this means:

    Memory objects—such as strings, variants, and arrays—must be allocated and freed using per-algorithm or per-request service provider interfaces passed to plug-in algorithms by the Data Mining Engine

    In the managed plug-in I am using C#, which has a garbage collector, so how can I free memory? (Of course I know about IDisposable.)

    As for collections: when I implemented the plug-in, I always thought carefully about choosing the correct collection type (the one with the best complexity).

     

    When the plug-in has finished its work, when does AS (Analysis Services) destroy the instance of the AlgorithmBase class?

    I ask because I am not saving the result through the writer (the SaveContent and LoadContent methods). I just keep the result in memory (no more than 50 MB).

    To view the results, I am using the classic MS viewer.


  • 7 December 2011 22:19
    Moderator
     
     

    Tisonet, you might try to use the "using" keyword where possible and avoid static variables.
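
    To make that concrete, here is a sketch of the two habits: a "using" block so the log file is closed even if training throws, and per-run state that is held in an instance and cleared at the start of each run instead of living in a static field that survives between runs. SequenceCollector and TrainingSketch are hypothetical names, and the string cases stand in for the real Sequence objects. Whether leftover state is actually what slows the second run here is not established.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the plug-in's per-run result storage.
public sealed class SequenceCollector
{
    // Instance field, not static: its lifetime is tied to this object.
    private readonly List<string> _sequences = new List<string>();

    public int Count { get { return _sequences.Count; } }

    // Drops everything collected by a previous run.
    public void Reset()
    {
        _sequences.Clear();
    }

    public void Add(string sequence)
    {
        _sequences.Add(sequence);
    }
}

public static class TrainingSketch
{
    // Processes all cases and returns how many sequences were collected.
    public static int Train(SequenceCollector collector,
                            IEnumerable<string> cases,
                            string logPath)
    {
        // Start every run from a clean state.
        collector.Reset();

        // The writer is flushed and closed when the block exits,
        // including when an exception is thrown inside it.
        using (StreamWriter log = new StreamWriter(logPath, false))
        {
            foreach (string c in cases)
            {
                collector.Add(c);
            }
            log.WriteLine("cases processed: " + collector.Count);
        }

        return collector.Count;
    }
}
```

    Running Train twice on the same collector returns the same count both times, because the second run does not inherit the first run's items.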

    Thanks, Vlad. 

  • 9 December 2011 01:17
    Moderator
     
     Proposed answer

    Hello Tisonet,

    Regarding the instance of the AlgorithmBase class: it should be released in your code, along with all memory objects created by the plug-in.

    The Data Mining Engine in SQL Server 2005 Analysis Services communicates with plug-in algorithms via a set of COM interfaces that are available in a public header file.

    Use the Data Mining Engine interface to release the instance of your AlgorithmBase class.

    Best regards, Vlad.