SQL Server Data Mining: Plug-In Algorithms performance
-
November 28, 2011 14:39
Hi,
I have one question. I have written a plug-in for mining sequential patterns.
When I start the plug-in algorithm for the first time (after a PC reboot, server restart, ...), the data is loaded very fast (I mean executing all the ProcessCase calls),
but when I run the algorithm again, data loading is very slow (10 times slower or worse).
What could be the problem?
Thanks for any advice.
Edit: I am using the managed plug-in version.
- Edited by tisonet November 28, 2011 14:47
All replies
-
November 28, 2011 18:47
If there is any relevant code or query, could you please share it?
Think more deeply of performance terms
-
November 28, 2011 19:51
That's not so easy, because I would have to post all the source code, and I don't have the rights to share it.
But I am not sure whether the plug-in itself can cause this problem.
I think that SQL Server Analysis Services transfers the data from the database quickly the first time, but slowly on later runs.
Once the server has transferred all the data, the algorithm itself takes the same time for every mining run.
Maybe I have to close some resources after mining finishes, but the tutorial on creating plug-ins didn't say anything about that.
The following example shows how I load the data from the database.
public override void ProcessCase(long caseID, MiningCase inputCase)
{
    uint attrsCount = Algorithm.AttributeSet.GetAttributeCount();
    uint attrIndex = 1;
    double seqKey = 0.0;
    uint indexValue = 0;

    // Items associated with timestamps.
    var itemsets = new Dictionary<double, HashSet<uint>>();

    // Increment the current value of the trace notification.
    Algorithm._trainingProgress.Current++;

    bool hasItem = inputCase.MoveFirst();
    while (hasItem)
    {
        // Gets the timestamp of an item.
        if (inputCase.Attribute == Algorithm.SequenceKeyAttribute)
        {
            seqKey = inputCase.DoubleValue;
        }
        // Gets an item id.
        else if (inputCase.Attribute == Algorithm.NestedKeyAttribute)
        {
            indexValue = inputCase.IndexValue;
        }

        // If a new item started, saves the previous item.
        if (attrIndex == attrsCount)
        {
            // Checks whether the item is a frequent item.
            if (_frequentItems.Contains(indexValue))
            {
                attrIndex = 1;
                HashSet<uint> itemset;
                bool exist = itemsets.TryGetValue(seqKey, out itemset);
                if (!exist)
                {
                    itemset = new HashSet<uint>();
                    itemsets.Add(seqKey, itemset);
                }
                itemset.Add(indexValue);
            }
        }
        else
        {
            attrIndex++;
        }

        hasItem = inputCase.MoveNext();
    }

    List<List<uint>> itemsetsList = new List<List<uint>>(itemsets.Count);

    // If "distinct items only" is true, this hash set tracks the items already added.
    HashSet<uint> alreadyAddedItems = (_distinctItemsOnly) ? new HashSet<uint>() : null;

    foreach (var itemset in itemsets.OrderBy(i => i.Key))
    {
        List<uint> itemsList = new List<uint>(itemset.Value.Count);
        foreach (var item in itemset.Value)
        {
            // Adds the item to an itemset iff:
            //   "distinct items only" is false, or
            //   this is the first occurrence of the item in the sequence.
            if (!_distinctItemsOnly || alreadyAddedItems.Add(item))
            {
                itemsList.Add(item);
            }
        }

        if (itemsList.Count > 0)
        {
            itemsetsList.Add(itemsList);
        }
    }

    if (itemsetsList.Count > 0)
    {
        _sequences.Add(new Sequence(itemsetsList));
    }

    if (caseID % 100 == 0)
    {
        // Fire the trace every 100 cases, to avoid performance impact.
        Algorithm._trainingProgress.Progress();

        // Make sure that the processing was not canceled.
        Algorithm.CheckCancelled();
    }
}
-
November 30, 2011 18:29 Moderator
Tisonet, are you releasing resources after the execution of your plug-in is completed?
This could explain why it works fine the first time. I would also check that there are no infinite loops in the code.
HTH, Vlad.
-
November 30, 2011 21:56 Moderator
Hi,
Can you please verify the memory pressure on the system during training? Ideally, you would check the amount of memory used by the Analysis Services process (msmdsrv.exe) before and after each model training session.
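One way to take those before/after readings without Task Manager is a small console helper. This is only a sketch: the class name `MsmdsrvMemoryProbe` and its methods are hypothetical, it uses the standard `System.Diagnostics.Process` API, and it assumes the helper runs on the same machine as the server with enough privileges to query the process.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the plug-in API): prints the memory used
// by the Analysis Services process so readings can be compared across runs.
public static class MsmdsrvMemoryProbe
{
    // Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes).
    public static double ToMegabytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }

    // Prints working set and private bytes for every running msmdsrv process.
    // Returns the number of processes found (0 if the service is not running).
    public static int Report(string label)
    {
        Process[] processes = Process.GetProcessesByName("msmdsrv");
        foreach (Process p in processes)
        {
            using (p)
            {
                Console.WriteLine(
                    "{0}: pid={1} workingSet={2:F1} MB private={3:F1} MB",
                    label,
                    p.Id,
                    ToMegabytes(p.WorkingSet64),
                    ToMegabytes(p.PrivateMemorySize64));
            }
        }
        return processes.Length;
    }
}
```

Calling `MsmdsrvMemoryProbe.Report("before run 1")` and `MsmdsrvMemoryProbe.Report("after run 1")` around each training session gives numbers that can be compared directly.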
Regarding resource usage - are you building multiple models in the same mining structure?
bogdan crivat / http://www.bogdancrivat.net/dm
-
December 1, 2011 19:05
I checked the memory and found nothing special:
Before first run: 33.5 MB
After first run: 82 MB
Running time: 2:49 min
Before second run: 83 MB
After second run: 86 MB
Running time: 33:53 min
I did some profiling with Visual Studio and found that 99.99% of the time is spent in clr.dll, in this function:
Microsoft.SqlServer.DataMining.PluginAlgorithms.InternalCaseProcessor.ExportedProcessCase(int32 modopt(System.Runtime.CompilerServices.IsLong),uint32 modopt(System.Runtime.CompilerServices.IsLong),uint32 modopt(System.Runtime.CompilerServices.IsLong)*);
I don't open any resources, just a log file, which I close.
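The profiler result alone does not say whether the time goes to the plug-in's own ProcessCase body or to whatever happens between calls. A rough way to separate the two is to accumulate the time spent inside the method and compare it with the wall-clock time. The `CaseTimer` class below is a hypothetical helper, not part of the plug-in framework; it relies only on `System.Diagnostics.Stopwatch`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the plug-in API): separates time spent
// inside ProcessCase from time spent between calls.
public sealed class CaseTimer
{
    private readonly Stopwatch _inside = new Stopwatch(); // time inside plug-in code
    private readonly Stopwatch _total = new Stopwatch();  // wall clock since the first case

    // Number of completed Enter/Exit pairs.
    public long Cases { get; private set; }

    // Call at the top of ProcessCase.
    public void Enter()
    {
        if (!_total.IsRunning)
        {
            _total.Start();
        }
        _inside.Start();
    }

    // Call in a finally block at the end of ProcessCase.
    public void Exit()
    {
        _inside.Stop();
        Cases++;
    }

    // Accumulated time spent between Enter and Exit.
    public TimeSpan Inside
    {
        get { return _inside.Elapsed; }
    }

    // Wall-clock time since the first case that was not spent inside ProcessCase.
    public TimeSpan Outside
    {
        get { return _total.Elapsed - _inside.Elapsed; }
    }
}
```

If `Inside` stays roughly constant between the first and second run while `Outside` grows, the slowdown is in how cases are delivered, not in the pattern-building code; if `Inside` grows, the plug-in's own data structures are the place to look.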
-
December 1, 2011 20:34 Moderator
Are you using any static variables?
Memory Management
Memory objects, such as strings, variants, and arrays, must be allocated and freed using per-algorithm or per-request service provider interfaces passed to plug-in algorithms by the Data Mining Engine. This allows the Data Mining Engine to efficiently manage memory resources and balance them across multiple requests. As a plug-in algorithm developer, this is another complex area that you no longer need to spend development effort on.
Please check these links:
http://msdn.microsoft.com/en-US/library/aa964125(v=SQL.90).aspx
http://msdn.microsoft.com/en-us/magazine/cc163377.aspx
Are you getting any errors during the second run?
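To illustrate why static variables are worth checking: a static collection lives as long as the hosting process keeps the assembly loaded, so anything added during one training run is still there on the next one. The sketch below uses hypothetical class names and plain integers instead of real sequences, and it assumes the server does not unload the plug-in between runs; it is not taken from the poster's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a plug-in that keeps training state in a static field.
public class StaticStateTrainer
{
    // Static: shared by every instance and kept for the lifetime of the host process.
    private static readonly List<int> Sequences = new List<int>();

    // Adds caseCount entries and returns the total number held afterwards.
    public int Train(int caseCount)
    {
        for (int i = 0; i < caseCount; i++)
        {
            Sequences.Add(i);
        }
        return Sequences.Count;
    }
}

// Same logic, but the state belongs to the instance and is reset per run.
public class InstanceStateTrainer
{
    private readonly List<int> _sequences = new List<int>();

    public int Train(int caseCount)
    {
        _sequences.Clear(); // keeps a reused instance from accumulating old cases
        for (int i = 0; i < caseCount; i++)
        {
            _sequences.Add(i);
        }
        return _sequences.Count;
    }
}
```

With the static version, a second `new StaticStateTrainer().Train(1000)` returns 2000 because the first run's entries are still present; the instance version returns 1000 every time. Leftover state of this kind would only disappear after a server restart.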
-
December 2, 2011 10:13
No errors. After every run I get the same results, but a different running time.
I don't know what this means:
Memory objects—such as strings, variants, and arrays—must be allocated and freed using per-algorithm or per-request service provider interfaces passed to plug-in algorithms by the Data Mining Engine
In the managed plug-in I am using C#, which has a garbage collector, so how can I free memory (of course, I know about IDisposable)?
About collections: when I implemented the plug-in, I always thought carefully about choosing the correct collection type (with the best complexity).
After the plug-in finishes its work, when does AS (Analysis Services) destroy the instance of the AlgorithmBase class?
I ask because I am not saving the result to the writer (SaveContent, LoadContent methods). I just keep the result in memory (no more than 50 MB).
To view the results, I am using the standard Microsoft viewer.
- Edited by tisonet December 2, 2011 10:13
- Proposed as answer by Vlad Ts - MSFT, Microsoft Employee, Moderator December 9, 2011 01:17
- Unproposed as answer by Vlad Ts - MSFT, Microsoft Employee, Moderator December 9, 2011 01:18
-
December 7, 2011 22:19 Moderator
Tisonet, you might try to use the "using" keyword where possible and avoid static variables.
Thanks, Vlad.
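As a minimal sketch of the "using" suggestion applied to the log file mentioned earlier in the thread: the statement disposes the writer, and so closes the file handle, even if an exception is thrown while writing. The `TrainingLog` class and its `Append` method are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

// Hypothetical logging helper (not from the original plug-in).
public static class TrainingLog
{
    // Appends one line to the log file. The writer is disposed, and the file
    // handle released, when the using block exits, including on exceptions.
    public static void Append(string path, string message)
    {
        using (var writer = new StreamWriter(path, true))
        {
            writer.WriteLine(message);
        }
    }
}
```

Opening and closing the file per message is slower than keeping one writer open, but it guarantees that no handle outlives a training run.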
-
December 9, 2011 01:17 Moderator
Hello Tisonet,
Regarding the instance of the AlgorithmBase class: it should be released in your code along with all memory objects created by the plug-in.
The Data Mining Engine in SQL Server 2005 Analysis Services communicates with plug-in algorithms via a set of COM interfaces that are available in a public header file.
Use the Data Mining Engine interface to release the instance of your AlgorithmBase class.
Best regards, Vlad.
- Proposed as answer by Vlad Ts - MSFT, Microsoft Employee, Moderator December 9, 2011 01:18