Data Mining ForumAll questions and discussions related to data-mining in SQL Server© 2009 Microsoft Corporation. All rights reserved.Sat, 28 Nov 2009 12:30:17 Z7a905e4f-2b22-40a8-abce-aaa14ab7a09ahttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/02ad4078-f42c-4f5e-89c1-d50e310e9817http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/02ad4078-f42c-4f5e-89c1-d50e310e9817Michael_PBhttp://social.msdn.microsoft.com/Profile/en-US/?user=Michael_PBFeature Selection to recommend algorithm attributesHi,<br/><br/>I would like to be able to have a user select a database table, an algorithm such as decision trees and then select the column they want to predict (eg buy a bike).<br/>Rather than run the algorithm straight away, I would like to list what attributes significantly affect the prediction. Then let the user select whether to use these columns and any other they may like to incorporate in the mining model.<br/><br/>Many Thanks,<br/>Michael<br/><br/> Tue, 24 Nov 2009 11:01:57 Z2009-11-28T12:30:17Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/bf049f73-510c-42f5-99ac-f1443ec19140http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/bf049f73-510c-42f5-99ac-f1443ec19140mikealbainyhttp://social.msdn.microsoft.com/Profile/en-US/?user=mikealbainyTesting/Training split - running prediction join on test setIf you use the new train/test split feature in SQL 2008, how can you specify running the prediction join on the model test set only (in order to run an evaluation of the modeling results)?  Is there an attribute in the mining data set which allows you to &quot;filter&quot; these cases for the prediction join?  I envision  something along of the lines of &quot;WHERE IsTestCase=1&quot; in the WHERE clause of the DMX query.<br/> <br/> I assume that the intended functionality of the new built-in cross-fold validation is to perform a similar task.  But, for some reason I cannot get it to work.  I have 2 DT models using the same mining structure (only diff is that one column is ignored in one of the models) and they have a predict only value in a nested table.  In the &quot;Input Selection&quot; tab the &quot;predictable column name&quot; drop-down never displays any values after selecting the model(s) to compare.<br/> <br/> Any suggestions on either approach?Sat, 28 Nov 2009 09:59:55 Z2009-11-28T09:59:55Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/805933c8-494d-437c-a85b-c6dca37e8735http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/805933c8-494d-437c-a85b-c6dca37e8735light_wthttp://social.msdn.microsoft.com/Profile/en-US/?user=light_wtinput column has states of 1, 2, 3, 4 ... but, Mining models show 1 or missingone of the input column, myCount_Assigned, in the structure is based on view.  myCount_Assigned is result of count(col_name) that column in the view.  It has values of 1, 2, 3, 4, to 10.  <br/> <br/> The properties of that column's content is Discrete with type of Long . <br/> <br/> THe Models I use are Decision Tree and Clustering.<br/> <br/> After processing the data, myCount_Assigned has only 1 or missing states.  what happen to the 2, 3, 4 and so on?  <br/> <br/> THanks.Fri, 27 Nov 2009 19:22:16 Z2009-11-27T19:22:17Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/a602814d-53f3-4776-bcaf-37ee28ea417dhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/a602814d-53f3-4776-bcaf-37ee28ea417dlight_wthttp://social.msdn.microsoft.com/Profile/en-US/?user=light_wtsql 2008 'Decision Trees found no appropriate regressors for model' Hi,<br/> <br/> I've run into that message.  What does it mean and is there something I need to do to fix it?<br/> <br/> I've read this older post : <br/> http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/94f52e49-f862-4814-8c5e-97783264ff3e<br/> <br/> where in BIDS08 (tab, properties, etc?) to force that &quot; you can 'force' this to happen by specifying the column name as the value of the FORCE_REGRESSOR parameter of the algorithm&quot;<br/>Fri, 27 Nov 2009 18:39:34 Z2009-11-27T18:39:35Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/3b76715a-b15a-457a-920c-c402d3398677http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/3b76715a-b15a-457a-920c-c402d3398677v.cupalhttp://social.msdn.microsoft.com/Profile/en-US/?user=v.cupalPlug-in algorithm shell complilation errorI'm trying to create plug-in algorithm using tutorial from Max Chickering and Raman Iyer. I follow steps very carefully and as far as I know my code is exactly the same as Completed shell solution in source files of tutorial. But when I try to compile factory.cpp file I get this error: <br>Error    1    error C2259: 'ATL::CComObject&lt;Base&gt;' : cannot instantiate abstract class    e:\development\ms visual studio 2005\vc\atlmfc\include\atlcom.h    1799   <br><br>When I compile completed shell solution from source files it's OK. Do you have any ideas, where the problem could be?<br><br>Thank you<br><br>Wed, 29 Aug 2007 15:27:17 Z2009-11-25T15:51:08Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/e8875628-5fb8-43c5-9ef1-e4548720eccehttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/e8875628-5fb8-43c5-9ef1-e4548720eccegsc1ugshttp://social.msdn.microsoft.com/Profile/en-US/?user=gsc1ugsquery to list second highest<p>Hi</p> <p>I have this data</p> <p>[code]<br/>CustBid idProduct bidAmount bidDate<br/>1 272 165.00 2009-11-20 09:05:31.640<br/>1 272 155.00 2009-11-20 09:05:25.480<br/>4 272 95.00 2009-11-19 12:40:44.973<br/>4 272 85.00 2009-11-19 12:40:40.377<br/>1 272 85.00 2009-11-12 11:20:52.400<br/>1 272 85.00 2009-11-12 11:20:52.400<br/>4 272 75.00 2009-11-12 11:11:02.080<br/>1 272 65.00 2009-11-12 11:20:20.170<br/>1 272 45.00 2009-11-12 11:08:02.407<br/>1 272 25.00 2009-11-12 11:05:06.663<br/>[/code]</p> <p>and I want to be able to list the second highest bid that is not the same user so what i would like to display is 95 because bidder 1 has two higher so i dont want those proxy bids visible, what query could i use? have this so far</p> <p>[code]<br/>Select bidhistory.idCustomerBid as CustBid, bidhistory.idProduct, bidhistory.bidAmount, bidhistory.bidDate From bidhistory where (bidhistory.idProduct = 272) AND (idCustomerBid &lt;&gt; 2) order by bidAmount desc <br/>[/code]</p>Tue, 24 Nov 2009 09:54:27 Z2009-11-24T22:14:43Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/5e8f17b2-20a3-49ed-a9c4-48655008f1dbhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/5e8f17b2-20a3-49ed-a9c4-48655008f1dbTxwalkerhttp://social.msdn.microsoft.com/Profile/en-US/?user=TxwalkerSQL 2005 PIVOT with CASE<p>I am having a really difficult time getting the result I need out of the following query.</p> <p>   </p> <div style="color:black;background-color:white"> <pre><span style="color:blue">SELECT</span> <span style="color:magenta">SUM</span>(HEALTH_MTTR) <span style="color:blue">AS</span> MTTR, <span style="color:magenta">SUM</span>(HEALTH_OTR) <span style="color:blue">AS</span> OTR, <span style="color:magenta">SUM</span>(HEALTH_REPEAT) <span style="color:blue">AS</span> REPEAT, <span style="color:magenta">SUM</span>(HEALTH_CHRONIC) <span style="color:blue">AS</span> CHRONIC, <span style="color:magenta">SUM</span>(HEALTH_TOTAL) <span style="color:blue">AS</span> TOTAL, (<span style="color:magenta">CAST</span>(<span style="color:magenta">CAST</span>(<span style="color:magenta">YEAR</span>([DATE_RESOLVED_FOR_CLOSURE]) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(4)) + <span style="color:#a31515">'-'</span> + <span style="color:blue">Right</span>(<span style="color:#a31515">'00'</span> + <span style="color:magenta">CAST</span>(<span style="color:magenta">MONTH</span>([DATE_RESOLVED_FOR_CLOSURE]) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(2)), 2) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(7))) <span style="color:blue">AS</span> YearMonth <span style="color:blue">FROM</span> TRT_remedy_tickets <span style="color:blue">WHERE</span> COMPID = <span style="color:#a31515">'138'</span> <span style="color:blue">Group</span> <span style="color:blue">By</span> (<span style="color:magenta">CAST</span>(<span style="color:magenta">CAST</span>(<span style="color:magenta">YEAR</span>([DATE_RESOLVED_FOR_CLOSURE]) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(4)) + <span style="color:#a31515">'-'</span> + <span style="color:blue">Right</span>(<span style="color:#a31515">'00'</span> + <span style="color:magenta">CAST</span>(<span style="color:magenta">MONTH</span>([DATE_RESOLVED_FOR_CLOSURE]) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(2)), 2) <span style="color:blue">AS</span> <span style="color:blue">VARCHAR</span>(7))) </pre> </div> <p>This Returns columns MTTR, OTR, REPEAT, CHRONIC, TOTAL, YearMonth  with the results in the rows.</p> <p>What I am trying to pull of is to have each of the retrieved YearMonth Results displayed as a colum and then the rows should be MTTR, OTR, REPEATE, CHRONIC, TOTAL</p> <p>Now from what I've read.  Since the YearMonth Values are going to be dynamically generated I either have to use Dynamic SQL or Case statements.</p> <p>I would prefer not to have to store the query on the database as I have limited access to creating stored procedures.</p>Mon, 23 Nov 2009 20:37:55 Z2009-11-24T21:32:17Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/6f47dd06-1cb7-47f3-aa3d-ace1abb52b9ehttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/6f47dd06-1cb7-47f3-aa3d-ace1abb52b9eparvijhttp://social.msdn.microsoft.com/Profile/en-US/?user=parvijdata mining performancehope you are doing well.<br/> <br/> I am researching on data mining as student in the master of software engineering.<br/> <br/> Regarding to your proficiency,respectfuly I request to advise me any of questions as follow:<br/> I'm researching in determination data mining goals. right now, I'm looking for criterions of <br/> <br/> elections for data mining goal from several possible one.for that porpose i classified it to <br/> <br/> 4 issue as bellow:<br/> 1. cost<br/> 2. speed<br/> 3. benefit(gain)<br/> 4. risk (but i discover this case)<br/> unfortunately i couldn't find any reference for this subject.<br/> would you mind if there is just inform me about?<br/> I mean some references about:<br/> - which kind of cost and speed impediments we facing with data mining project<br/> - what is customer and data mining project manager satisfied criterion<br/> <br/> <br/> Thanks in advance for your time and consideration.<br/> <br/> awaiting for your kind reply I remain.<br/>Mon, 23 Nov 2009 08:29:50 Z2009-11-24T15:32:43Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/ebb6d114-e8e1-4042-a6f3-43b7e9af7c77http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/ebb6d114-e8e1-4042-a6f3-43b7e9af7c77matias_Castrohttp://social.msdn.microsoft.com/Profile/en-US/?user=matias_CastroDecision Trees algorithm in detailI've been looking for a long time now for a REALLY detailed description of the Microsoft Decision Trees algorithm. I am particularly interested in how the algorithm splits nodes for continuous input attributes. <br/> <br/> I have been addressed to pages in msdn help pages where general descriptions are found but I would like to know how things are done a bit more formally. <br/> <br/> Is there any formal paper from which the algorithm was taken? or something like a paper describing the algorithm?<br/> <br/> Thanks a lot!Thu, 12 Nov 2009 16:11:28 Z2009-11-23T23:21:43Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/9b700dc0-aa69-43f3-b413-efbe7b89655ehttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/9b700dc0-aa69-43f3-b413-efbe7b89655eLightstonehttp://social.msdn.microsoft.com/Profile/en-US/?user=LightstoneCounting up multiple y/n fields in a query statementI have around 6 y/n fields in a table and I'm trying to query it so that I can count the instances of Yes for each of them.  I want to use a single query statement but am unsure of how to accomplish this.  The reason for the single statement is I want to use a pie chart in ssrs to display the results and you can only use a single data source.  Thank you.Mon, 23 Nov 2009 05:48:51 Z2009-11-23T16:42:25Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/49aab401-3931-4b05-9935-e57af307cd20http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/49aab401-3931-4b05-9935-e57af307cd20NewMinehttp://social.msdn.microsoft.com/Profile/en-US/?user=NewMineHow to find underlying patternI have a data of the bank customers getting a particular loan. If I want to see their pattern (ie - what features are very similar in them) what technique do I need to use?<br/> <br/> I cannot use prediction as it is a single class problem.<br/> <br/> Clustering will just cluster the similar customers. But how would I find a underlying pattern in my customers?Fri, 13 Nov 2009 04:32:31 Z2009-11-23T15:08:47Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/9fddc9dc-b14a-486d-be3b-940849e2f858http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/9fddc9dc-b14a-486d-be3b-940849e2f858dessertshttp://social.msdn.microsoft.com/Profile/en-US/?user=dessertsTraining data/ overfitting &lt;!-- @page { margin: 0.79in } P { margin-bottom: 0.08in } --&gt; <p style="margin-bottom:0in"><strong>i read that when a model is fit to training data, zero error with those data is not necessarily good. why ? due to overfitting ? </strong></p>Sun, 22 Nov 2009 10:57:53 Z2009-11-23T08:54:26Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/fc338468-638a-46aa-9869-9745a8896826http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/fc338468-638a-46aa-9869-9745a8896826Ion Fhttp://social.msdn.microsoft.com/Profile/en-US/?user=Ion%20FFinding patterns in communicationsHi all,<br/> <br/> <br/> I'm new to data mining and I was looking at sql server data mining features to see if it can help me with identifying patterns in a sql server database.<br/> The database models communications between people (phone, email, etc).<br/> I need a tool to identify various patterns in this communications.<br/> The kind of things I'm looking for is weather there are repetitive sequences of calls, for instance every time Joe Pothead calls Jack the Dealer, Jack calls Sam the Supplier within 30 minutes or so. There are variations in terms of I might be looking for more complicated sequences, maybe the direction of the call is important or not, if it's a call or an sms etc.<br/> <br/> The data model looks roughly like this:<br/> <br/> There is a Communications table (like transactions for instance) whith a surrogate primary key, a communication start and end time, a communication type (phone, email, sms etc) and some other irrelevant attributes.<br/> There is also a table describing each communication (sort of transaction details, as there may be more people involved in a communication, like in case of emails or conference calls, etc) which lists every person involved in that particular communication session, along with information like email, phone number it called from, wether it initiated the call or it was called (or cc-ed) etc.<br/> I needed some pointers if sql server can answer this sort of questions, maybe what kind of alogorithm would be appropriate, if there is a good analogy with other real life models which might be better described in articles and samples, where is a good point to start with.<br/> <br/> <br/> Much appreciatedSat, 14 Nov 2009 00:02:45 Z2009-11-21T08:26:02Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/862bbb07-76b2-4ef2-b63d-7bccf44bfa24http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/862bbb07-76b2-4ef2-b63d-7bccf44bfa24uchevthttp://social.msdn.microsoft.com/Profile/en-US/?user=uchevtSQL procedure issueHi All,<br/><br/>I have two tables. Table 1 is called DATA, which has several columns, most important of them are DATA_ID and METRIC_ID. Table 2 is called PARAMETERS, which also has several columns, most important of them are DATA_ID, PARAMETER_ID, and Client_Platform. I want to traverse through every record in the DATA table, and compare it with its respective record in the PARAMETERS table, matching their DATA_IDs. Once I find a match, I would like to update the PARAMETER table's Client_Platform column. This task is ran until the end of the DATA table. Below is you will find my algorithm that should accomplish the above description, however, it fails to do that. Please let me know where I have made a mistake.<br/><br/><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff"><font size=2 color="#0000ff"><font size=2 color="#0000ff"> <p>ALTER</p> </font></font></span><font size=2 color="#0000ff"> <p> </p> </font></span> <p><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">PROCEDURE</span></span><span style="font-size:x-small"> [dbo]</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">.</span></span><span style="font-size:x-small">[UPDATE_PLATFORM] </span></p> <span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000"> <p>-- Add the parameters for the stored procedure here</p> </span></span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff"> <p>AS</p> <p>BEGIN</p> </span></span><span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">-- SET NOCOUNT ON added to prevent extra result sets from</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">-- interfering with SELECT statements.</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">NOCOUNT</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">ON</span></span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">;</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> @data_id </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">INT</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> @parameter_id </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">INT</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> @new_client_name </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">varchar</span></span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">(</span></span><span style="font-size:x-small">255</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">)</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> @metric_id </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">INT</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> datacursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">CURSOR</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FOR</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SELECT</span></span><span style="font-size:x-small"> DATA_ID</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">,</span></span><span style="font-size:x-small">METRIC_ID </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> DATA<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> parametercursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">CURSOR</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FOR</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SELECT</span></span><span style="font-size:x-small"> DATA_ID </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> PARAMETERs<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">OPEN</span></span><span style="font-size:x-small"> datacursor<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">OPEN</span></span><span style="font-size:x-small"> parametercursor<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> @new_client_name </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#ff0000"><span style="font-size:x-small;color:#ff0000">'.a.'</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">DECLARE</span></span><span style="font-size:x-small"> @flag </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">INT</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> @flag </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> 0<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FETCH</span></span><span style="font-size:x-small"> NEXT </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> datacursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">into</span></span><span style="font-size:x-small"> @data_id</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">,</span></span><span style="font-size:x-small"> @metric_id<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">while</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">(</span></span><span style="font-size:x-small;color:#ff00ff"><span style="font-size:x-small;color:#ff00ff">@@FETCH_STATUS</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> 0</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">)</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">begin</span></span><span style="font-size:x-small"> <font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">if</span></span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">(</span></span><span style="font-size:x-small">@metric_id </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">&gt;=</span></span><span style="font-size:x-small"> 1 </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">and</span></span><span style="font-size:x-small"> @metric_id</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">&lt;=</span></span><span style="font-size:x-small">72</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">)</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">/*iometer*/</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">begin</span></span><span style="font-size:x-small"> <font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FETCH</span></span><span style="font-size:x-small"> NEXT </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> parametercursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">into</span></span><span style="font-size:x-small"> @parameter_id<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">while</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">(</span></span><span style="font-size:x-small">@flag </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> 0</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">)</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">begin</span></span><span style="font-size:x-small"> <font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">if</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">(</span></span><span style="font-size:x-small">@parameter_id </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> @data_id</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">)</span></span><span style="font-size:x-small"> </span><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">/*found the parameter */</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">begin</span></span><span style="font-size:x-small"> <font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">/*UPDATE PARAMETERs SET Client_Platform = @new_client_name*/</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> @flag </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> 1 </span><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">/*update done*/</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">end</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">else</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">begin</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> @flag </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small">0<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FETCH</span></span><span style="font-size:x-small"> NEXT </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> parametercursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">into</span></span><span style="font-size:x-small"> @parameter_id<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">end</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">end</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">end</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">SET</span></span><span style="font-size:x-small"> @flag </span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">=</span></span><span style="font-size:x-small"> 0 </span><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">/*reset update*/</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FETCH</span></span><span style="font-size:x-small"> NEXT </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">FROM</span></span><span style="font-size:x-small"> datacursor </span><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">into</span></span><span style="font-size:x-small"> @data_id</span><span style="font-size:x-small;color:#808080"><span style="font-size:x-small;color:#808080">,</span></span><span style="font-size:x-small"> @metric_id<font size=2> <p> </p> </font></span></p> <p><span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff">end</span></span></p> <span style="font-size:x-small"><font size=2> <p> </p> </font></span> <p><span style="font-size:x-small;color:#008000"><span style="font-size:x-small;color:#008000">-- Insert statements for procedure here</span></span></p> <span style="font-size:x-small;color:#0000ff"><span style="font-size:x-small;color:#0000ff"> <p>END<br/><br/><br/>Thanks,</p> </span></span>Tue, 17 Nov 2009 17:50:58 Z2009-11-27T09:58:32Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/40ef02dd-646d-4d8d-ab77-39d5d8b24452http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/40ef02dd-646d-4d8d-ab77-39d5d8b24452UQMIShttp://social.msdn.microsoft.com/Profile/en-US/?user=UQMISForecast Student Course Load (Number of Courses being Studied) over time - Decision Trees or Time SeriesHi,<br/><br/>I'm new at this (never heard that before on a forum !) I've looked at the samples, and done a lot of searching, but I can't quite see how to approach the problem I am trying to model.<br/><br/>I'm trying to build a model that would predict the amount of continuing student course load we will have in 2010, based upon the amount of load we had in previous years, and how the course load moves from year to year.  For example perhaps only 80% of students studying in their first year of a Bachelor of Arts program will continue on into their second year. Of those second year students, perhaps 90% of those will continue into their third year.  Then perhaps 95% of the students will graduate, and therefore not return the following year.  <br/><br/>Some of the students study full-time, some part-time and as we have to manage the overall course load, we need to model the amount of course load we have each year, so we know how many students we can have commencing in 2010 ( ie knowing how many places we can offer in the program).<br/><br/>We show course load as a percentage of the standard annual load.  So a course load of .5 means they are doing half the number of courses required to graduate on time (for tha BArts, this is three years)<br/><br/>There are factors that might also affect the student's continuation prospects, for example if their GPA (Grade Point Average) is really low (ie they're failing)<br/><br/>So a simple dataset to train with might look something like this:<br/><br/><br/> <table border=0 cellspacing=1 cellpadding=1> <tbody> <tr> <td><strong>StudentID</strong></td> <td><strong>Program</strong></td> <td><strong>CalendarYear</strong></td> <td><strong>Commencing</strong></td> <td><strong>YearLevel</strong></td> <td><strong>YearsToGraduate</strong></td> <td><strong>GPA</strong></td> <td><strong>CourseLoad</strong></td> <td><strong>CourseLoad_NextYear</strong></td> </tr> <tr> <td>1</td> <td>BArts</td> <td>2006</td> <td>Yes</td> <td>1</td> <td>3</td> <td>5</td> <td>0.6</td> <td>1</td> </tr> <tr> <td>1</td> <td>BArts</td> <td>2007</td> <td>No</td> <td>2</td> <td>3</td> <td>6</td> <td>1</td> <td>1</td> </tr> <tr> <td>1</td> <td>BArts</td> <td>2008</td> <td>No</td> <td>3</td> <td>3</td> <td>6</td> <td>1</td> <td>0</td> </tr> <tr> <td>2</td> <td>BArts</td> <td>2006</td> <td>Yes</td> <td>1</td> <td>3</td> <td>4</td> <td>0.25</td> <td>0</td> </tr> <tr> <td>3</td> <td>BArts</td> <td>2006</td> <td>Yes</td> <td>1</td> <td>3</td> <td>5</td> <td>0.5</td> <td>0.75</td> </tr> <tr> <td>3</td> <td>BArts</td> <td>2007</td> <td>No</td> <td>2</td> <td>3</td> <td>5</td> <td>0.75</td> <td>0</td> </tr> </tbody> </table> ...<br/><br/><br/>I think I could use a Microsoft Decision Trees algorithm if I simply wanted to look at whether the student continued into the next year, but because I need to model the amount of course load they continued with, I think that means I need to use the Time Series algorithm.<br/><br/>I can't quite see how the time series would work though, because unlike sales of products, the number of students in YearLevel 2 is a direct result of the number of students we enrolled in YearLevel 1.  <br/><br/>Any suggestions, and/or references to similar modelling scenarios would be most welcome.<br/><br/>Thanks<br/>HGWed, 18 Nov 2009 00:41:18 Z2009-11-18T00:41:21Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8b659bcc-dae5-40fa-823f-744384e199f6http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8b659bcc-dae5-40fa-823f-744384e199f6leokhttp://social.msdn.microsoft.com/Profile/en-US/?user=leokNeural Network against the cubeHi <br/>I have Neural Network model build from the cube to mine a dimension.<br/>It includes 4 attributes, 4 measures and 2 calcs, one of which is predict (predict only has been tried as well with the same results)<br/>Model is allowed to be built, deployed and processed with no errors.<br/>Once I run prediction, get the following error:<br/><br/>Parser: The end of the input was reached. (Microsoft SQL Server 2008 Analysis Services)<br/><br/>What does it mean?<br/><br/>Analogous model built against relational data worked OK<br/><br/>Thanks in advance<br/><br/>Tue, 17 Nov 2009 20:14:00 Z2009-11-27T09:59:29Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/eb72332f-bde0-4cf0-8453-333ed9b10618http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/eb72332f-bde0-4cf0-8453-333ed9b10618Juan M. Alvaradohttp://social.msdn.microsoft.com/Profile/en-US/?user=Juan%20M.%20AlvaradoData mining addin for excel 2010Anyone know when be available the data mining addin for excel 2010?  in this moment the current addin doesnt work in excel 2010 .. <br/><br/>Thanks for any info<hr class="sig">Juan Alvarado - MVP SQL SERVERTue, 17 Nov 2009 15:21:46 Z2009-11-17T15:21:47Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/c76401ca-191d-4fdc-8f59-0bd5d0f6ae0fhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/c76401ca-191d-4fdc-8f59-0bd5d0f6ae0fpablo_ardileshttp://social.msdn.microsoft.com/Profile/en-US/?user=pablo_ardilesDecision trees and the split of missing values<span style="font-size:12pt;color:#333333;font-family:'Segoe UI','sans-serif'">I've read that decision trees only splits on missing values if you have sparse data.<br/>But what happen's when you dont have enough missing sparse data.....the missing values goes to what leaf in the tree? </span><hr class="sig">pacTue, 17 Nov 2009 13:21:00 Z2009-11-17T13:21:01Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/37c8a5f2-0ea1-4d65-8631-ebf71d035e3ehttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/37c8a5f2-0ea1-4d65-8631-ebf71d035e3ess7http://social.msdn.microsoft.com/Profile/en-US/?user=ss7Normalising repeating dataHi,<br/> <br/> I would appreciate some help with normalising the following data most effectively.<br/> <br/> I have a large amount of data in un-normalised form, which I can add if it is essential. I am trying to limit this information I add on this forum to the section of relevance.<br/> <br/> Basically, i have table which will store information about toys. Within this there will be the following fields:<br/> Toy_name<br/> Cost<br/> Age_Range<br/> and then some more specific information. The specific information for each toy is:<br/> Toy_Type. <br/> The toy_type can be electronic, game, or wooden.<br/> For electronic I need to store Battery_Type and Number_of_Batteries.<br/> For type Game, I need to store, Type, this could card_game, board_game or other_game.<br/> For type wooden, I need to store Painted(Y/N).<br/> <br/> The specification requires a lot of other information aswell, for instance childrens names and details etc. But it is this toy type information that I am unsure about how to normalise. I am not sure how to create tables for this information. I feel that all the information about all the toy types should not be in one table but then I am unsure as to how it would be divided.<br/> I will be using Oracle to create my tables and then work with them further. At this stage I am primarily concerned with the actual design of the tables.<br/> <br/> Your help and suggestions will be greatly appreciated.<br/> <br/> Regards<br/>Mon, 16 Nov 2009 18:33:42 Z2009-11-18T08:44:51Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/247ac00f-0e92-4748-81b1-b24d65a370b1http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/247ac00f-0e92-4748-81b1-b24d65a370b1zx8754http://social.msdn.microsoft.com/Profile/en-US/?user=zx8754Deduplication - creating unique couples<p>Table<br/>C1 C2<br/>-------<br/>1 | 5<br/>2 | 5<br/>2 | 6<br/>1 | 6<br/>3 | 6<br/>3 | 7<br/>4 | 7<br/>4 | 8<br/>3 | 8<br/><br/>Desired Resulting Table<br/>C1 C2<br/>------<br/>1 | 5<br/>2 | 6<br/>3 | 7<br/>4 | 8<br/><br/>The logic is to find duplicate in C1 and keep the one which has lower value in C2. But both C1 and C2 must have unique values for each record.<br/>e.g. if we look at<br/>1 | 5<br/>1 | 6<br/>desired one is 1 5, but when we look at<br/>2 | 5<br/>2 | 6<br/>desired one is 2 6, because 5 is already matched with 1, so it should match with next lower which is 6.<br/><br/>Can someone submit code for this?<br/><br/>Many Thanks</p>Thu, 12 Nov 2009 18:36:35 Z2009-11-16T10:55:24Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/2ce89ede-363b-4521-8b5e-479f11b3a6d7http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/2ce89ede-363b-4521-8b5e-479f11b3a6d7snassimrhttp://social.msdn.microsoft.com/Profile/en-US/?user=snassimrQuery model and put values in Excel sheetHi , <br/><br/>If it is possible to run DMX from VBA and save query results in Excel Sheet ? If yeas if there is some reference using Data Mining from VBA ?<br/><br/>Thanks,<br/>Nissim<br/><br/>Mon, 16 Nov 2009 00:01:29 Z2009-11-16T00:01:29Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/c087024b-8657-477b-94f6-f349a3cc21eahttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/c087024b-8657-477b-94f6-f349a3cc21eaPeglegspeedkinghttp://social.msdn.microsoft.com/Profile/en-US/?user=PeglegspeedkingSQL Server mining - in general<p>As having met several data mining persons they mostly do the following:<br/>- receive an abstract of the database<br/>- eventually write back to the database to their models<br/><br/>Just from a high level standing point I'm interested why MSFT has chosen to:<br/>- put the mining in analysis<br/>- and how, from a competitive standing point, how the SSAS mining relates to the option described aboven (pros/cons)</p><hr class="sig">Business Intelligence professionalSat, 14 Nov 2009 19:02:12 Z2009-11-14T19:02:12Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8e740d35-d17c-4070-8b6e-0a896ac8f9c2http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8e740d35-d17c-4070-8b6e-0a896ac8f9c2Kirthika Jananihttp://social.msdn.microsoft.com/Profile/en-US/?user=Kirthika%20JananiRoles in SSAS Data mining <p>Hello, <br /><br />I have built an association mining model using SSAs and am having trouble with assigning a role. Whe I have a role that has full control , I am able to use the members<br />of that role to execute DMX queries whereas if I run a DMX query using a user who only has membership with Read defintion my query says the user does not have permission.<br /><br />Why does this happen ?&nbsp;</p>Fri, 16 Oct 2009 04:27:57 Z2009-11-13T17:56:45Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/58ec85c2-ba77-4ec1-9ace-2e88886608dehttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/58ec85c2-ba77-4ec1-9ace-2e88886608dePatrick Granthttp://social.msdn.microsoft.com/Profile/en-US/?user=Patrick%20GrantREGRESSOR Modelling Flag Error<p>I keep getting the error below on my first data mining model.  <br/>========================================<br/>Error 1 Error (Data mining): In  the mining model, TK445 Ch09 Prediction Decision Trees, the Age column must be of a continuous type since the REGRESSOR modeling flag is set on it. <br/>======================================<br/>I hadset an Age column's properties to Discretised into 5 equal areas.   I was also supposed to clear the Modelling flags property which already appeared to be clear under the Age column properties.  (I think I may have done something wrong here.)<br/><br/>Any ideas how to get round the problem?  I can't find anywhere in the Data Mining Model where Model Flags are set to REGRESSOR...<br/></p><hr class="sig">PGWed, 11 Nov 2009 13:33:41 Z2009-11-12T12:30:44Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/70719491-69a0-4340-bbac-c977d1aeb154http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/70719491-69a0-4340-bbac-c977d1aeb154Thomasmoxhttp://social.msdn.microsoft.com/Profile/en-US/?user=ThomasmoxCross validation report - settings<p>Hi,<br/><br/>I am curenttly running a logistic regression model, and a resulting cross validation report, in SQL 2008.</p> <p>A large amount of analysis is done on the data before it is run into the logistic regression algorithm to ensure its predictiveness, and that multicollinearity does not reside with the dataset.<br/><br/>Due to the fact that the dataset is streamlined before it is run into the logistic regression algorithm the maximum number of input and output variables are set at 0. The system never exceeds 20 variables at the best of times (Dataset is discrete in nature). The remainder of the settings for the logistic regression algorithm are set at default.<br/><br/>The cross validation report settings are as follows:</p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small"><span style="font-family:Calibri">“<em>Data set</em>” - Use test cases</span></span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small;font-family:Calibri"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small"><span style="font-family:Calibri">“<em>target attribute</em>” – Applicable variable that is being predicted within the logistic regression algorithm</span></span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small;font-family:Calibri"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small"><span style="font-family:Calibri">“<em>target state</em>” – 0</span></span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small;font-family:Calibri"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small"><span style="font-family:Calibri">“target threshold” – 95<br/></span></span></span><span style="color:#011520"><span style="font-size:small;font-family:Calibri"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="color:#011520"><span style="font-size:small"><span style="font-family:Calibri">“test list” - null</span></span></span></p> <p>The issue that I face is that the log score that is usually returned by the cross validation report is negative e.g. gets very close to 0 but never crosses into positive territory. (I interpret this a being not a great result). I have tried to run different datasets through the algorithm, and have adjusted the &quot;target threshold&quot; downwards, but I unfortunately seem to encounter the same issue on each run.<br/><br/>Do you have any suggestions as to how this can be improved? Do I for instance, have to change the &quot;data set&quot; to another setting?</p>Thu, 12 Nov 2009 08:50:12 Z2009-11-16T05:21:51Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/45121fb9-4692-4f9c-b953-f08935ea8422http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/45121fb9-4692-4f9c-b953-f08935ea8422alvaro.pthttp://social.msdn.microsoft.com/Profile/en-US/?user=alvaro.ptSequence Clustering - Cluster Similarity Strength Hello,<br/> <br/> I am using MS Sequence Clustering algorithm in a process mining context, i.e. to automatically extract business processes models from observed information system's behaviour.<br/> <br/> To do so, an application is being developed that call's MS Sequence Clustering programmatically and presents the markov models and the cluster diagram similary to MSAS Viewer.<br/> <br/> It was noticed that the cluster diagram is very important to incrementally understand business process variants and infrequent behaviour, thus i would like to replicate that my application. <br/> <br/> However, i am having difficulties to calculate clusters similarity strength. I was using the Frobenious Norm of the transition matrices difference for two given clusters, but i noticed that the states probabilities also should be taken into account.<br/> <br/> Can you enlight me about the metric that MS Sequence Clustering uses to calculate cluster similarity strength?<br/> <br/> Regards<br/>     Thu, 12 Nov 2009 01:11:45 Z2009-11-12T01:11:46Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/dc932e2c-876c-4335-9819-a2159896e6b9http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/dc932e2c-876c-4335-9819-a2159896e6b9Chad - SWChttp://social.msdn.microsoft.com/Profile/en-US/?user=Chad%20-%20SWCPeriodicity_hint for Time Series forecastIf my time key is day based do I need to use 365 for a yearly periodicty hint and 90 for quarterly? <hr class="sig">Chad-SWCMon, 09 Nov 2009 21:05:23 Z2009-11-11T18:07:24Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/4e77fee3-5b9a-4023-821f-3e4d7124d6fchttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/4e77fee3-5b9a-4023-821f-3e4d7124d6fcvytheesehttp://social.msdn.microsoft.com/Profile/en-US/?user=vytheeseText Mining from the documentCiao,<br/><br/>I am novice in this field, <br/><br/>I am trying to develop a application that need to extract the collection of domain specific words from a provided text.   I prefer to have it as a API rather than SQL server text mining  Since I need to port the application to the client side, so szie also matters here.  <br/><br/>Is there any C# libraries available with good example to do the above.   <br/><br/>Any directions/suggestions are welcome.<br/><br/>Grazie,<br/>Vythees<br/><br/><br/><br/><hr class="sig">vytheeseTue, 10 Nov 2009 16:05:38 Z2009-11-23T02:03:11Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/fb74ab56-1172-4460-8953-f566ca0a0cf3http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/fb74ab56-1172-4460-8953-f566ca0a0cf3snorlakshttp://social.msdn.microsoft.com/Profile/en-US/?user=snorlaksdeveloping application that uses analysis servicesHello,<br/> <br/> Im completly  new to datamining using sql server analysis services. Today I went thrugh basic tutorials and I just love making datamining using SQL Sqerver Business Inteligence Development Studio. <br/> <br/> I have a task to write WinForms application that makes some operations on data stored in sql server 2008 (Ado.NET, LINQ and so on).<br/> <br/> I think that it would be great to use power of analysis services but not from Sqerver Business Inteligence Development Studio but directly from my WinForms application. Custom way in which user types parameters and thes see results.<br/> <br/> I would like to know if it's possible to achieve such thing. <br/> <br/> Thank You very much for the answer,<br/> <br/> All best !!<br/>Mon, 09 Nov 2009 23:43:58 Z2009-11-11T18:55:59Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/233ed3c1-8c40-4d66-8e3b-9e5d288daabahttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/233ed3c1-8c40-4d66-8e3b-9e5d288daabaSteve Powellhttp://social.msdn.microsoft.com/Profile/en-US/?user=Steve%20PowellGeneral advice about getting started for a mining newbie<p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">I’m declaring myself as a mining newbie before I even start.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">I have been building cubes and reporting systems for about 4 years using various versions of SQL Server, in between more conventional development (which is my background). We’re now using 2008 for pretty much everything.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">All our cubes are currently retrospective. They tell us what has happened, there are no predictive functions/reports. The company I work for is a private hospital network. We have a real time message feed form our Hospital Administration Systems that gives us lots of patient and appointment data.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">We can look at this to see how busy our outpatient departments are and we can use this data and our inpatient data to provide us with some rough metrics on the “conversion ratio”. An example is; for every ten patients who come in complaining about a bad knee we actually do 2 knee replacement operations and 2 joint injections and 4 physio sessions and 2 don’t require any treatment. Each of these outcomes requires different staffing levels/skills.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">This tells us lots of interesting things about what happened. What we want to do now is a prediction of what is going to happen.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">So knowing we have 40 patients booked in for outpatient appointments next week we want to get an idea of how many of the various outcomes we might predict.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">We’re also interested in looking further ahead. Obviously we book outpatients in right up until a couple of days before the clinic. We do sometimes cancel the clinic if not enough patients are booked in and we reschedule the patients into the following week to bunch them up. <span style=""> </span>It would be useful to be able to predict that this would be required more than a couple of days in advance. Since clinics start getting booked weeks in advance it would be useful to flag the clinics that are filling up too slowly earlier.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">All this sounds like I need a mining model that I need to train and then impose on the live data. While I’ve read lots of interesting articles. I haven’t found anything step by step simple enough for someone like me who lets face it feels more comfortable with relational tables and OO than he does with this set based math on massive lumps of data.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">I’ve got a number of books on SSAS ranging from the Step by Step stuff through to some of the more advanced tomes. This reflects my ongoing experience and depth of requirement as I’ve gone along. I can’t however find anything that’s simple enough for me to get a toe hold on this data mining. Pretty much everything I’ve found thus far assumes I know what the long words mean. And at this point I don’t.</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">So has anyone got any advice, areas to focus on, books to recommend. </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">Many thanks</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText">Steve</span></span></p> <p class=MsoNormal style="margin:0cm 0cm 0pt"><span style="font-size:12pt"><span style="font-family:BUPAQuantaText"> </span></span></p>Tue, 10 Nov 2009 11:43:36 Z2009-11-10T11:43:36Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/de9ea9f7-8171-4c96-aced-e5a98c30e4ffhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/de9ea9f7-8171-4c96-aced-e5a98c30e4ffbardcanhttp://social.msdn.microsoft.com/Profile/en-US/?user=bardcanStronger correlation formula.<span style="font-family:Arial, sans-serif;font-size:13px;line-height:15px;white-space:pre-wrap">If I have two series of numbers, series A contains either 1s or 0s, depending on if a patient took a pill or not. Series B contains random numbers. All of the series B numbers that coincide with the patient taking a pill have an average of 100, whereas those that coincide with NOT taking a pill average to 101. There is a HUGE amount of data, so I am trying to find the formula that will show that there is a strong correlation between the two - that if the patient takes the pill, the most likely result is that their B measurement will go up by 1 point. A standard correlative coefficient shows a low correlation... around .15. Any help would be greatly appreciated.</span>Sat, 07 Nov 2009 08:54:59 Z2009-11-16T04:23:24Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/ab8f8ef9-3496-4dcc-a816-cc0e5674ed3ahttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/ab8f8ef9-3496-4dcc-a816-cc0e5674ed3asnassimrhttp://social.msdn.microsoft.com/Profile/en-US/?user=snassimrStrange cluster profiles numbers in Excel Add-inHi , everybody !!!<br/><br/>I have performed clustering and get two clusters :<br/><br/>First contains :<br/><br/>1350,180<br/>1300,120<br/><br/>80,1000<br/>50,1100<br/>0,1000<br/><br/>And mean and Deviation values seems aren't right.<br/><br/>For example , in second cluster , in second column &quot;Cluster profiles&quot; show Mean 800.89 adn SDV = 262.82<br/><br/>Any explanation for this ?<br/><br/>Thanks,<br/>NissimSat, 07 Nov 2009 15:47:51 Z2009-11-26T09:21:29Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/28341944-443c-40c5-9038-94121391fd00http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/28341944-443c-40c5-9038-94121391fd00dessertshttp://social.msdn.microsoft.com/Profile/en-US/?user=dessertsHow to choose the right model for deployment ?Model A = considerably more accurate than model B on training data <br/> Model B = more accurate than model A on validation data<br/> <br/> So which one would you consider for final deployment ? <br/> why?<br/> <br/> i was thinking if Model B is better.<br/>Sun, 01 Nov 2009 11:03:54 Z2009-11-11T09:52:42Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/eaced27b-321a-4241-98e0-06eee862c14fhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/eaced27b-321a-4241-98e0-06eee862c14fsnassimrhttp://social.msdn.microsoft.com/Profile/en-US/?user=snassimrCategories report for arbitrary clustering model Hi , everybody !!!<br/><br/>Can I to produce a Sheet called &quot;Categories Report&quot; after Detect categories but for some model I am already have on SSAS server ?<br/><br/>ThanksFri, 06 Nov 2009 02:42:18 Z2009-11-06T13:54:30Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/f899456c-f63b-48f3-9ce2-e3b47c2228c4http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/f899456c-f63b-48f3-9ce2-e3b47c2228c4NHBuckeyehttp://social.msdn.microsoft.com/Profile/en-US/?user=NHBuckeyeSorting question I have to &quot;group by&quot; purchase date, but the database's purchase date includes the time as well.  So when I group by purchase date, it includes the time as part of the date. Since the times are unique, I don't get &quot;grouped by&quot; info.  Is there a way to write a sql command to group by date while ignoring the time stamp?  The date part of the information consistently takes up the first 10 digits.<br/> <br/> Thanks!!!Tue, 03 Nov 2009 21:07:04 Z2009-11-16T03:16:34Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/2380e187-d575-43e2-a12c-cb3abb34bdc1http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/2380e187-d575-43e2-a12c-cb3abb34bdc1Jorge DMhttp://social.msdn.microsoft.com/Profile/en-US/?user=Jorge%20DMHow do we install the MovieClick database into Analysis Services?Hi:<br/> <br/> I am reading Chapter 4 in &quot;Data Mining with MS SQL Server 2008&quot; and can't seem to find the instructions to install the MovieClick database into an Analysis Service as described on Page 137: &quot;Download the MovieClick database and install it as described in Appendix A&quot;. I have restored the .BAK file into the regular SQL server, but that is not enough to follow the examples in the book. I have read Appendix A, and there is no instructions there on how to accomplish this task.   Your help will be greatly appreciated.<br/> <br/> JorgeSun, 01 Nov 2009 07:37:51 Z2009-11-16T02:17:42Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/1566f4b3-51be-4b51-874a-8c3e90a52d45http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/1566f4b3-51be-4b51-874a-8c3e90a52d45AndyComohttp://social.msdn.microsoft.com/Profile/en-US/?user=AndyComoOne to many query: return most common value<p>I am working on a one to many query for an assets database.  Here is some sample data<br/><br/><br/><br/><br/>Server Name        Application        Primary_User<br/>Server1                 SQL                   DBA<br/>Server1                 Citrix                 IT<br/>Server1                 Oracle                DBA<br/>Server2                 Sharepoint         Web<br/>Server3                 SQL                   DBA<br/>Server3                 Sharepoint         Web<br/>Server3                 Norton               Security<br/>Server3                 IDS                   Security<br/><br/>The desired output is one row per server with the server name, count of applications, and the Primary User that appears the <strong>most</strong> (not just the first, last, min, or max).<br/><br/>It would look like this<br/><br/>Server Name    Applications  Primary_User<br/>Server1            3                  DBA<br/>Server2            1                  Web<br/>Server3            4                  Security<br/><br/><br/>Is there a sub query that can accomplish this?  Thanks in advance!</p>Thu, 22 Oct 2009 15:35:08 Z2009-11-04T12:49:46Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8e51bce8-466a-4c53-bc0a-7066ccf4bd06http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/8e51bce8-466a-4c53-bc0a-7066ccf4bd06naeimehhttp://social.msdn.microsoft.com/Profile/en-US/?user=naeimehCreate Table from MDX queryHi,<br/><br/>I want create a table from the result of the MDX query. When I finish the wizard of import data in SQL Server Environment. at the last stage, the error is :<br/><strong><span style="font-size:xx-small"> <p>Pre-execute (Error)</p> <strong> <p>Messages</p> <p>Information 0x4004300b: Data Flow Task 1: &quot;component &quot;Destination - Query&quot; (31)&quot; wrote 0 rows.<br/>(SQL Server Import and Export Wizard)</p> </strong></span></strong><br/>please guid  me.<br/>Thank you a lot.<br/>NaeimehWed, 28 Oct 2009 13:55:47 Z2009-11-06T06:30:39Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/59acaa7c-f282-47e5-923c-1b21699ac585http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/59acaa7c-f282-47e5-923c-1b21699ac585Amit Dixithttp://social.msdn.microsoft.com/Profile/en-US/?user=Amit%20DixitNaive Bayes AlgorithmI am facing issue with text classification by using SQL server Naive Bayes algorithm. When i try to find terms which are there in two different classes, it is giving wrong results. It is always giving same probability for each class. <br/> <br/> I am using SQL Server 2008 Analysis services. I have 3 classes - Dog, Cow, Cat. Each has got 2 terms. When i try to find 2 terms in model from two different classes, it gives equal probability of being in all three classes.<br/> <br/> <br/> Any pointers will be helpful.<hr class="sig">AmitTue, 03 Nov 2009 10:59:59 Z2009-11-11T09:37:11Zhttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/604dd7a4-3bcf-483f-806a-b5c4b556b30chttp://social.msdn.microsoft.com/Forums/en-US/sqldatamining/thread/604dd7a4-3bcf-483f-806a-b5c4b556b30ctatigerhttp://social.msdn.microsoft.com/Profile/en-US/?user=tatigerBasic Data Mining Tutorial - Change Line Color on Lesson 5 Lift Chart Can anyone tell me how to change the chart line colors (or any line format characteristic) on the Lift Chart that is developed in Lesson 5 of the Basic Data Mining Tutorial for SQL 2008.  One of the lines shown on the chart is in yellow and I'd like to make it more distictive by changing the color.<br/><br/>Many ThanksSun, 01 Nov 2009 11:36:20 Z2009-11-01T11:36:21Z