Friday, January 19, 2007 8:04 PM
Dear Sirs and Madams,
I think it's quite hard to find information about OLAP mining, so I have some questions for you:
What advantages do I gain by using OLAP mining instead of "normal" mining (on relational databases)?
Is it just faster or are there other advantages?
What are the disadvantages? The data can be over-aggregated (no detailed results), can't it? Are there other disadvantages?
What problems do I have to face?
What happens with empty cells?
Thank you very much in advance and have a nice weekend.
Monday, January 22, 2007 6:44 PM
The advantages/disadvantages of OLAP mining really come down to the advantages/disadvantages of OLAP itself.
Personally, I recommend using OLAP mining models when you require the kind of input that OLAP can generate easily. For example, say you had a sales force of 10,000 people and you were using each salesperson as a case. If the attributes you were mining were an individual's year-over-year profit growth, number of customer contacts, number of leads turned into customers per month, the growth of that figure per month, or any other measure that is the bread and butter of OLAP to compute, then by all means OLAP models are your best bet.
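To make the "one case per salesperson" idea concrete, here is a minimal sketch in plain Python (not Analysis Services code) of the kind of derived attribute described above. The row layout, the sample values, and the function name `yoy_profit_growth` are all made up for illustration; in a cube this would typically be a calculated measure rather than hand-written code.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (salesperson_id, year, profit).
rows = [
    ("sp1", 2005, 100.0),
    ("sp1", 2006, 120.0),
    ("sp2", 2005, 200.0),
    ("sp2", 2006, 150.0),
]

def yoy_profit_growth(rows, year):
    """Return {salesperson_id: growth} for `year` relative to `year - 1`."""
    # Aggregate profit per (salesperson, year), as a cube would per cell.
    totals = defaultdict(float)
    for person, yr, profit in rows:
        totals[(person, yr)] += profit

    growth = {}
    for person in {p for p, _, _ in rows}:
        prev = totals.get((person, year - 1))
        curr = totals.get((person, year))
        # Skip people with a missing year or a zero base year.
        if prev is None or curr is None or prev == 0:
            continue
        growth[person] = (curr - prev) / prev
    return growth

# One entry per salesperson: the attribute a mining case would carry.
cases = yoy_profit_growth(rows, 2006)
```

Each entry in `cases` is one attribute of one case. Doing this relationally for many attributes and many cases is where the work piles up, and it is exactly what an OLAP engine is built to precompute.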
However, if all of your data is in a relational database in roughly the form in which it needs to be mined, then you are doing yourself no favors by trying to put the data in a cube first. You will get better performance overall from a relational mining model.
The main problem you face when using OLAP as a source is that OLAP engines, in general, are designed to return small result sets from highly aggregated data, whereas data mining, in general, is designed to perform operations on large sets of raw (or preprocessed) data.
The implementation of OLAP in Analysis Services requires that the entire result set be materialized in memory before it is returned to the client. This generally isn't a big deal for typical OLAP queries, but if you are, for instance, trying to mine all of your transaction data for the past 10 years, you will run into difficulties.
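As a toy illustration of why full materialization hurts on large inputs, the Python sketch below contrasts building a whole result set in memory with handing rows over in batches. It is only an analogy under my own assumptions: the function names are hypothetical and nothing here reflects how Analysis Services is actually implemented.

```python
def materialize(source):
    """Build the whole result set in memory before returning it."""
    return list(source)

def stream(source, batch_size):
    """Yield fixed-size batches so only one batch is held at a time."""
    batch = []
    for row in source:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch

# Memory held by materialize() grows with the number of rows;
# memory held by stream() is bounded by batch_size.
all_rows = materialize(range(10))
batch_sizes = [len(b) for b in stream(range(10), 4)]
```

With a small, highly aggregated result the two approaches are indistinguishable. With ten years of transaction rows, the first one is the one that runs out of memory.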
Another thing to consider is how you are going to apply your model. The AS tools don't provide a good solution for applying models to OLAP cubes in many scenarios. For example, there is currently no solution for creating accuracy charts with an OLAP source. Our prediction query builder also doesn't support an OLAP source (although you can craft your own DMX queries by hand).
Even with these issues, an OLAP model may be the best way to go. I know of an implementation similar to the one I described above that created a model per store/product combination, of which there were thousands. A sproc that computed the input dataset relationally for one combination took 1.5 hours to run, whereas processing a cube to compute the data took a few minutes, with near-instantaneous query times.