Stupid developer tricks - When a data type overflow doesn't cause an overflow

  • Question

  • Build a very basic cube that has a dimension with a hierarchy and a single measure.  For example, a time dimension with Year, Quarter, Month, and Date levels, along with a quantity sold measure.  The quantity sold is an integer in your database.

    Now, load up your database with test data such that the quantity when rolled up to a quarter level will be just less than the maximum value for an integer.  Process your cube and see the absolutely stupid value that gets displayed for a year.
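
    If you do not feel like standing up a whole source system, a minimal, hypothetical fact table along these lines is enough to reproduce it (the table and column names are illustrative only):

    -- Four quarterly rows of 1 billion each: every quarter is comfortably
    -- inside the int range, but the year-level total (4 billion) is not.
    CREATE TABLE [dbo].[FactSales]
    (
        [SaleDate]     DATE NOT NULL,
        [QuantitySold] INT  NOT NULL
    );

    INSERT INTO [dbo].[FactSales] ([SaleDate], [QuantitySold])
    VALUES ('2009-01-01', 1000000000),
           ('2009-04-01', 1000000000),
           ('2009-07-01', 1000000000),
           ('2009-10-01', 1000000000);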

    For those of you reading along and not wanting to build a cube to see what happens, here it is in a nutshell.

    To make the math simple, we'll use 2 billion as the max value for an int and -2 billion as the min value.  Say, for example, that the quantity sold for each quarter was 1 billion units.  When you look at the cube from the quarter on down, you see nice, neat POSITIVE values.  But, when you look at the quantity sold for the year, SSAS reports the quantity sold as 0.

    No, it didn't substitute 0 for a data type overflow (4 billion exceeds the 2 billion max).  Instead, the developer of this "feature" decided that it would be a really nice idea to just ignore the fact that you had a data type overflow and treat the range of values as a looping structure.  So, his code says: take the aggregated value, run it all the way up to the last value in the range, and wrap anything that exceeds that value back through the bottom end of the range.  So, in this developer's world, 2 billion really does equal 2 billion, 3 billion = -1 billion, 4 billion = 0, 5 billion = 1 billion, etc.  The min and max become interesting "suggestions" that this developer just chose to ignore and treat like some endlessly looping structure.
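
    If you want to see the wrap-around arithmetic without building a cube, here is a rough sketch in T-SQL using the simplified 2 billion / -2 billion range from above (the real int boundaries are 2,147,483,647 and -2,147,483,648):

    -- Simplified model of the wrap-around: the range spans 4 billion values,
    -- so the total is reduced modulo 4 billion and anything still above the
    -- max loops back through the bottom end of the range.
    DECLARE @total BIGINT = 4000000000;   -- e.g. four quarters of 1 billion each
    DECLARE @range BIGINT = 4000000000;   -- size of the simplified -2bn to 2bn range
    DECLARE @max   BIGINT = 2000000000;

    SELECT CASE WHEN (@total % @range) > @max
                THEN (@total % @range) - @range
                ELSE (@total % @range)
           END AS [wrapped_value];        -- 4 billion -> 0, 3 billion -> -1 billion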

    Yes, I know that you can go into the cube and change the data type to something larger.  But, having numbers that just endlessly wrap around within their allowed range of values is complete idiocy.  I don't care if it has been documented or not.  I don't care if the code is working "as designed".  When 1 + 1 no longer equals 2, you have a problem.  This is a simple inability to perform basic math and is a BUG.


    Mike Hotek BlowFrog Software, Inc. http://www.BlowFrogSoftware.com Affordable database tools for SQL Server professionals
    Wednesday, September 8, 2010 7:37 PM

All replies

  • Have you reported this at Microsoft Connect?

    Boyan Penev --- http://www.bp-msbi.com
    Thursday, September 9, 2010 12:10 AM
    Yes, this is annoying, and it has been reported to Microsoft on Connect (see the link in my post here: http://geekswithblogs.net/darrengosbell/archive/2006/11/27/99190.aspx), but it has been closed as won't fix. This is pure speculation on my part, but I am guessing that the overhead of checking for overflows may have had too great an impact on performance.

    It's not ideal, but you can use the Measure Group Health Check feature in BIDS Helper to see whether this issue might affect any of your measures: http://bidshelper.codeplex.com/wikipage?title=Measure%20Group%20Health%20Check&referringTitle=Home


    http://geekswithblogs.net/darrengosbell - please mark correct answers
    Thursday, September 9, 2010 2:42 AM
  • It's out on Microsoft Connect.  I've essentially given up on posting stuff there.  Not a single thing that I've posted has ever been fixed.  In many cases, they will sit out there for months before anyone even responds to them.  The vast majority of the ones that I've posted have sat around with no response and then suddenly just get closed as either - won't fix, can't reproduce, or by design.

    I had several issues from SQL Server 2005 that sat for months and were suddenly closed as can't reproduce once the betas for SQL Server 2008 started to come out.  They should really put in options labeled "I don't feel like dealing with this issue, so I'm just going to close it." and "We've left this sitting out here for so long that we've released 1 or 2 more versions of the product and we're just mass closing issues reported on versions that we don't want to do development on anymore."  Because every one of the issues that I've had closed as "can't reproduce" was something that I had already reproduced on multiple machines, multiple editions, and in multiple environments.  My favorite was an issue I filed about not being able to resize the window in Profiler when you are configuring a trace; it was closed as "can't reproduce" after sitting out on Connect for so long that I had forgotten I had even filed it.  Reproducing a modal, non-resizable window in Profiler takes all of about a minute.

    Working as designed is my all-time favorite.  Yes, the feature may be working exactly as it was coded.  That doesn't change the fact that it is either doing something that it shouldn't do or it is doing something that is flat out wrong.  This happens to be one of those things that is flat out wrong.  When Microsoft closes something like this as "won't fix", to me, they are really saying "We don't want to be bothered with making sure that our calculations actually produce correct results; we want to spend our time producing shiny things".  This issue is really simple: SSAS can't add numbers, period.  By not fixing this issue, Microsoft is saying that you can't trust a single computation that SSAS produces, because you never know when something is going to flip over the top of a data type and start giving you invalid results.

    We had one that I caught which actually looped around the data type multiple times.  The number reported looked perfectly reasonable.  The business was relying on computations based on that number and driving a large section of the business based on it.  When we found the bug and fixed it, the business realized that it had been pouring money down a drain that could have been used for better things.  It was serious enough that SSAS got banned from the environment and they won't even discuss ever using the BI stack in SQL Server.


    Mike Hotek BlowFrog Software, Inc. http://www.BlowFrogSoftware.com Affordable database tools for SQL Server professionals
    Thursday, September 9, 2010 3:10 PM
  • Did that.  BIDS Helper didn't find this one.
    Mike Hotek BlowFrog Software, Inc. http://www.BlowFrogSoftware.com Affordable database tools for SQL Server professionals
    Thursday, September 9, 2010 3:13 PM
  • I can honestly say that I feel your pain. I have come across things like this in the past where you do just gape and ask "really?" Anyone who knows me will know my term of "MMP" for Microsoft Magic Poopie when stuff just works the way it does because it does, or when the MS application knows something but you don't. (Just don't get me started on DBCC CHECKIDENT RESEED...)

    However, reading through the posts here I'm torn, and I honestly don't know why, but I feel compelled to respond. On the one hand, there is no excuse for presenting an incorrect answer due to arithmetic overflow. Having been a consultant before, I have been in the position where you have to ask the client, in the nicest possible way of course, whether they would like the answer quickly and possibly wrong, or slower and correct. So although I strongly suspect that Darren is correct that performing arithmetic overflow detection on every calculation across a multi-dimensional database is probably a performance-degrading operation, there is no excuse. Or is there...

    Enter the other side of my brain, which is based on there being a performance issue if detection is enabled...
    So, given a problem on the scale of performing multi-dimensional aggregations across many levels as well as internal calculations, and then discovering that removing a given operation increases performance dramatically, albeit with dire consequences if some state is not set, what do you do? You can leave the operation in, knowing that the data will be accurate in all cases (and I use the term "accurate" loosely there, as arithmetic overflow does not reflect accurate data but rather a state) but with performance less than it could be, or you can remove it and document the required steps to prevent the side effects from occurring. From a purely selfish point of view I would rather have the performance and know to use a data type with a precision and scale large enough to hold the largest aggregate value. One could argue the standard idiom of "garbage in - garbage out", where the data type precision and scale is an input. If it is garbage, well... Though to be honest I'm not sure I would side with that.

    The long and short of it is that it's a shame it has worked out like that, really. Having used Oracle/Hyperion Essbase, Oracle OLAP Services and IBM/Cognos/Applix TM1, in my opinion Analysis Services is one of the more powerful, easy-to-learn and easy-to-configure products, matched possibly only by TM1. Essbase and Oracle OLAP are arguably more powerful in other arenas than AS, but to set them up and configure them... Oh my word...

    I guess every product has its issues and nuances, though at the risk of incurring your wrath, was the data ever reconciled with the underlying data source before being placed into production?

    Anyhoo, in a world that is crying for performance, performance, performance, is it acceptable to provide it with a product that is quick albeit with some side-effect if a given documented state is not matched? I'll leave that as an open question...

    Thursday, September 9, 2010 8:19 PM
  • Well,

    In many cases they have reasons for doing what they are doing. I have had bugs fixed and suggestions closed as "will implement in next release". When I get "by design" or "won't fix" it is usually because the issue can have consequences which I have not envisioned - like compatibility, other related behaviour, etc. In Microsoft's defense - it is hard for us to see the bigger picture - one where the software needs to be backward- and forward-compatible, to work with other tools, on different systems, etc.

    While SSAS got "banned" in your environment, it may have won over the users of 1000 others because it is faster because of this "feature". I think that reporting the issue is important because the product team may have missed something, but when they reject your bug it may be a good idea to find the reason outside of your particular environment, and sometimes even outside the tech world. Since SSAS is the leading OLAP server globally and is apparently the best-selling OLAP server for Oracle among others, with MDX as the de facto standard for multidimensional querying, I think that Microsoft are doing quite well currently with shaping up the product and with balancing its features.

    Instead of "banning" SSAS if your business and developers had the desire and skills to test it properly they would have noticed the problem and would have come up with a solution - e.g. as Darren suggested - using a data type which can accommodate the larger numbers. This would have had a performance impact but would have worked out in your case.

    It is always between two evils - performance impact in ALL scenarios while checking for overflows, or a problem in a small number of implementations, which can get taken care of with a bigint or some other large data type in the real numbers domain. I would personally prefer what the SSAS product team has done if the choice is as simple as that (based on Darren's assumption).


    Boyan Penev --- http://www.bp-msbi.com
    Thursday, September 9, 2010 11:52 PM
    Yes, the data in the cube was reconciled with the underlying database.  But, when it was built and deployed, they didn't have an entire year's worth of data.  It takes an entire year's worth of data to produce this issue.  No, they didn't dummy up a year or even multiple years of data just to test every possible permutation of aggregate.  They assumed that if SSAS was able to correctly compute an aggregate at the lowest 6 levels of a hierarchy, it would be able to correctly compute the same aggregate at the top level of the hierarchy.  I found this issue almost a year after these cubes had been in production.  It took me a couple of hours to figure out why it was producing the result that it was.  There isn't ANY way that I can justify producing invalid results.  Wrong is wrong and there is no excuse for invalid calculations.  Period.

    The data type precision ISN'T garbage.  The data type precision ISN'T wrong.  The cube is picking up the data type from the database, and there are no problems at all with the data type precision in the database.  The vast majority of people building cubes do NOT have extensive experience with either databases or SSAS, and Microsoft has made it that way on purpose.  This is magnified another level when you throw PowerPivot into everything, because you now have Microsoft telling people that you don't even need someone in IT.  (I sat through one of those presentations at a user group.)  Business users can just grab the data into PowerPivot, do their calculations, and go on.  PowerPivot relies on the same calculation algorithms as SSAS that create this problem.

    SSAS requires the data type used for the measure to match the data type defined for the underlying column.  So, you have to either change the data type in the database, use a named query that explicitly CASTs the column to a larger data type, or do a dummy change of the data type for the column within the cube - before you can change the data type for the measure.  While you can do this in SSAS, if you even know that you have to, I'm quite positive that PowerPivot does NOT have that ability, and business users are certainly not going to understand that they need to do this even if it is there.  All they are going to get is good data in and garbage out.

    So, to make sure that none of my students ever has this problem, I now have to add a section into every SSAS class that teaches them to change the data type of every numeric column to the maximum-sized data type possible, so that they don't ever have to deal with the fact that SSAS can compute completely invalid results because Microsoft treats data type ranges as a looping mechanism.  I'm also going to have to teach them that if anyone in their company wants to use PowerPivot, then they need to create a set of views that CAST every numeric data type to the largest data type possible and give their users access to those views, so that business users don't get garbage calculations.
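
    Something along these lines is what I mean by those views - the view, table, and column names here are purely illustrative:

    -- Hypothetical casting view so that downstream tools pick up BIGINT
    -- instead of INT and aggregates cannot wrap around the int range.
    CREATE VIEW [dbo].[v_FactSales_Reporting]
    AS
    SELECT [SaleDate],
           CAST([QuantitySold] AS BIGINT) AS [QuantitySold]
    FROM [dbo].[FactSales];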

    Producing garbage results all in the name of performance is completely pointless.  Would you use SQL Server as a database engine if it stored and retrieved data almost instantly in every case, but you never knew if it was giving you a correct result?  Of course not.  So, why should someone accept SSAS running fast, but never knowing if it ever computes a correct result?


    Mike Hotek BlowFrog Software, Inc. http://www.BlowFrogSoftware.com Affordable database tools for SQL Server professionals
    Monday, September 13, 2010 3:43 PM
  • Okay, fair enough, I realise that it is impossible to test every permutation in a cube etc. and that reconciling at a given level and trusting that AS will behave the same when aggregating further is a fair assumption. Essentially you are testing the inputs rather than testing that AS behaves properly, where you trust that MS have tested the latter for themselves. Clearly that question needed to be asked. I'm sure you understand.

    Beyond that, I think that it is fairly easy to argue that the data type is an input. The fact that it can be changed in BIDS makes it so. Note that BIDS merely makes a best effort to determine the data type; nowhere is there an assertion that it is the correct one. Just as BIDS identifies the primary key based on a clustered key on the relational table, irrespective of whether or not it is actually the primary key. Once again a best effort, though clearly flawed. As with any software, you need to know what you, and the software, are doing.

    I think that the MS marketing strategy with regard to PowerPivot is probably beyond the scope of this discussion. (i.e. "Is that strategy correct" versus "based on that strategy, the product is flawed".)

    With regard to your last paragraph, that raises quite an amusing thought... So, do you use SQL Server? If you do, based on that paragraph you probably shouldn't. Have you ever run something similar to the following:

    -- Integer division truncates, so the first column returns 2 rather than 2.5
    SELECT 5 / 2   AS "incorrect",
           5 / 2.0 AS "correct";

    Note how the "incorrect" column silently produced the incorrect value. Once again, you need to know the software and know that you must provide a scale on an input if you expect to receive scale in the output. This is not true of Oracle, which does this automatically.

    As for the view conversion idea, how is that different to this scenario:

    -- Two rows at the int maximum; SUM over an int column also returns int
    CREATE TABLE [test_data_type]
    (
        [col1] [int]
    );

    INSERT INTO [test_data_type]
    VALUES (2147483647), (2147483647);

    -- Raises: Arithmetic overflow error converting expression to data type int.
    SELECT SUM([col1]) AS "col1"
    FROM [test_data_type];

    Note how that will raise the error "Arithmetic overflow error converting expression to data type int." and, although not silent, it still requires a cast to BIGINT in order to get the correct aggregated answer. Essentially, any SQL query or view providing an aggregated answer from that table will require a cast in order to ensure that the aggregated value can be calculated and presented.
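
    For what it's worth, the cast needed to get the correct aggregate out of that table would look something like this:

    -- Casting before aggregating makes SUM operate over BIGINT, so
    -- 2 x 2,147,483,647 = 4,294,967,294 comes back correctly.
    SELECT SUM(CAST([col1] AS BIGINT)) AS "col1"
    FROM [test_data_type];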

    Ultimately, you always need to know the software you are using. Period. If you don't, you get what you are given and like it. There are parts of Excel etc. that baffle me, and I fully expect to get incorrect answers if I should venture into that space. Please note, I am not saying that it is correct to ignore arithmetic overflow (or the wrapping effect), but rather that in given scenarios it may well be understandable, if not forgivable.

    Monday, September 13, 2010 6:22 PM