locked
input/output redundancy in table RRS feed

  • Question

  • This is somewhat of a complicated scenario, so hopefully I can explain it clearly.  We use a program model that outputs long monthly forecasts for the future.  We store these forecasts in a table, something like:

     

    CaseID Date Value
    1 1/1/2012 5.4
    1 2/1/2012 10.2

     

    Like I said, this is supposed to be the output from a model simulation.  But sometimes, for particular months, the users input manual values that override the model's output.  So the output of a case is usually a mixture of the model's forecast and manual values.  I store these manual entries in an "input" table since they are necessary to reload a case.  So each case has an (optional) "input" stream of data, and an "output" stream of data.  But for cases that have a lot of monthly value inputs, then there is a lot of redundancy, since the output stream is nearly identical to the input stream.  Is this something that I need to correct, or am I getting too involved in the details?  Am I making sense?

    Tuesday, April 5, 2011 3:04 PM

Answers

  • Do you store an inpute value in separate table on in the same as output value table. If it is the same table you can add CHECK consrtaint to prevent from adding input value the same as output. If you store the data into separate table , so the same functionallity can be achived by using triggers

    create table t100 (id int, [output] int, input int,
    constraint chk_t100_output_input
    check (case when [output]=input then 1 else 0 end =0))

    insert into t100 values (1,100,10)--ok
    insert into t100 values (1,100,100)--failed


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    • Marked as answer by Tekito Tuesday, April 12, 2011 2:04 PM
    Wednesday, April 6, 2011 5:56 AM

All replies

  • Hi,

     

    >>Am I making sense?

    Somewhat :-)

    When you say "stream of data" do you mean two tables:  one holds the model simulation data and on holds the user input data?

    Guessing you don't want to shoe the multiple records for each case -- which one would you like to see?  The most recent?

     

     


    - Doug
    Tuesday, April 5, 2011 7:49 PM
  • I think I'm being too vague.  I'm going to use a hypothetical example to better illustrate.

    Say you had a program that could predict the weather, say the temperature per day.  So, for any given location in the US, you would feed the program certain parameters (latititude, humidity, time of year, etc) and it use a model to output the daily temperature at the location for the next year. 

    However, the program's model isn't perfect.  The meteorologist, knowing better, might choose to override the output temperature for any given day.  So ultimately, a location's forecast will be determined by running the model, then overlaying the result with any manual inputs given by the meteorologist.  If the model says July 10 will be 45 degrees but the meteorologist marked it at 38, then the location output will be 38.  How many days the meteorologist chooses to override the data could vary widely - some locations he might let the model predict most the days, but others he might override almost the entire forecast.

    The database needs to serve two functions.  One, saving enough location data such that it can be retrieved by the program.  That would require saving all the model parameters, as well as any meteorologist manual input.  We do NOT need to save the model output here, because that can be obtained by rerunning the model parameters. 

    Secondly, the database needs to store every location's latest output forecasts (updated everytime someone runs the program).  Again, this output forecast will be a combination of the model output and the meteorologist's manual daily temps.  But, this means that the manual temps now appear twice in the database - once for the purposes of location data, and again in the output table.  If a particular location consists almost entirely of manual temps, then this means a lot of redundant data.   Any more sense?  Is this good or bad database design?  How should I structure things?

     


    Tuesday, April 5, 2011 8:31 PM
  • Do you store an inpute value in separate table on in the same as output value table. If it is the same table you can add CHECK consrtaint to prevent from adding input value the same as output. If you store the data into separate table , so the same functionallity can be achived by using triggers

    create table t100 (id int, [output] int, input int,
    constraint chk_t100_output_input
    check (case when [output]=input then 1 else 0 end =0))

    insert into t100 values (1,100,10)--ok
    insert into t100 values (1,100,100)--failed


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    • Marked as answer by Tekito Tuesday, April 12, 2011 2:04 PM
    Wednesday, April 6, 2011 5:56 AM
  • Uri,

    So are you suggesting two columns, one input and the other output, both nullable and a check that prevents any date entry having two equal values?  Is the implication that if the output value is null, look at the input value to get the output for that date?

    Wednesday, April 6, 2011 1:29 PM
  • Hi Tekito,

    According to Uri’s suggestion, you needn’t to store an input value that equals to the output.  In other words, if the input value is null (The meteorologist doesn’t mark or marks as the same as the output value), it indicates that the input value is the same as the output value. Finally, the output value will like this:

    select (CASE ISNULL(input,0) WHEN 0 THEN [output] ELSE input END) AS [OUTPUT] from t100

    This methord can share the data which is the same on database.

    Best Regards,
    Stephanie Lv

    Tuesday, April 12, 2011 9:38 AM
  • I think I get what you all are saying.  I will give Uri the answer.
    Tuesday, April 12, 2011 2:04 PM