locked
Storing 1200+ Variables In A Table For A Scientific Study RRS feed

  • Question

  • I have a study in my lab that is starting and this study will have more than 1200 variables that all need to be stored in a database.

    What is the best approach to storing this many variables?

    Is there another aption besides a database  that would be good? XML, etc?


    chuckdawit

    Friday, April 6, 2012 8:33 PM

Answers

  • To add on to what Serg said, you should consider not just how the data will be stored during the study but how the data will be queried during and afterward.  For example, for a study that focuses on effects of a new blood pressure medication, you may want to store core attributes like BP readings and treatments in properly normalized tables with other attributes stored as semi-structured XML using a more flexible schema.  Even with XML you'll still need to give some thought into exactly what each variable means since some will have a single value and others will have multiple (diastolic and systolic values of a single BP reading).


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Saturday, April 7, 2012 12:53 PM
    Answerer
  • My assumption is that you would be gathering these variables to eventually do math/queries/etc on right?  My caution that i would add is that from your description: "things like heart rate, blood pressure, drug infusions over time during surgery, blood values, demographincs, you name it." your probably need to do a good amount of design on how these different variables should be grouped, and probably look at dimensional design (data warehousing), particularly fact tables to see if that isn't a lot better for building a report on. 

    In each case, you will want to build your tables in such a way that every row means the same thing.  So perhaps heart rate and blood pressure can go in a row together naturally.  Drug infusions may stand alone, demographics in their own table.  In dimensional design, you would build fact tables that each had a common grain, and then use a time table of some sort to correlate values and look for patterns (perhaps even data mining).

    What you don't want to end up are rows that are either super generic 1 row per value, or even worse, has 1200 columns that don't all correlate to a single event that can be queried on naturally (unless, of course, your data is organized as periodic events that are grouped "every hour, we give infusions, take vitals, etc)

    What I have said is proably confusing, and that is definitely likely. Without truly understanding the requirements or data,it isn't possible to help nearly enough.  If you want to come back periodically and show us where you are headed, would love to see the progress.


    Louis

    Sunday, April 8, 2012 8:41 PM

All replies

  • What do you mean by variables?  Are these sensor readings taken at regular intervals or other measurements over time.


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/


    Friday, April 6, 2012 9:29 PM
    Answerer
  • things like heart rate, blood pressure, drug infusions over time during surgery, blood values, demographincs, you name it.

    chuckdawit

    Friday, April 6, 2012 10:56 PM
  • It totally depends on your system architecture.

    Most probably you need such entities as OBSERVATION (or TREATMENT or SESSION...), PATIENT, DOCTOR, EQUIPMENT and so on. And those 1200 variables (blood pressure, etc during the time of OBSERVATION or TREATMENT) with appropriate time stamps  may be stored first in an binary blob for example. Second, their aggrergates (max, min, ...) which may be needed to find and retrieve specific  OBSERVATION or TREATMENT  can be stored as a regular fields.

    So define you system architecture first. DB is part of the system and DB design shouldn't be done separatley from  designing the system it supports.


    Serg

    Saturday, April 7, 2012 9:52 AM
  • To add on to what Serg said, you should consider not just how the data will be stored during the study but how the data will be queried during and afterward.  For example, for a study that focuses on effects of a new blood pressure medication, you may want to store core attributes like BP readings and treatments in properly normalized tables with other attributes stored as semi-structured XML using a more flexible schema.  Even with XML you'll still need to give some thought into exactly what each variable means since some will have a single value and others will have multiple (diastolic and systolic values of a single BP reading).


    Dan Guzman, SQL Server MVP, http://weblogs.sqlteam.com/dang/

    Saturday, April 7, 2012 12:53 PM
    Answerer
  • My assumption is that you would be gathering these variables to eventually do math/queries/etc on right?  My caution that i would add is that from your description: "things like heart rate, blood pressure, drug infusions over time during surgery, blood values, demographincs, you name it." your probably need to do a good amount of design on how these different variables should be grouped, and probably look at dimensional design (data warehousing), particularly fact tables to see if that isn't a lot better for building a report on. 

    In each case, you will want to build your tables in such a way that every row means the same thing.  So perhaps heart rate and blood pressure can go in a row together naturally.  Drug infusions may stand alone, demographics in their own table.  In dimensional design, you would build fact tables that each had a common grain, and then use a time table of some sort to correlate values and look for patterns (perhaps even data mining).

    What you don't want to end up are rows that are either super generic 1 row per value, or even worse, has 1200 columns that don't all correlate to a single event that can be queried on naturally (unless, of course, your data is organized as periodic events that are grouped "every hour, we give infusions, take vitals, etc)

    What I have said is proably confusing, and that is definitely likely. Without truly understanding the requirements or data,it isn't possible to help nearly enough.  If you want to come back periodically and show us where you are headed, would love to see the progress.


    Louis

    Sunday, April 8, 2012 8:41 PM