Answered by:
Database Design: Web Polls

Question
-
I've had an idea for web site for years now and I've returned over and over to the DB design and constantly get stuck. So I thought I'd give a try here and see what you guys think. Sorry the post is so long... :)
What I need is a database that can handle an unlimited number and variety of polls - like web surveys. Basically a series of questions and answers. So I'd have a few categories, then each category has a few subcategories, each sub has a few polls, each poll has a few questions and each question a few answers. Pretty simple, one-to-many relationships.
The complication comes in when I want to normalize all the different kinds of answers. Some answers might be a text string (like a standard multi-choice test). Some might be a user entered value (like a fill-in-the-blank). Some might be just yes or no. Some might be numbers, or even a range of numbers.
So I could make an Answers table that holds all the "static" answers with an IsCorrect field that would tell me if they chose the right one. That would still mean a Yes and No answer for each Yes/No question and a lot of 1,2,3, etc answers for number questions. And a lot of duplicated ranges like "under 10", "10 - 20", etc. All this is ugly but I can live with it. But I'd also need a second table for fill in the blank answers. That really complicates my reporting and is just a little too ugly for me.
Another solution is to make an Answers table and a Responses table. Each response record would link 1 answer record (if the question required you to select an answer - otherwise just freeform text) to 1 question record and 1 user record. That would avoid all the duplicate answers but I would need some way to indicate that a certain answer is still the correct one for a given question. So if I have 5 Yes/No questions, some Yes, some No, which of the 2 Answer records (the Yes record and the No record) is flagged correct? See what I mean?
Am I trying to normalize too much? There must be some standard way to make sense of all this... Anyone care to try?- Moved by Jonathan KehayiasMVP, Editor Tuesday, December 15, 2009 2:29 AM Perfect Question for new forum. Design away. (From:SQL Server Database Engine)
Monday, December 14, 2009 11:07 PM
Answers
-
Hi Keith,
First, your question:
"Am I trying to normalize too much?"
In my opinion: yes, you are. I'll give you two reasons.
1: Based on your description, I think answers are (wrt their place in the data model) comparable to last names. Sure, there will be duplicates, but in most applications (genealogy excepted), this is not at all relevant. Almost all databases simply store the string "Jones" once for each person in their database named Jones. I would only consider a table of last names and a foreign key from the Persons to the LastNames table if I had to record attributes that functionally depend on the last name, or if I had to keep track of last names, even if nobody in my database currently uses that last name - which probably means that I'd only do it for a genealogy database (and maybe not even there).
2: Think about the usability aspect as well. With a table of common answers, how do you picture the user interface? You can give the user who sets up the poll free input, check if the answer has been used before, then either link to that answer or create a new one to link to - but in that case you'll get lots of "duplicates". Not truly duplicates (that's what the check is for), but different ways to express the same. E.g. "1 - 10", "1 to 10", "From 1 to 10 inclusive", "Less than eleven", etc. are all different ways to express the same range. You could avoid that by presenting all answers already used before in a dropdown, with "Add new answer" as an extra option - but once the application starts getting big, the dropdown will gro to thousands of answers, and how many users will actuallly read them all to choose the right one, instead of quickly adding a new option?
Another comment:
You mention an "IsCorrect" column. This surprises me. You write that the database has to handle polls, surveys. There are no "correct" or "incorrect" answers in polls; they are used to get a feel for how the population at large (or a selected group) feels about issues where there is no correct answer. To the question "Would you become a vegetarian if meat were 40% more expensive?", no answer can be defined as correct or incorrect; it's a personal choice for everyone. I would expect an "IsCorrect" attribute in a database for examinations and other similar tests. "12 - 5 = (a) 6, (b) 7, (c) 8, (d) 9" has only one correct answer, regardless of personal opinion, and if you pick the wrong answer you miss a point for your test.
Finally:
You wrote: "Another solution is to make an Answers table and a Responses table. Each response record would link 1 answer record (if the question required you to select an answer - otherwise just freeform text) to 1 question record and 1 user record."
That would be required in your idea with a generic reusable "Answers" table that every question then links to. Like I already said, that would be overly normalized.
I suggest something like this for the tables "lowest" in the hierarchy:
-- Table used for all questions in a test. -- Questions should be numbered consecutively. CREATE TABLE Questions (TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, QuestionText varchar(200) NOT NULL, -- maybe some other columns as well, PRIMARY KEY (TestID, QuestionNr), FOREIGN KEY (TestID) REFERENCES Tests(TestID), CHECK (QuestionNr) > 0 ); -- Table used for the answer options of multiple choice questions. -- Options should be numbered consecutively. CREATE TABLE AnswerOptions (TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, OptionNr tinyint NOT NULL, OptionText varchar(200) NOT NULL, -- maybe some other columns as well, PRIMARY KEY (TestID, QuestionNr, OptionNr), FOREIGN KEY (TestID, QuestionNr) REFERENCES Questions (TestID, QuestionNr), CHECK (OptionNr) > 0 ); -- Table used to hold respondent's replies. CREATE TABLE Replies (RespondentID SomeDatatype NOT NULL, TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, OptionNr tinyint NULL, ReplyText varchar(200) NULL, -- maybe some other columns as well, PRIMARY KEY (RespondentID, TestID, QuestionNr), FOREIGN KEY (RespondentID) REFERENCES Respondents(RespondentID), FOREIGN KEY (TestID, QuestionNr) REFERENCES Questions (TestID, QuestionNr) FOREIGN KEY (TestID, QuestionNr, OptionNr) REFERENCES AnswerOptions (TestID, QuestionNr, OptionNr), -- Must provide multiple choice option or freeform reply text! CHECK (OptionNr IS NOT NULL OR ReplyText IS NOT NULL), -- Do not provide multiple choice option AND freeform reply text; -- remove this constraint if you want to allow multiple choice + extra text. CHECK (OptionNr IS NULL OR ReplyText IS NULL) );
The most important limitation in this design is that it allows a user to provide freeform text responses to a multiple choice question. There are no straightforward ways to prevent that using constraints. There are some "not-so-straightforward" ways. Let me know if you think you need them - but I can imagine that your interface would not present the user with a freeform text input for a multiple choice question, in which case the need for such constraints might not be high, and not be worth the overhead.
-- Hugo Kornelis, SQL Server MVP- Marked as answer by Keith Work Tuesday, December 15, 2009 9:19 PM
Tuesday, December 15, 2009 8:39 AM -
Hi Keith,
No TestID is a foreign key into the table Tests. I think I should have called it Polls (and thus PollID), but I was thinking about tests because of my misunderstanding of the IsCorrect column in your first post.
I omitted the Tests (or Polls) and the Respondents tables from my suggestion as I thought including them would make the post longer but not clearer. I may have misjudged that; my apologies.
So the idea is that you have a poll, for which some column (or combination of columns) is the primary key. I don't know how many columns (but I assumed one), nor what datatype (that's why I used "SomeDatatype" instead of a real datatype - I tried using italics, but that messed up the formatting of the entire code block). And the same for respondents.
An example to clarify. Let's assume that there is a poll with a primary key value 'Poll32'. This poll has 40 questions which are numbered 1 to 40. So question number 18 lives in the Questions table with a primary key value of {PollID, QuestionNr} = {'Poll32',18}. (Before I forget - Nr stands for Number; IMO a more logical and more international appealing abbreviation than No, and I wanted to use number rather than ID to stress that I use the actual question number rather than a meaningless identity value or other surrogate key).
This question is multiple choice, and has 4 answer options. These are identified by OptionNr (option number) 1 through 4.
So if a respondend with a RespondentID equal to KeithW would choose option 3 for this question, the following data would be inserted in the Replies table:
INSERT INTO Replies (RespondentID, PollID, QuestionNr, OptionNr) VALUES ('KeithW', 'Poll32', 18, 3);
Your last question: Yes, a question for which no answer options exist would be a freeform text question. You could, for performance, store a Y/N column in the Questions table, but that would introducce the risk of inconsistencies if you are not very careful (and with proper indexing, a [NOT] EXISTS subquery in the AnswerOptions table should be very fast)
-- Hugo Kornelis, SQL Server MVP- Marked as answer by Keith Work Tuesday, December 15, 2009 9:19 PM
Tuesday, December 15, 2009 8:42 PM
All replies
-
Are you able to classify the different types of answers? For example, multiple choice, yes/no, free-form, range, and the like? If there are a small amount of answer types, or rather, question types, that itself limits the answers to a manageable set. Even with the categories, there'd be the question of variance. For example, a rating of 1-5 might be more common that an undetermined range. Though, something like that would have to be very common to justify its own category.
Tuesday, December 15, 2009 4:29 AMAnswerer -
What I need is a database that can handle an unlimited number and variety of polls - like web surveys. Basically a series of questions and answers. So I'd have a few categories, then each category has a few
.....
Am I trying to normalize too much? There must be some standard way to make sense of all this
Keith,
I am answering only one aspect of your question: web surveys with radio buttons and checkmarks (pure Boolean data).
Consider the following: a survey page with 12 questions and 5 answers (radio buttons or checkmarks).
Relationally the answers can be stored in 12 rows.
I found a generally much better performing non-3NF way: bit pattern of BIGINT (60 bits) stored in a single row.
Demo follows:
-- T-SQL demo bit pattern storage of answers to multiple choice survey questions -- Start out with 0 mask - no survey selection data DECLARE @SurveyAnswers BIGINT SET @SurveyAnswers = 0 SELECT CAST (@SurveyAnswers AS binary) -- 0x000000000000000000000000000000000000000000000000000000000000 -- HEX pattern is all zero --> binary mask all zeroes also -- Test setting for question 4 answer 3 - 3x5+3 = 18 SELECT @SurveyAnswers & POWER (2, 18) -- 0 -- 0 means answer is not set -- To set question 4 answer 3 SET @SurveyAnswers = @SurveyAnswers + POWER (2, 18) SELECT CAST (@SurveyAnswers AS binary) -- 0x000000000000000000000000000000000000000000000000000000040000 -- Test again setting for question 4 answer 3 - 3x5+3 = 18 SELECT @SurveyAnswers & POWER (2, 18) -- 262144 -- It means answer is set - user selected that radio button or checkmark
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Edited by Kalman Toth Friday, September 28, 2012 7:31 PM
Tuesday, December 15, 2009 7:51 AM -
Hi Keith,
First, your question:
"Am I trying to normalize too much?"
In my opinion: yes, you are. I'll give you two reasons.
1: Based on your description, I think answers are (wrt their place in the data model) comparable to last names. Sure, there will be duplicates, but in most applications (genealogy excepted), this is not at all relevant. Almost all databases simply store the string "Jones" once for each person in their database named Jones. I would only consider a table of last names and a foreign key from the Persons to the LastNames table if I had to record attributes that functionally depend on the last name, or if I had to keep track of last names, even if nobody in my database currently uses that last name - which probably means that I'd only do it for a genealogy database (and maybe not even there).
2: Think about the usability aspect as well. With a table of common answers, how do you picture the user interface? You can give the user who sets up the poll free input, check if the answer has been used before, then either link to that answer or create a new one to link to - but in that case you'll get lots of "duplicates". Not truly duplicates (that's what the check is for), but different ways to express the same. E.g. "1 - 10", "1 to 10", "From 1 to 10 inclusive", "Less than eleven", etc. are all different ways to express the same range. You could avoid that by presenting all answers already used before in a dropdown, with "Add new answer" as an extra option - but once the application starts getting big, the dropdown will gro to thousands of answers, and how many users will actuallly read them all to choose the right one, instead of quickly adding a new option?
Another comment:
You mention an "IsCorrect" column. This surprises me. You write that the database has to handle polls, surveys. There are no "correct" or "incorrect" answers in polls; they are used to get a feel for how the population at large (or a selected group) feels about issues where there is no correct answer. To the question "Would you become a vegetarian if meat were 40% more expensive?", no answer can be defined as correct or incorrect; it's a personal choice for everyone. I would expect an "IsCorrect" attribute in a database for examinations and other similar tests. "12 - 5 = (a) 6, (b) 7, (c) 8, (d) 9" has only one correct answer, regardless of personal opinion, and if you pick the wrong answer you miss a point for your test.
Finally:
You wrote: "Another solution is to make an Answers table and a Responses table. Each response record would link 1 answer record (if the question required you to select an answer - otherwise just freeform text) to 1 question record and 1 user record."
That would be required in your idea with a generic reusable "Answers" table that every question then links to. Like I already said, that would be overly normalized.
I suggest something like this for the tables "lowest" in the hierarchy:
-- Table used for all questions in a test. -- Questions should be numbered consecutively. CREATE TABLE Questions (TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, QuestionText varchar(200) NOT NULL, -- maybe some other columns as well, PRIMARY KEY (TestID, QuestionNr), FOREIGN KEY (TestID) REFERENCES Tests(TestID), CHECK (QuestionNr) > 0 ); -- Table used for the answer options of multiple choice questions. -- Options should be numbered consecutively. CREATE TABLE AnswerOptions (TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, OptionNr tinyint NOT NULL, OptionText varchar(200) NOT NULL, -- maybe some other columns as well, PRIMARY KEY (TestID, QuestionNr, OptionNr), FOREIGN KEY (TestID, QuestionNr) REFERENCES Questions (TestID, QuestionNr), CHECK (OptionNr) > 0 ); -- Table used to hold respondent's replies. CREATE TABLE Replies (RespondentID SomeDatatype NOT NULL, TestID SomeDatatype NOT NULL, QuestionNr smallint NOT NULL, OptionNr tinyint NULL, ReplyText varchar(200) NULL, -- maybe some other columns as well, PRIMARY KEY (RespondentID, TestID, QuestionNr), FOREIGN KEY (RespondentID) REFERENCES Respondents(RespondentID), FOREIGN KEY (TestID, QuestionNr) REFERENCES Questions (TestID, QuestionNr) FOREIGN KEY (TestID, QuestionNr, OptionNr) REFERENCES AnswerOptions (TestID, QuestionNr, OptionNr), -- Must provide multiple choice option or freeform reply text! CHECK (OptionNr IS NOT NULL OR ReplyText IS NOT NULL), -- Do not provide multiple choice option AND freeform reply text; -- remove this constraint if you want to allow multiple choice + extra text. CHECK (OptionNr IS NULL OR ReplyText IS NULL) );
The most important limitation in this design is that it allows a user to provide freeform text responses to a multiple choice question. There are no straightforward ways to prevent that using constraints. There are some "not-so-straightforward" ways. Let me know if you think you need them - but I can imagine that your interface would not present the user with a freeform text input for a multiple choice question, in which case the need for such constraints might not be high, and not be worth the overhead.
-- Hugo Kornelis, SQL Server MVP- Marked as answer by Keith Work Tuesday, December 15, 2009 9:19 PM
Tuesday, December 15, 2009 8:39 AM -
One approach might be to review Duane H's work on setting up survey databases. The interface you're looking at (web) is different, but the underlying data structure he's developed may give you some ideas. If interested, you can see it at:
http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=3
Regards
Jeff Boyce
Microsoft Access MVPTuesday, December 15, 2009 2:52 PM -
Some more explanation about the goal: The idea is that you could come to the site and take a lot of surveys and compare your results with others (by age, region, etc). So some would be opinion polls, some would be "test" polls with a correct answer. Then you can see how your opinions compare to others as well as your knowledge.
Brian:
I did think about grouping the answers into categories for the different types, but you still have to store the different kinds of data in different tables or fields or something. It gets complicated very fast doing that.
SQLUSA:
Yeah I could just store the single answer or the correct answer for each question and not the other choices but then I have to put all the wrong answers at the display level, not the DB level. REALLY ugly...
Hugo:
Your no. 2 point about the users creating new polls/quizzes is a very good one I hadn't thought about. I think you're right there - I'll need to go with one record for each possible answer, for each question. Your solution looks like the one I've thought about using an Answers table and a Responses table.
I could add an "IsCorrect" flag to your AnswerOptions table but some multi-choice questions would just be opinion - no correct answer. others would require a correct answer.
I'd also like to have really fast reporting, that's why I'm pushing to normalize it as much as possible. So I can show a dozen tables on the home page for the most popular polls, etc. As I know very little about data warehousing, I'd like to stick to a single DB for everything if I can. At least for now.
One more thing - In Hugo's design, why would I need a link from Replies to TestID and QuestionNr? Why not just link to the answer and, through that, to the question, then the test?Tuesday, December 15, 2009 4:36 PM -
The more generic a grouping is, the more it can handle, but the harder it is to support. The more specific a grouping is, the easier it is to support each individual type, but it can get unwieldy. The trick is to be a specific as possible without making too many groups.
To do that, the types of quizzes should be looked at. If you have a sample set, count then and get some numbers associated with the various types. Consolidate types based on similarity and rarity of usage. This would give you the types the application will support.
If however, there is no sample, or the sample may not be representative of real usage, or growth is expected to add unknown types, the level of generics should be chosen by a educated guess.
Either way, the design can proceed from there.Tuesday, December 15, 2009 4:48 PMAnswerer -
Jeff:
Nice link - thanks! I had a brief look at his DB and it looks like it's just simple "select the correct answer from a given set" kind of questions. I'll have to look at it in more depth.
Brian:
Yeah maybe it's just my design philosophy but I think a database should function from day one to the parameters you built it for without any more management. Having to constantly monitor, update and categorize questions manually just seems like you're making a lot of work for yourself. I'm open to counter-arguments though. :)Tuesday, December 15, 2009 5:02 PM -
I don;t mean to monitor and change it. I am talking about the model from day one. It's just that sometimes, you have the luxury of actual data at that point, and that can be used to find the generic/specific line.Tuesday, December 15, 2009 5:32 PMAnswerer
-
Hi Keith,
To start with the last question: you need a foreign key to the questions table because OptionNr is an optional column. It can (and will) be left NULL for answers that require a freeform text answer. And if one of the columns in a foreign key is NULL, the constraint is not checked (that is by design and as specified in the ANSI specifications, not a bug).
If all questions would be mutliple choice, then you'd be right and I could leave out the foreign key. Just as I left out the foreign keys to the Tests table in the AnswerOptions and Replies tables.
On the fast reporting, that is not the goal of normailization. You use normalization to get at a design where each fact is stored in exactly one place, so that no inconsistencies can ever occur. A well-normalized design often improves performance as well, but it's not the goal and in some cases not the end result either (there is, after all, a reason that pure reporting databases often deliberately denormalize!).
Over normalizing does in almost all cases result in worse performance. Think of the last names example I gave earlier - the only thing you'd achieve by creating a seperate "LastNames" table is that either (if you use a surrogate key) every report now has to use an extra join to include the last name in human readable form, or (if you use the natural key) the last names are effectively duplicated.
-- Hugo Kornelis, SQL Server MVPTuesday, December 15, 2009 5:38 PM -
I think I need to understand your idea above a little better, Hugo.
By TestID in your example, you mean the QuestionID, right? Not the poll (or test) ID. To me the poll/test has questions and the questions have answers.
And I think I see that ReplyText would hold the freeform text and OptionNr (What does "Nr" stand for?) would hold the answer from the AnswerOptions table. Right?
I'm guessing that when I build the question at run-time, I'd display a free-form text field if the question had 0 associated answer records, right? Or did you have some other way to determine what kind of options to display?
Tuesday, December 15, 2009 6:49 PM -
Hi Kalman,
I have to disagree with you on this one. Sure, you can save some bytes of storage data this way - but disk space is not expensive anymore (though still not really cheap if it has to be SAN quality). And maybe even a table or two.
But at what price? The data becomoes incomprehensible for human inspection, making debugging harder and more error prone. The code becomes much more complex than it would otherwisse be (as you prove in your sample code - I'll come to that in a moment). And you make it impossible for the optimizer to use any indexes that might otherwise be created and used to speed up queries.
To prove the increased chance of errors in code, check your own example. With a well-normalized design, setting the same answer another time is either, depending on the exact model chosen, a no-op (if an UPDATE is used) or will result in an error and no harm done (if INSERT is used). In your case, you'll add POWER(2,18) to the value that holds the encoded answers another time, setting it to POWER(2,19) - so now the data says that the user selected answer 4 for question 4. (Or answer 1 for question 5 - that part remains vague even with the example).
Another example to illustrate how error prone this is. Imagine a poll with 4 multiple choice questions. The first has 3 options, the second and third each have 5, and the last question has 4 options. How much time does it take you to work out the correct power of 2 for testing if someone chose the 2nd answer to question 4?
And then, test takers give me the feedback that question 2 lacks an option they'd want to choose. For the benefit of future respondents, I add this as a sixth option to question 2. What will this do to the already registered replies?
Finally (and then I'll give it a rest) - getting results. With a well-normalized design, it just takes a simple query with a GROUP BY and some aggregates in the SELECT clause to get an overview of how the respondents answer to each question (like: Q1 - 33% option 1, 50% option 2, 17% option 3 / Q2 (etc)). How would that query look in your model?
-- Hugo Kornelis, SQL Server MVPTuesday, December 15, 2009 8:29 PM -
Hi Keith,
No TestID is a foreign key into the table Tests. I think I should have called it Polls (and thus PollID), but I was thinking about tests because of my misunderstanding of the IsCorrect column in your first post.
I omitted the Tests (or Polls) and the Respondents tables from my suggestion as I thought including them would make the post longer but not clearer. I may have misjudged that; my apologies.
So the idea is that you have a poll, for which some column (or combination of columns) is the primary key. I don't know how many columns (but I assumed one), nor what datatype (that's why I used "SomeDatatype" instead of a real datatype - I tried using italics, but that messed up the formatting of the entire code block). And the same for respondents.
An example to clarify. Let's assume that there is a poll with a primary key value 'Poll32'. This poll has 40 questions which are numbered 1 to 40. So question number 18 lives in the Questions table with a primary key value of {PollID, QuestionNr} = {'Poll32',18}. (Before I forget - Nr stands for Number; IMO a more logical and more international appealing abbreviation than No, and I wanted to use number rather than ID to stress that I use the actual question number rather than a meaningless identity value or other surrogate key).
This question is multiple choice, and has 4 answer options. These are identified by OptionNr (option number) 1 through 4.
So if a respondend with a RespondentID equal to KeithW would choose option 3 for this question, the following data would be inserted in the Replies table:
INSERT INTO Replies (RespondentID, PollID, QuestionNr, OptionNr) VALUES ('KeithW', 'Poll32', 18, 3);
Your last question: Yes, a question for which no answer options exist would be a freeform text question. You could, for performance, store a Y/N column in the Questions table, but that would introducce the risk of inconsistencies if you are not very careful (and with proper indexing, a [NOT] EXISTS subquery in the AnswerOptions table should be very fast)
-- Hugo Kornelis, SQL Server MVP- Marked as answer by Keith Work Tuesday, December 15, 2009 9:19 PM
Tuesday, December 15, 2009 8:42 PM -
Thanks Hugo. I'll give it a shot and let you guys know how it works.Tuesday, December 15, 2009 9:19 PM
-
Sorry for late replay. I was lookint at this thred yestrday but I did not find time to make an replay. I will give you a screen shoot of my idea, how I see solution of PoolDB. This is only draft with basic concept and it can be upgraded in any aspects. If you need I can later explain details about it. Hop it helps
Jasmin- Proposed as answer by Alex Prusakov Wednesday, December 16, 2009 1:48 PM
Wednesday, December 16, 2009 7:49 AM -
Thanks Jasmin. Are you thinking the free-from answers would be stored in usersAnswers.text? Then you shouldn't need a link between usersAnswers and Pool (which I assume to mean "Polls"), right? And you only need usersAnswers.QuestionID because it may be free-from, in which case it has no relationship to Answers, right?
Why the QuestionTypes though? Don't like the NOT EXISTS subquery idea?Wednesday, December 16, 2009 4:35 PM -
hi ,
intresting ! some links that might be of help and give ideas , i like the Jasmin idea , i was thinking that Microsoft Research did something like it a while ago .
http://www.wddb.com/
http://www.smartwebby.com/web_products/flash_poll/AccessInstallationMulti.asp
http://www.smartwebby.com/web_products/flash_poll/SQLServerInstallationMulti.asp
have a nice day and happy holidays
http://www.microsoft.com/security ++ http://www.microsoft-hohm.com/default.aspx ++ http://www.getpivot.com/ ++ http://photosynth.net/ ++ http://seadragon.com/ ++ http://windowsteamblog.com/blogs/genuinewindows/default.aspxWednesday, December 16, 2009 7:52 PM -
Thanks Jasmin. Are you thinking the free-from answers would be stored in usersAnswers.text? Then you shouldn't need a link between usersAnswers and Pool (which I assume to mean "Polls"), right? And you only need usersAnswers.QuestionID because it may be free-from, in which case it has no relationship to Answers, right?
Why the QuestionTypes though? Don't like the NOT EXISTS subquery idea?
Is the relationship between userAnswers and Pool a transitive relationship? Likewise isn't the relationship between userAnswers and answers also a transitive relationship? Maybe not -- it might be just like you said.Wednesday, December 16, 2009 8:17 PM -
I'm not a DB master or anything but by transitive, I guess you mean the answer depends on both the question and the poll? I don't think it should. Each question is for a specific poll only.
Then again, I guess I could have questions that fit into more than one poll. "On what island did Darwin do his evolutionary research?" That could be Geography, History, Science... hmm...
I think that's getting too complicated. Assume that each question belongs to one and only one poll.Wednesday, December 16, 2009 8:46 PM -
I'm not a DB master or anything but by transitive, I guess you mean the answer depends on both the question and the poll? I don't think it should. Each question is for a specific poll only.
Then again, I guess I could have questions that fit into more than one poll. "On what island did Darwin do his evolutionary research?" That could be Geography, History, Science... hmm...
I think that's getting too complicated. Assume that each question belongs to one and only one poll.
No, by transative relationship I mean that
when A -> B and B -> C, it follows that A ->C. There is no need to explicity maintain a formalized FK from A to C. Therefore the FK between is A and C is not needed.
Help please? Am I off on this?Wednesday, December 16, 2009 9:09 PM -
Your description of a transitive relationship looks correct to me, Kent.
From Jasmin's diagram, the relationship between userAnswers and Questions is unnecessary, as the userAnswers table (uA) has a FK relationship with the Answers table (A), and the Answers table has a FK relationship to the Questions table (Q). So, uA->A, and A->Q, so uA->Q.
This transitive relationship is enforced by the FK between A and Q - each answer must belong to a specific question, and each userAnswer relates to a single Answer, so there is no need to store the QuestionID along with the userAnswer - you'd be duplicating the QuestionID attribute. So, not only do you not need the relationship between uA and Q, you don't need the QuestionID attribute at all in uA.
As the transitive relationship implies the values for QuestionID and PoolID in the userAnswers table, repetition of these attributes will violate normal form.Wednesday, December 16, 2009 11:04 PM -
Below is the partial model of a survey system design done some time ago. This is one of the first drafts but will give you some ideas.
The idea is to treat each question as a table. For example, a single choice question can be presented as one table with one row and one column. Multiple choice question can be represented as table with one column and multiple rows. A range type question can be presented as table with two columns (start/end range) and one row. In the model below items reference table rows and details reference table columns in the questions.
The data type entry validation is enforced by regular expressions at the business layer.
This survey is designed to handle more complex scenarios. One of them is a jump map, which represents when a particular answer will trigger skipping number of questions. For example, if question #5 in section #2 asks if the survey participant is a decision maker, and if the answer is no then this answer can trigger skipping the next 5 questions which related only to decision makers.
CREATE TABLE Surveys ( survey_id INT NOT NULL CONSTRAINT pk_surveys PRIMARY KEY, survey_name VARCHAR(100) NOT NULL, title NVARCHAR(300) NOT NULL, instructions NVARCHAR(2000) NULL, begin_date DATETIME NULL -- date when the survey becomes active and available to users CONSTRAINT ck_surveys_begin_date CHECK (begin_date = DATEADD(DAY, DATEDIFF(DAY, 0, begin_date), 0)), end_date DATETIME NULL -- date when the survey becomes inactive and not available to users CONSTRAINT ck_surveys_end_date CHECK (end_date = DATEADD(DAY, DATEDIFF(DAY, 0, end_date), 0)), CONSTRAINT ck_surveys_begin_end_date CHECK (begin_date <= end_date)); CREATE TABLE Participants ( participant_id INT NOT NULL CONSTRAINT pk_participants PRIMARY KEY, company_id INT NOT NULL CONSTRAINT fk_participants_companies FOREIGN KEY REFERENCES Companies (company_id) ON UPDATE CASCADE ON DELETE CASCADE, participant_name NVARCHAR(100) NOT NULL, email NVARCHAR(180) NOT NULL); CREATE TABLE SurveySections ( section_id INT NOT NULL CONSTRAINT pk_surveysections PRIMARY KEY, survey_id INT NOT NULL CONSTRAINT fk_surveysections_survey FOREIGN KEY REFERENCES Surveys (survey_id) ON UPDATE CASCADE ON DELETE CASCADE, section_name NVARCHAR(200) NOT NULL, instructions NVARCHAR(1000) NULL, sequence INT NOT NULL); CREATE TABLE Questions ( question_nbr INT NOT NULL CONSTRAINT pk_questions PRIMARY KEY, question_type_cd CHAR(2) NOT NULL -- Multiple choice, Single choice, etc. CONSTRAINT fk_questions_type FOREIGN KEY REFERENCES QuestionTypes (question_type_cd) ON UPDATE CASCADE ON DELETE CASCADE, question_text NVARCHAR(500) NOT NULL, question_title NVARCHAR(100) NULL, instructions NVARCHAR(1000) NULL); CREATE TABLE QuestionItems ( item_nbr INT NOT NULL CONSTRAINT pk_questionitems PRIMARY KEY, question_nbr INT NOT NULL CONSTRAINT fk_questionitems_question FOREIGN KEY REFERENCES Questions(question_nbr), sequence INT NOT NULL, item_text NVARCHAR(200) NOT NULL, CONSTRAINT uk_questionitems UNIQUE (question_nbr, sequence)); CREATE TABLE QuestionDetails ( detail_nbr INT NOT NULL CONSTRAINT pk_questiondetails PRIMARY KEY, question_nbr INT NOT NULL CONSTRAINT fk_questiondetails_question FOREIGN KEY REFERENCES Questions(question_nbr), detail_title NVARCHAR(100) NOT NULL, sequence INT NOT NULL, detail_type CHAR(1) NOT NULL CONSTRAINT df_questiondetails_detail_type DEFAULT 'T' CONSTRAINT ck_questiondetails_detail_type CHECK (detail_type IN ('T', -- character data, free text 'N', -- numeric data 'C', -- currency 'D', -- date 'B', -- boolean Yes/No (checkbox)
'S', -- single choice option (radio button) 'L', -- list data, could be rank or other list (combo/drop-down list) 'R', -- range 'P') -- percentage, the column should total to 100, each value can be only between 0 and 100 ), regex_validation NVARCHAR(100) NULL, regex_error_message NVARCHAR(200) NULL, CONSTRAINT uk_questiondetails UNIQUE (question_nbr, sequence, detail_nbr)); CREATE TABLE AnswerDetails ( answer_detail_id INT NOT NULL CONSTRAINT pk_answerdetails PRIMARY KEY, answer_id INT NOT NULL CONSTRAINT fk_answerdetails_answer FOREIGN KEY REFERENCES Answers(answer_id), item_nbr INT NOT NULL CONSTRAINT fk_answerdetails_item FOREIGN KEY REFERENCES QuestionItems(item_nbr), detail_nbr INT NOT NULL CONSTRAINT fk_answerdetails_detail FOREIGN KEY REFERENCES QuestionDetails(detail_nbr), answer_value NVARCHAR(200) NULL, -- value for answer, needs to be verified based on detail_type update_user NVARCHAR(35) NOT NULL, -- user/admin name who last updated the answer update_date DATETIME NOT NULL -- date of update CONSTRAINT df_answerdetails_update_date DEFAULT CURRENT_TIMESTAMP, CONSTRAINT uk_answerdetails UNIQUE (answer_id, item_nbr, detail_nbr)); CREATE TABLE QuestionJumpMap( section_nbr INT NOT NULL, question_nbr INT NOT NULL, item_nbr INT NOT NULL, detail_nbr INT NOT NULL, answer_value_to_match NVARCHAR(200) NOT NULL, next_section_nbr INT NULL, next_question_nbr INT NOT NULL, CONSTRAINT pk_questionjumpmap PRIMARY KEY (section_nbr, question_nbr));
Plamen Ratchev- Proposed as answer by Yookos Friday, December 6, 2013 1:57 PM
Thursday, December 17, 2009 12:19 AM -
Hi Jasmin,
Your model looks like a good start, but I do have some comments. I think some of them have already been mentioned by others, but for my overview I'll include them here anyway.
1. Surrogate keys .
Every table in your design has a single-column primary key with tablename ID as its name. I think you intend these to be IDENTITY/AutoNumber surrogate keys. That is not incorrect per say, but you should never have a surrogate key as the only key - and in this case, you have exactly that. That is bad for a number of reasons. One is that you can't really prevent duplicates this way. If a poll designer accidentally clicks "ADD" twice after keying in a question, two identical copies of the question will be in the database. It's also bad from a user perspective. The surrogate keys should not be exposed to end users (sure, you do not want a poll respondent to be confronted with question #87236, having answer options #34782378, #34782380, #34782381, and #34782385). But you do need to give the user a way to identify a particular question and answer, which will naturally also be a candidate key in the table.
As long as each surrogate key is accompanied by another key (I guess you should call it the user key, as it's the key to be used by the users), and the user key is properly enforced with a UNIQUE or PRIMARY KEY constraint, using them can be okay. But they should not be added out of habit to every table; it should be a conscious decision. Using a surrogate key adds to the storage space used for the table, requires at least one extra index to be built, stored, maintained upon modiifications, and considered by the query optimizer, and often introduces extra joins to many queries. That can be an acceptable trade off, but you have to make a considered choice. Is the table referenced anywhere? If no, there is no need at all for the surrogate; you're just being wasteful. If it is referenced, the next thing to consider would be the amount of columns and the datatypes used by the user key, and the amount of changes that can be expected in this key versus the overall load of the database (as cascading a key change to referencing tables will cause locking and blocking, which can be problematic on databases with very high loads). In your database design, I think that PollID and UserID are good examples of cases where a surrogate key is a good choice (but do add UNIQUE constraints for UserName and PollName!). The others will cause slowing. (And one of them even makes it harder to protect against violation of business rules!)
2. Transitive relationships
The relationship between usersAnswers and poll is redundant. Every userAnswer is related to a question, which in turn is related to a poll. Adding this relationship even introduces a chance of inconsistencies - the model now allows me to store a userAnswer that relates a poll 17, and to question 8466, which relates to poll 60. Removing the relationship from usersAnswers to poll (and the PollID column in usersAnswers) fixes this.
The same appears to hold for the relationships from usersAnsers to answers and from usersAnswers to questions, but this situation is different. If not all questions are multiple choice, then AnswerID in usersAnswers will be a nullable column, making the relationship to answers 0-n instead of 1-n. We still need to maintain the relationsip to the questions table for non-multiple choice questions, so the relationship to questions needs to stay - it is transtive for SOME rows, but not for ALL.
3. Potential integrity violation
Since the "sometimes transitive" relationship from usersAnswers to question and the relationships that make it transitive are, in your model, all maintained through surrogate keys, you introduce a chance of data corruption that can not be prevented with standard constraints. The model allows me to store this collection of rows (for brevity, I include only the relevant tables and columns):
INSERT INTO questions (QuestionID, PollID) VALUES (12345, 123); INSERT INTO questions (QuestionID, PollID) VALUES (23456, 234); INSERT INTO answers (AnswerID, QuestionID) VALUES (1234567, 12345); INSERT INTO answers (AnswerID, QuestionID) VALUES (2345678, 23456); INSERT INTO usersAnswers (QuestionID, AnswerID) VALUES (12345, 2345678);
All constraints are satisfied, and yet the result is garbage - I answered question 12345 with one of the options of question 23456!
If you check out my suggestion of a few days ago, you'll see that there are many similarities to your proposal, but that I did not use any surrogate keys. And that because of the use of user keys, this anomaly is preventeed automatically (because the questions key value is part of the answers key value).
If you do want to keep the surrogates (though I really see no reason for it), then a workaround would be to make the relationship between usersAnswers and questions optional (0-n) as well, by making column QuestionID in usersAnswers nullable, and adding a constraint to ensure that for each row in usersAnswers, exactly one of QuestionID and AnswerID is populated and the other is NULL. But this will make many reports a tad harder to produce!
-- Hugo Kornelis, SQL Server MVPThursday, December 17, 2009 11:58 AM -
Plamen:
Thanks. I'll have to run it and then make a diagram. It's MUCH easier to understand visually (for me). And I hadn't thought about building the testing methodology into the DB as well (like skipping questions based on answers). Could be valuable.
Hugo:
Good stuff on using surrogate keys. I've always used those as just the easiest way to reference a record but you bring up very valid arguments for using "real" keys. I'm wondering though - if the real keys are so much larger than an int, wouldn't it impact performance? Seems like your indexes would be a lot bigger.Thursday, December 17, 2009 3:33 PM -
Hugo:
You mean the indexes on the table? Well, that depends on what you choose as the clusted index; this might even be a nonunique column (eg a date column that is often used in date range selections). Too many people assume that, just because SQL Server defaults to using a clustered index to support a PRIMARY KEY, this ought to be the case.
Good stuff on using surrogate keys. I've always used those as just the easiest way to reference a record but you bring up very valid arguments for using "real" keys. I'm wondering though - if the real keys are so much larger than an int, wouldn't it impact performance? Seems like your indexes would be a lot bigger.
The datatype IS an issue for referencing tables, though. That's why I already indeicated that this is one of the factors to take into consideration when making the tradeoff. If the business key is varchar(200) with an average length of 100, and there are 5 referencing tables each holding 5 million rows - then you use A LOT of space to store all the references. Space you would have saved when you had used an integer surrogate key.
Unfortunately, most cases are not so clear-cut as this one. Those are the most interesting choices ;)
-- Hugo Kornelis, SQL Server MVPThursday, December 17, 2009 4:13 PM -
As I said before, this is only draft idea of poolDB that I created in very short time, so its normal that we have a lot place for improvement this concept and idea in general. I will try to explain myself and to make answer to other member of this thread. Because this is very live topic I am sorry if I forget someone.
Just one note before. When I design an database I always see it in while environment (business logic, GUI and business process in information system in general). So some ideas are not “visible” in physical layer without whole picture.
Also I am big fun of middle way in creating relation model of system. If database is TO normalize then is difficult “read” and hard to create upper level in coding process. (SELECT on 5-6 tables just to get simple data on big systems are performance killers). On the other side putting whole thing in small amount of tables (male life easier for programmers) and create: redundancy, inconsistency and other ugly staff form DB world is just not right. My concept is to tray make peace between those two worlds. So sometime I “break” rules of normalization because whole staff will breathe easier and vise versa. I believe that this is not just my concept and that a lot of people do if in the same way. So enough with philosophy J
@Keith W
Free-form answers are stored in usersAnswer.Text attribute, and answers.Text is Boolean type that is set to TRUE if question required free text answers. I look this from GUI level. When you chose a free form text, then drop down list is pulled from questionTypes. Also answers.Text is set to TRUE an much easier handled with one WHERE condition from programmer point of view. I know that part answers and userAnswers in free-from answer is subject of discussing and you have a point. But in my case I want to have everything about answers in one place. Because that I use answers.Text Boolean field.
QuestionTypes, I just like to use simple referent tables to keep consistency. (Singe choice, multiple choice, free- form….)
@Kent Waldop, JumMcLeaod, Hugo Kornelies
Yes it is transitive relationship, no question about it. I can always get poolID back trought question. But that is JOIN of three tables. Nothing wrong also with that, but It can be and It will be slower then just picking up PoolID from usersAnswers. (This is one of the places where I make some “violation” of normalization rules. Don’t ask me why I did to that. In my experience there was a situation (almost identical) where I needed to de-normalize tables to solve performance issues.
“…If a poll designer accidentally clicks "ADD" twice after keying in a question, two identical copies of the question will be in the database. It's also bad from a user perspective...“
Hugo, Yes you are right with this one. I belive that one composite key will solve issue.
I must say that I disagree about surrogates key and opinion that they are wrong choice. I use IDENTITY/AutoNumber surrogate keys very often. Only when business rules required something special (let say natural key). Tomorrow I will start one thread “Surrogates key vs. others” J where I will try to be more descriptive J
JasminThursday, December 17, 2009 7:14 PM -
I augmented Hugo's web survey (poll) design; RadioCheckText is response type of Radio Button, Checkmark or text entry. Database diagram follows:
>
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Edited by Kalman Toth Friday, September 28, 2012 7:32 PM
Saturday, December 19, 2009 8:12 AM -
Thanks Kalman. That looks almost exactly like mine. One question though - when you have compound keys (like the Question table does), do you have to carry those key fields down to the related tables too? I see you have so I assume that's a requirement. Just verifying - you can't make a single field refer to a compound key?
Also I ended up leaving out the OptionNr (the numbering will be handled at run-time - I might want to randomize them or select 4 from a pool of 10, etc) so I used a surrogate key on the AnswerOption table. That make sense?Monday, December 21, 2009 2:35 PM -
The more I thought about the real vs. surrogate issue, the more it made sense for me to use the surrogate keys. I'm just not familiar enough with the compound keys to use them confidently. So here's what I have. Any glaring mistakes?
You can see this has quite a few more features (warm up questions, timed tests, flags for user-submitted content, etc) but the basics are the same we've talked about.
Oh and disregard the ResponseTypeID. That was a holdover from another design idea.
Tuesday, December 22, 2009 7:15 AM -
Since a question can exist on multiple tests/surveys, I suggest breaking out the QuestionID/SurveyID to a separate many<->many table.
"You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo EmersonTuesday, December 22, 2009 7:54 AMAnswerer -
Thanks Kalman. That looks almost exactly like mine. One question though - when you have compound keys (like the Question table does), do you have to carry those key fields down to the related tables too? I see you have so I assume that's a requirement. Just verifying - you can't make a single field refer to a compound key?
Also I ended up leaving out the OptionNr (the numbering will be handled at run-time - I might want to randomize them or select 4 from a pool of 10, etc) so I used a surrogate key on the AnswerOption table. That make sense?
First question: In the relational model, relationships are implemented by storing the key values of the referenced table in the referencing table. So yes, if the key of the referenced table spans multiple columns, then the same amount ot columns (having the same data types) is required in the referencing table.
Second question: Numbering the answers 1, 2, 3, and 4 when entering the poll's questions and answers does not automatically imply that they need to presented with those numbers and in that order by the UI. They can be, but they can also be presented in a random order, or sorted alphabetically, by length, or by a checksum of the ascii values of the first 5 non-numeric characters in the answer. And they cna be labbeled 1, 2, 3, and 4; or a, b, c, d, or not labeled at all (if you have check boxes or radio buttons in front of them). That is a UI design question which is hardly related to the database design.
Of course, using a surrogate key is not wrong (assuming you include a UNIQUE constraint on the "real" key as well). I just don't see the advantage here.
-- Hugo Kornelis, SQL Server MVPTuesday, December 22, 2009 12:07 PM -
The more I thought about the real vs. surrogate issue, the more it made sense for me to use the surrogate keys. I'm just not familiar enough with the compound keys to use them confidently. So here's what I have. Any glaring mistakes?
You make it sound as if this is an all or nothing decision. It is not. Some tables may indeed benefit from surrogates, others won't. Why not make the decision on a table by table basis? I can see some point in some of your tables having surrogates, but others? For instance, the Responses table is net even referenced by any other table, so why would you include ReponseID?
You can see this has quite a few more features (warm up questions, timed tests, flags for user-submitted content, etc) but the basics are the same we've talked about.
Oh and disregard the ResponseTypeID. That was a holdover from another design idea.
On a related note, how will you enforce that for a multiple choice question, the AnswerID in Responses is actually related to the the same question that the response is related to?
I'm not really sure what the intention of MinQuestions and MaxQuestions in the Surveys table is, nor what the idea is of the MaxAnswers column in Questions. And the whole Flags table, with all references to it, is beyond me as well.
The most glaring mistake I see, though, is that each table has a surrogate key and no other key ! A surrogate should always be backed by a real key, to prevent duplicated data entry and to give the user something more user-friendly than "question 237231 of test 2134 has answer options 89237623, 89237624, 89237626, and 89237630". But maybe that's just a shortcoming of the tool you use to create the diagrams; if you did define UNIQUE constraints but they were not included in the diagram, then this obviously doesn't apply (though I can't check if I think I agree with your choice).
-- Hugo Kornelis, SQL Server MVPTuesday, December 22, 2009 12:18 PM -
Hi Arnie,
I think that would be overnormalizing, as already mentioned in the start of this discussion.
Some issues you would run into are:
a) It's not practical. Suppose the web site Keith builds turns out to be the next Google, with thousands of people using it. If you enter a test, would you first check a list with millions of already existing questions to see if it's already included, or would you just enter it? Even a list with a few hundred existing questions would be too much for most people. The many to many relationship will, in practice, become a one to many; thereby compllicating the design with no additional gain.
b) It's not enforceable. Yes, you can put a UNIQUE constraint on the QuestionText column. But that will not prevent duplicates. Until SQL Server learns to understand English, it will consider "What is your favorite color" and "Name your favorite color" as distinct values. Heck, even "What is your favorite color" and "What is your favorite colour" are not considered duplicates.
c) It's not as easy as is sounds. You, Keith and me might all want to include "What is your favorite color" as a question in one of our surveys. But I want to let the user choose between red, yellow, and blue; you want to present the options white, black, yellow, blue, orange, and cyan; and Keith prefers to present this question as an open question.
d) There might be issues with data ownership. Suppose Microsoft uses Keith's site for an exam about the new version of SQL Server, and this exam is already being prepared while the product is under development. How do you think thhey'd feel if competitors would be able to see questions about new features that are not even officially annnounced yet?
-- Hugo Kornelis, SQL Server MVPTuesday, December 22, 2009 12:31 PM -
Arnie:
A question only applies to one survey.
Hugo:
For me, it's a lot easier for every table to be uniform and have a single ID column. Again it's just a matter of my inexperience with using multiple column keys and kinda being a creature of habit. I don't really see how you reference a unique answer, for example, by it's text combined with the question text and not have a huge amount of data flying around. The surrogate keys just simplify things for me and I'm familiar with taking the extra steps to maintain them (like making other columns unique, which I have done).
The responses table is related to Questions through a direct relationship (for the open text answers) and through the Answers table for multiple choice like we've been talking about, right?
The Flags table is for user submitted surveys so other users can flag it as inappropriate, incorrect, etc. I assume that would happen at the question level. They can also leave a comment associated with the flag. If a question gets flagged more than [MaxFlags] times, then Questions.Active is set to false. And because questions can be removed by flagging, I need to know what's the minimum number of questions to keep the survey active, thus Surveys.MinQuestions.
The MaxQuestions and MaxAnswers are for those situations where I may have 10 answers to a given question but only want to present 5 of those - a way of randomizing the test. Same thing for surveys - I may have more questions than I want to use to randomize it.
Make sense?Tuesday, December 22, 2009 2:39 PM -
Thanks Kalman. That looks almost exactly like mine. One question though - when you have compound keys (like the Question table does), do you have to carry those key fields down to the related tables too? I see you have so I assume that's a requirement. Just verifying - you can't make a single field refer to a compound key?
Keith,
The Composite FK structure must reflect the Composite PK stucture, otherwise you get an error:
Msg 8139, Level 16, State 0, Line 6 Number of referencing columns in foreign key differs from number of referenced columns, table 'Reply'. Msg 208, Level 16, State 1, Line 1
I updated the diagram above. I added OptionNr to Reply Composite PK.
Kalman Toth SQL SERVER 2012 & BI TRAINING
New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012- Edited by Kalman Toth Friday, September 28, 2012 7:33 PM
Tuesday, December 22, 2009 3:47 PM -
For me, it's a lot easier for every table to be uniform and have a single ID column. Again it's just a matter of my inexperience with using multiple column keys and kinda being a creature of habit. I don't really see how you reference a unique answer, for example, by it's text combined with the question text and not have a huge amount of data flying around. The surrogate keys just simplify things for me and I'm familiar with taking the extra steps to maintain them (like making other columns unique, which I have done).
Inexperience and habit are bad guidelines for a design. Inexperience can be fixed by trying, maybe in some sandbox database, until you get experienced. And habits can be broken in the same way. Not every habit is bad, but not every habit is good either.
I don't think you should use question text and answer text as keys. I don't even think they are candidate keys, since different questions can have the same text. (The questions "What is your favorite color" and "What color traffic lights signals you to stop" will both have the option "Red" among the answers, but that doe not make them the same answer).
If I were a user of your site, creating a survey with 40 questions, then I'd expect to be able to identify these questions with a number of 1 through 40. And when I return to the site, to check results or to amend a question, I expect to be able to use THE SAME numbers to identify THE SAME question - even if the question may be presented to the test takers and survey respondents in a different order. So the natural key for a question would be the question number in combination with whatever the key for a question is. Assuming XXX identifies a survey, "question 14 within survey XXX" should identify one single question. And by the same reasoning, "answer option 2 for question 14 within survey XXX" should identify a single answer to that question. Those are the natural keys, and since end users will want to use them, you should include them in the data model. And expose them instead of the surrogate keys. Whether it then makes sense to include the surrogate kys at all - well, I already covered that in my previous replies.
The responses table is related to Questions through a direct relationship (for the open text answers) and through the Answers table for multiple choice like we've been talking about, right?
Yes. But in the case of a multiple choice question, will AnswerID and QuestionID in the Responses table both be populated?
If yes, then you have introduced the risk of inconsistency. The QuestionID in the Responses table might be different from the QuestionID in the answer the respons references. Makes sense? Otherwise, I'll put make an example in my next reply.
And if not, if you leave QuestionID at NULL if AnnswerID is populated, then you have just created an ideal situation for test makers. A question to seperate the good from the great query writers: create a query that combines some columns from Responses with some columns from Questions. Challenging, because the relationship is either direct, or using the Answers table as an intermediary.
It's not wrong, but the use of surrogate keys in this case does add extra complexity. Depending on implementation choice, the complexity is either a set of triggers to prevent inconsistency, or more complex querying.
The Flags table is for user submitted surveys so other users can flag it as inappropriate, incorrect, etc. I assume that would happen at the question level. They can also leave a comment associated with the flag. If a question gets flagged more than [MaxFlags] times, then Questions.Active is set to false. And because questions can be removed by flagging, I need to know what's the minimum number of questions to keep the survey active, thus Surveys.MinQuestions.
The MaxQuestions and MaxAnswers are for those situations where I may have 10 answers to a given question but only want to present 5 of those - a way of randomizing the test. Same thing for surveys - I may have more questions than I want to use to randomize it.
Make sense?
Yes, definitely. One error here - MaxFlags is not an attribute of the entity type flag, but one of the entity type Questions. And since you can also flag comments, there should also be a maximum number of flags a comment may have somewhere - either in the comment, or (if it's the same for all comments in a survey) in Surveys. And maybe it's even better to keep the two distinct kinds of flags (question flags and comment flags) as seperate entities.
-- Hugo Kornelis, SQL Server MVPTuesday, December 22, 2009 5:01 PM -
Actually, I disagree on some elements of your counter, Hugo.
Overnormalization, perhaps, and then again, maybe not. Depends upon the expected usage, where the design evolves.
Currently, there is nothing in the design that deals with item 'ownership of a 'Survey/Test' or the 'Items'. I would expect that is needed, and is an oversight. I could imagine that not being able to maintain 'control' and ownership of test items would be a significant barrier to market acceptance. Many people and organizations will consider their 'items' to be intellectual property, and perhaps even 'sensitive' material.
Once 'ownership' is determined, allowing a 'owner' to recycle his/her items only makes sense. I ofter create exams for my university classes that will frequently re-use items from previous exams. I could easily imagine frustration with not being able to access my previous items to include on a new 'survey/test'.
If 'ownership' is extended to surveys and items, then the issues of controlling for uniqueness, language similarites/dis-similarities belongs to the owner of the sets of items -not the system.
And of course, some sets of questions may be 'owned' by the system and subsequently available to and used on many surveys. I know when I create a survey for ConstantContact, SurveyMethods, or any of the other 'Survey' vendors that I occasionally use, I can select from a list of categories, and then from existing questions in those categories -OR, I can create my own that no one else has access to. My questions do not show up on the public lists.
This design, so far, has serious limitations to be a highly viable web project -especially if it needs to be revenue positive.
"You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo EmersonTuesday, December 22, 2009 7:14 PMAnswerer -
Depends upon the expected usage
Hi Arnie,
As you see, I trimmed the quote down to one single phrase - the one that says it all. Your counter example was great; it could not have been more convincing.
The one big danger that I saw when this forum started is that designer will start coming here, expecting answers to their questions. Fine on the one hand, as that is exactly what a forum is all about. But completely wrong at the same time, as the only one who can really answer the question is the intended end user and/or his/her representative. Who, unfortunately, tends to misunderstand the questions a designer asks, which in turn results in the designer misunderstanding the answers.
Keith, you have now seen some great (I hope) examples and arguments on why questions should not be reused in multiple tests, and some great examples and arguments why they should. You are the only one able to figure out which arguments and examples suit your environment better. Talk with your emplyee, or (better) with the workers who will actually be using the database you're about to develop. That is the part of database design that we can not help you with.
Good luck!
-- Hugo Kornelis, SQL Server MVPTuesday, December 22, 2009 7:36 PM -
Yeah those are good points Arnie but, although it may appear otherwise, I'm not trying to compete with sites like Survey Monkey or other do it yourself surveys. I'm more interested in the user "charting themselves" on population graphs, seeing where they really stand politically compared to historical figures, etc. It's more about taking the surveys/tests and comparing results than about creating them for your own personal use.
I could see someone interested in the kind of demographic data this would give me and maybe that would be a lucrative path to go down one day but for now this is just a pet project I've been tossing around. User created content, flagging, rating etc is all just a means of boosting the site content without me personally having to type each word.
So yeah, it really depends strongly on your goals for the data and I think I've got a pretty good handle on those. Thanks for the help guys. :)Tuesday, December 22, 2009 10:57 PM