locked
Database Design and Introducing Primary and foreign keys RRS feed

  • Question

  • Hi,

    I am designing a SQL database as a project for myself and implementing it within an asp.net website.

    Its a student database with 4 tables: student table, course table, course work and tutor table.

    I have put the databases together with the basics e.g.

    student_table
    Student_User_name nvarchar(50)
    Student_Email_address nvarchar(max)
    Student_forename nvarchar(50)
    Student_surname nvarchar(50)
    Course_start_date (datetime)
    Course (nvarchar max)

    Course_table
    course_name nvarchar(max)

    Tutor_table
    Tutor_Username nvarchar(50)
    Tutor_Email_address nvarchar(max)
    tutor_forename nvarchar(50)
    tutor_surname nvarchar(50)
    course_tutortakes (nvarchar max)

    Course_Work

    studentname nvarchar(50)
    essay nvarchar(50)
    Mark int
    tutor_forname nvarchar(50)
    Tutor_surname nvarchar(50)

    I need to add in a primary key to each table to allow the tables to link.  I would like to link the student database to the course one so I can know which courses the student is taking, and the course one to the tutors, so I know which course the tutor is taking.  I would also like to link the course work table to the students and tutor tables.  How do I do this to make it consistent?  I know I can add in an ID field e.g. studentid which could like, but how can I assign numbers which would be consistent thoughout the tables e.g. if I created a unique studentid how can this id also be reflected in the course_work table?

    How do primary and forgein keys work?  If I have a primary key and forgein key in my table - how are these generated consitently thoughout the schema?

    Is this something that I should be looking at in the asp.net website design e.g. If the user is logged in, then they should supply their username which would in turn provide the primary key.  How would I ensure that the course id  in the students table (if I had one) would be consistent in the course table?

    Thanks in advance

    Thursday, April 5, 2012 10:29 PM

Answers

  • Hi,

    You should probably have  separate stored procedures for the following .

    1. I would think that the site administrator with enter course details and tutor details before the beginning of each . you can either do this as a one time batch load before the start of the semester of provide an admin screen for the webmaster to do this. So the stored proc for this particular task should have an explicit transaction with multiple insert statements, inserting rows in the parent tables , which in your case are the tutor table and the course table. now you enter a row into the tutor-course association table by taking the previously entered tutor name & course name. since the tutor-course relationship is established at this point, you dont have to worry about it anymore when a student signs up for a course. before ending the proc your check for error and commit or rollback the transaction depending on if there were errors or not.

    2. Once the student and course tables have this data, when the student wants to sign up for a class , your can call a stored proc that that also begins an explicit transaction and have multiple insert statements to populate the parent - child tables in the appropriate sequence. So, in this case , your first insert row in student table (You need to check if student already exists, because a student may sign up fr multiple courses !) , the depending on the course selected, you insert the course name & student name into the course work table .  before ending the proc your check for error and commit or rollback the transaction depending on if there were errors or not.

    These steps establish all the relationships you need and the data gets populated, conforming to the relationships you have created. The most important thing to remember when you have PK-FK ( parent - child ) relationships between tables, is that you have to populate the parent table before you can use the key from it to populate the child table. Weather you populate the parent table in advance by a batch/one time script, or populate it via a stored proc depends on the specifics of your application.

    Hope this helps !


    Sanil Mhatre | Database Developer | MCTS | If you find my reply useful in any way, please vote it as helpful. If it has helped answer your question, please mark it as Answer.

    Friday, April 6, 2012 4:18 PM
  • So the dangerous part of this discussion is that you have it in your mind that you need something more than what you have for a proper design. There is a concept of a "surrogate" key that you can add to the table to make things easier, but your tables could be implemented as is (assuming each table represents one thing, which they don't exactly).

    Start with:

    Student

    Student_User_name nvarchar(50)
    Student_Email_address nvarchar(max)
    Student_forename nvarchar(50)
    Student_surname nvarchar(50)
    Course_start_date (datetime)
    Course (nvarchar max)

    This represents a single student, and assuming user name is unique per student (for all times, which is difficult), then you could use the student_user_name as the primary key:

    CREATE TABLE Student
    (  Student_user_name nvarchar(50) CONSTRAINT PKStudent  PRIMARY KEY,

    other columns

    )

    Then the course table

    CREATE TABLE Course
    (
    Course_name nvarchar(100) CONSTRAINT PKCourse PRIMARY KEY --not max!
    )

    And the course_work table, which I assume is the recording of courses taken:

    CREATE TABLE Course_Work
    (
              student_user_name nvarchar(50) NOT NULL,
              course_name nvarchar(100) NOT NULL,
              CONSTRAINT PKCourse_Work PRIMARY KEY (student_user_name, course_name)
    )

    And a foreign key could be created with:

    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$course 
                  FOREIGN KEY (course_name) REFERENCES course(Course_name)
    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$student
                 FOREIGN KEY (student_user_name ) REFERENCES student(student_user_name )

    I won't dissect the rest as Sanil did, and honestly there is a lot of work to be done here to get to a really usable database. But I do want to note that for your keys, you can substitute identity generated values for the "real" primary keys to make implementation easier.  So your tables could become like the following, making sure to include uniqueness checks on the columns that you are no longer using as primary keys, since you still want the data to be unique

    CREATE TABLE Student

       Student_id int identity CONSTRAINT PKStudent PRIMARY KEY,
       Student_user_name nvarchar(50) CONSTRAINT PKStudent UNIQUE
    )

    CREATE TABLE Course
    (
        Course_id int identity CONSTRAINT PKCourse PRIMARY KEY,
        Course_name nvarchar(100) CONSTRAINT AKCourse UNIQUE --not max!
    )

    CREATE TABLE Course_Work
    (
              Student_id int NOT NULL,
              Course_id int NOT NULL,
              CONSTRAINT PKCourse_Work PRIMARY KEY (Student_id, Course_id)
    )

    or sometimes people will do (and I would usually):

    CREATE TABLE Course_Work
    (
              Course_Work_Id Int identity  CONSTRAINT PKCourse_Work PRIMARY KEY
              Student_id int NOT NULL,
              Course_id int NOT NULL,
              CONSTRAINT AKCourse_Work UNIQUE (Student_id, Course_id)
    )

    If you want a discussion on which way to go, http://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/2ba32ac4-9547-43bc-bc2b-e542fa982220

    Personally, I use surrogates, but more for the tools and ease of consistent programming... But use the unique constraints if you go that way or you will be back wondering how to get rid of logically duplicated data.


    Louis

    Friday, April 6, 2012 12:18 AM

All replies

  • Most of your questions can be easily answered by the concepts of database normalization.

    First of all, you have 3 separate objects ;

    1. Student - all the details of the student, and only those details purely dependent on the student should be in this table. So i would define the student table as ; - course & course start date column don't belong here

    student_table

    Student_id INT NOT NULL IDENTITY(1,1) Primary Key
    Student_User_name nvarchar(50)
    Student_Email_address nvarchar(max)
    Student_forename nvarchar(50)
    Student_surname nvarchar(50)

    2. Same concept for course table

    Course_table

    course_id INT NOT NULL IDENTITY(1,1) Primary Key
    course_name nvarchar(max)

      probably the essay column belongs here ..

    3. Same for tutor table

    Tutor_table

    tutor_id INT NOT NULL IDENTITY(1,1) Primary Key

    Tutor_Username nvarchar(50)
    Tutor_Email_address nvarchar(max)
    tutor_forename nvarchar(50)
    tutor_surname nvarchar(50)

    4. Now you create a student_course_association table ; -- This is because the fact that what course a student takes can be defined entirely without involving the tutor !

    student_course_association_id INT NOT NULL  IDENTITY(1,1) Primary Key

    student_id -- FK this to the same column in student table

    course_id -- FK this to same column in course table

    5. Now you create a tutor_course_association table ; -- This is because the fact that what course a tutor teaches can be defined entirely without involving the student!

    tutor_course_association_id INT NOT NULL  IDENTITY(1,1) Primary Key

    tutor_id -- FK this to the same column in tutor table

    course_id -- FK this to same column in course table

    Please follow this link for Database normalization basics - http://databases.about.com/od/specificproducts/a/normalization.htm 

    Please Follow this link for PK basics - http://msdn.microsoft.com/en-us/library/ms181043.aspx 

    Please follow this link for FK Basics - http://msdn.microsoft.com/en-us/library/ms175464.aspx 

    And Finally, here is a link to an excellent article on how to design a relational database for student-tutor-coursework database, with detail example - http://web.mit.edu/11.521/www/lectures/lecture10/lecture10.html

    Hope this helps ! 



    Sanil Mhatre | Database Developer | MCTS | If you find my reply useful in any way, please vote it as helpful. If it has helped answer your question, please mark it as Answer.

    Thursday, April 5, 2012 10:57 PM
  • So the dangerous part of this discussion is that you have it in your mind that you need something more than what you have for a proper design. There is a concept of a "surrogate" key that you can add to the table to make things easier, but your tables could be implemented as is (assuming each table represents one thing, which they don't exactly).

    Start with:

    Student

    Student_User_name nvarchar(50)
    Student_Email_address nvarchar(max)
    Student_forename nvarchar(50)
    Student_surname nvarchar(50)
    Course_start_date (datetime)
    Course (nvarchar max)

    This represents a single student, and assuming user name is unique per student (for all times, which is difficult), then you could use the student_user_name as the primary key:

    CREATE TABLE Student
    (  Student_user_name nvarchar(50) CONSTRAINT PKStudent  PRIMARY KEY,

    other columns

    )

    Then the course table

    CREATE TABLE Course
    (
    Course_name nvarchar(100) CONSTRAINT PKCourse PRIMARY KEY --not max!
    )

    And the course_work table, which I assume is the recording of courses taken:

    CREATE TABLE Course_Work
    (
              student_user_name nvarchar(50) NOT NULL,
              course_name nvarchar(100) NOT NULL,
              CONSTRAINT PKCourse_Work PRIMARY KEY (student_user_name, course_name)
    )

    And a foreign key could be created with:

    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$course 
                  FOREIGN KEY (course_name) REFERENCES course(Course_name)
    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$student
                 FOREIGN KEY (student_user_name ) REFERENCES student(student_user_name )

    I won't dissect the rest as Sanil did, and honestly there is a lot of work to be done here to get to a really usable database. But I do want to note that for your keys, you can substitute identity generated values for the "real" primary keys to make implementation easier.  So your tables could become like the following, making sure to include uniqueness checks on the columns that you are no longer using as primary keys, since you still want the data to be unique

    CREATE TABLE Student

       Student_id int identity CONSTRAINT PKStudent PRIMARY KEY,
       Student_user_name nvarchar(50) CONSTRAINT PKStudent UNIQUE
    )

    CREATE TABLE Course
    (
        Course_id int identity CONSTRAINT PKCourse PRIMARY KEY,
        Course_name nvarchar(100) CONSTRAINT AKCourse UNIQUE --not max!
    )

    CREATE TABLE Course_Work
    (
              Student_id int NOT NULL,
              Course_id int NOT NULL,
              CONSTRAINT PKCourse_Work PRIMARY KEY (Student_id, Course_id)
    )

    or sometimes people will do (and I would usually):

    CREATE TABLE Course_Work
    (
              Course_Work_Id Int identity  CONSTRAINT PKCourse_Work PRIMARY KEY
              Student_id int NOT NULL,
              Course_id int NOT NULL,
              CONSTRAINT AKCourse_Work UNIQUE (Student_id, Course_id)
    )

    If you want a discussion on which way to go, http://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/2ba32ac4-9547-43bc-bc2b-e542fa982220

    Personally, I use surrogates, but more for the tools and ease of consistent programming... But use the unique constraints if you go that way or you will be back wondering how to get rid of logically duplicated data.


    Louis

    Friday, April 6, 2012 12:18 AM
  • Louis , I am totally geeking out because i attended your "Characteristics of a Great Relational Database" session at the PASS summit last year, and that got me hooked to learning about good database design. ! Totally agree with your response to this one , you pretty much not only suggested a great solution, but also helped me understand the details i missed out on !

    Sanil Mhatre | Database Developer | MCTS | If you find my reply useful in any way, please vote it as helpful. If it has helped answer your question, please mark it as Answer.

    Friday, April 6, 2012 1:03 AM
  • In addition to great explanation and solution provided by Louis, having trusted FK can improve performance as well

    http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints-and-performance.aspx


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/

    Friday, April 6, 2012 7:25 AM
  • Dear Louis and Sanil, Thank you both for your replies.  They have certainly got me thinking about what type of keys I need to use within my database.  It is actually tricker than I first thought designing my own database.

    I took Louis advice and created the table suggested above with the foreign keys as I would like to use a natural key rather than a surrogate as it is more meaningful.

    CREATE TABLE Student
    (  Student_user_name nvarchar(50) CONSTRAINT PKStudent  PRIMARY KEY,
    Student_Email_address nvarchar(max),
    Student_forename nvarchar(50),
    Student_surname nvarchar(50),
    Datecreated timestamp
    )

    CREATE TABLE Course
    (
    Course_name nvarchar(100) CONSTRAINT PKCourse PRIMARY KEY
    )

    CREATE TABLE Course_Work
    (  student_user_name nvarchar(50) NOT NULL,
       course_name nvarchar(100) NOT NULL,
      CONSTRAINT PKCourse_Work PRIMARY KEY (student_user_name, course_name),
      courseworkmark int,
    Datecreated timestamp
    )

    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$course 
                  FOREIGN KEY (course_name) REFERENCES course(Course_name)
    ALTER TABLE Course_work ADD CONSTRAINT FKCourse_work$references$student
                 FOREIGN KEY (student_user_name ) REFERENCES student(student_user_name )

    From this design I can see that the student details need to be entered, and the course needs to be available before data can be entered into the course work table.

    My next question is.. how do I get the data into the databases? 

    e.g. I am planning on creating an asp.net interface for this sql schema design to allow data entry.  I am planning on using stored procedures to do this.  How will the stored procedure work to update the tables at once?  I want one screen for data entry which will allow the student to enter their details, enter their course, and coursework details.   How can I write a stored proceudre to update all table at once and reflect the primary and foreign keys? How do the other tables update with these?  e.g. if a student enters in their details, how does the coursework details table know what data to put into the PK, FK columns?

    Thanks




    • Edited by dev-13 Friday, April 6, 2012 11:31 AM
    Friday, April 6, 2012 11:29 AM
  • Hi,

    You should probably have  separate stored procedures for the following .

    1. I would think that the site administrator with enter course details and tutor details before the beginning of each . you can either do this as a one time batch load before the start of the semester of provide an admin screen for the webmaster to do this. So the stored proc for this particular task should have an explicit transaction with multiple insert statements, inserting rows in the parent tables , which in your case are the tutor table and the course table. now you enter a row into the tutor-course association table by taking the previously entered tutor name & course name. since the tutor-course relationship is established at this point, you dont have to worry about it anymore when a student signs up for a course. before ending the proc your check for error and commit or rollback the transaction depending on if there were errors or not.

    2. Once the student and course tables have this data, when the student wants to sign up for a class , your can call a stored proc that that also begins an explicit transaction and have multiple insert statements to populate the parent - child tables in the appropriate sequence. So, in this case , your first insert row in student table (You need to check if student already exists, because a student may sign up fr multiple courses !) , the depending on the course selected, you insert the course name & student name into the course work table .  before ending the proc your check for error and commit or rollback the transaction depending on if there were errors or not.

    These steps establish all the relationships you need and the data gets populated, conforming to the relationships you have created. The most important thing to remember when you have PK-FK ( parent - child ) relationships between tables, is that you have to populate the parent table before you can use the key from it to populate the child table. Weather you populate the parent table in advance by a batch/one time script, or populate it via a stored proc depends on the specifics of your application.

    Hope this helps !


    Sanil Mhatre | Database Developer | MCTS | If you find my reply useful in any way, please vote it as helpful. If it has helped answer your question, please mark it as Answer.

    Friday, April 6, 2012 4:18 PM
  • Many thanks for your reply Sanil.  I will probably follow what you have suggested for point 1 as it is easy doing the bulk loads especially for one off data imports. 

    What would you suggest for point 2?  I have not written a multiple update stored procedure before, would it be possible for you to show me an example script?

    Many thanks

    Tuesday, April 10, 2012 11:58 AM