none
Data Modelling - Supertype Subtype and discriminator

    Question

  • Hello, I have designed a data model that contains a supertype and 3 subclasses. Each subclass contains no more than 2 columns. I now have a problem of efficiently querying - once i have identified the parent how do i know what subtype to join to? I have been reading about the use of a discriminator column within the supertype class but am unsure how to efficiently utilise it in a query. For example, if the supertype is vehicle and i need to decide what subclass to join on do i need to use a case statement against the discriminator? Is this efficient? I couldnt find any tutorials on this, if you could point me in the direction of any please or provide a simple example? Alternatively, given there are so few columns in each subclass is this the best solution? Many Thanks Paul
    Paul
    Tuesday, February 16, 2010 9:42 AM

Answers

  • A slight variation of this approach is to use persisted columns in the sub-tables, to avoid the need to declare and maintain CHECK and DEFAULT constraints on the type differentiator columns: http://blogs.msdn.com/dfurman/archive/2009/08/31/disjoint-subtyping-in-sql.aspx.

    Thursday, February 18, 2010 8:35 PM
  • I think there is no simple solution. In general, in logical modeling we keep the relations as super type - subtype then in physical modeling either merge or create veiw like vwCar,vwMotorCycle etc which join respective subtype and supertype.  I had very similar modeling for a database which was consumed by an app which was using ORM tool. But it also internally does the same thing, it was joining all the subtype if i remember correctly. 

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Thursday, February 18, 2010 10:28 AM
  • If you need to get a set of all vehicles of a specific type, then it's as simple as adding a filter in the WHERE clause for a specific type differentiator column:
    SELECT VehicleID, ...
    FROM Vehicle
    WHERE VehicleType = 'Car'
    No joins are necessary.

    If you need to do the same thing, but also include some columns in the result set that are specific to a particular sub-type, then you will need to join the corresponding sub-type table using inner join:
    SELECT v.VehicleID, ..., c.Col1
    FROM Vehicle AS v
    INNER JOIN Car AS c
    ON v.VehicleID = c.CarID
    There is no need for the filter in the WHERE clause, the inner join achieves the same purpose.

    If you need to get a set of vehicles of different types, and include columns in the result set that are specific to each type, then you need to use left joins to each sub-type table:

    SELECT v.VehicleID, ..., c.Col1 AS CarCol1, b.Col1 AS BikeCol1, m.Col1 AS MotorcycleCol1
    FROM Vehicle AS v
    LEFT JOIN Car AS c
    ON v.VehicleID = c.CarID
    LEFT JOIN Bike AS b
    ON v.VehicleID = b.BikeID
    LEFT JOIN Motorcycle AS m
    ON v.VehicleID = m.MotorcycleID

    All this on the assumption that your design enforces all applicable constraints, i.e. that there is one and only one related row in all sub-type tables for each row in the super-type table. The approach mentioned by Brian of including type differentiator columns in sub-type tables (either with CHECK and DEFAULT constraints, or using persisted columns) enforces that particular constraint.

    As far as performance of these queries, that will be affected primarily by the indexes that you define to support your specific workload. In my experience, most queries will perform better with this schema as opposed to a denormalized schema where you bundle all sub-types in one table (again, given proper indexing).

    Generally speaking, you want to come up with a correct logical model that accurately represents your given business rules first, and then translate it into a physical implementation. Performance considerations are addressed during that second phase of design.
    Friday, February 19, 2010 7:50 PM

  • possible examples:
    Car
    tire_make1
    Tire_pressure1
    Tire_make2
    Tire_pressure2
    tire_make3
    tire_pressure3
    tire_make4
    tire_pressure4

    Motorcycle and cycle will only have two tires

    Similarly Car and Motorcyle tables will have fields for engines and cycle table will not


    Wouldn't you rather have a table for tires and a table for engines? Even if a bicycle has no engine, you're talking about a null value in the engine field of the bicycle record as opposed to a whole new table repeating a lot of columns (like tires) from the other tables, right?
    Monday, February 22, 2010 10:25 PM

  • possible examples:
    Car
    tire_make1
    Tire_pressure1
    Tire_make2
    Tire_pressure2
    tire_make3
    tire_pressure3
    tire_make4
    tire_pressure4

    Motorcycle and cycle will only have two tires

    Similarly Car and Motorcyle tables will have fields for engines and cycle table will not


    Wouldn't you rather have a table for tires and a table for engines? Even if a bicycle has no engine, you're talking about a null value in the engine field of the bicycle record as opposed to a whole new table repeating a lot of columns (like tires) from the other tables, right?

    Making a new entity for specific artifacts is indeed possible whenever we have a parallel (as you rightly point out). But my post was about providing an example of differences in the detail fields for the subtypes. these diffferences would show up as null fields in an aggregate table.

    Arun MCDBA, MCSE
    Tuesday, February 23, 2010 12:42 AM

All replies

  • Why is there one supertype for three subtypes? If it is a group where three types participate, the subtypes should mention their type and the supertype should not mention it at all.

    It really depends on what the object represents. Please provide an example with a couple supertypes and subtypes of each.
    Tuesday, February 16, 2010 3:20 PM
    Answerer
  • Thanks for the reply Brian. If i use the vehicle example: tblVehicle tblMotorcycle (subtype) tblCar (subtype) tblBicycle (subtype) The is-a relationship exists between tblVehicle and the 3 subclasses. A vehicle can be only one of the three types in this example. If i have a discriminator column in the tblVehicle to identify what sub class it belongs to how do i actually use it in a query? Or i am querying a tblVehicle record, do i need to do a left outer join on all the 3 sub classes to identify what type of vehicle i'm looking at?
    Paul
    Tuesday, February 16, 2010 3:38 PM
  • How is the "can only be in one subtype" enforced?

    I'd suggest adding a second COLUMN defining the rowtype, and throwing a UNIQUE CONSTRAINT on it and the id. Then, in each sub-TABLE, add the same COLUMN as part of the PK, but include a CONSTRAINT forcing it to be the one sub-type that TABLE is for. Then, use an FK from the sub-TABLEs to the super TABLE to enforce uniqueness.

    If the id in the super TABLE is not the PK, it will need its own UNIQUE CONSTRAINT, besides the one with type.

    type
    --
    id
    name
    (m, motorcyle)
    (b, bicycle)
    (c, car)


    vehicle
    ----
    id PK,
    type references type,
    unique(id, type)

    motorcycle
    ---
    id
    type default 'm'
    primary key(id, type)
    check(type = 'm')
    foreign key(id, type) references vehicle(id, type)
     
    car
    ---
    id
    type default 'c'
    primary key(id, type)
    check(type = 'c')
    foreign key(id, type) references vehicle(id, type)

    bicycle
    ---
    id
    type default 'b'
    primary key(id, type)
    check(type = 'b')
    foreign key(id, type) references vehicle(id, type)


    ------------------
    Although, if the sub-TABLEs are all 1-to-1s, i'd take a look at combining them further. Or even combining the sub-tables with each other. As, if they are together on one level (vehicle) and the master table already identifies their type, is there really a reason to store them separately?
    Tuesday, February 16, 2010 4:47 PM
    Answerer
  • My original thought was to de-normalise the database and combine the sub class tables into the vehicle table. Obviouslty this will lead to redundant columns depending on the type of vehicle. For example if it is of type car all motorcycle and bicyle columns will be null. Is this considered good practice? vehicle ---- id type col1 col2 col3 col4motorcycle col5motorcycle col6car etc..
    Paul
    Wednesday, February 17, 2010 8:26 AM
  • Why have 3 tables for your vehicle subtypes? I think a more common way of doing it would be:

    Vehicle
        PK VehicleID
        VehicleType (FK to VehicleTypes)
        Manufacturer
        Model
        Owner
        PurchaseDate
        etc...

    VehicleTypes
        PK TypeName (Motorcycle, Bike, Car, etc)
       
    Wednesday, February 17, 2010 7:01 PM
  • My original thought was to de-normalise the database and combine the sub class tables into the vehicle table. Obviouslty this will lead to redundant columns depending on the type of vehicle. For example if it is of type car all motorcycle and bicyle columns will be null. Is this considered good practice? vehicle ---- id type col1 col2 col3 col4motorcycle col5motorcycle col6car etc..
    Paul

    Can you give a few examples of columns that are "motor cycle only" and "Bicycle only"?
    Wednesday, February 17, 2010 9:23 PM
  • This is just a fictious example to demonstrate my problem. But as my real world problem has so few columns in the sub type tables anyway, i think i will combine them all into the parent vehicle table - as Keith was suggesting. When modelling what is general rule when to go with the supertype - subtype relationship? I assume the amount of different columns in each subtype comes in to it? Thanks
    Paul
    Thursday, February 18, 2010 8:30 AM
  • I think there is no simple solution. In general, in logical modeling we keep the relations as super type - subtype then in physical modeling either merge or create veiw like vwCar,vwMotorCycle etc which join respective subtype and supertype.  I had very similar modeling for a database which was consumed by an app which was using ORM tool. But it also internally does the same thing, it was joining all the subtype if i remember correctly. 

    Madhu
    MCITP, MCTS, MCDBA,MCP-- Blog : http://experiencing-sql-server-2008.blogspot.com/
    Thursday, February 18, 2010 10:28 AM
  • Is the AdventureWorks "Contact" table and tables that subtype off this table a good example of a subcategory relationship?
    Thursday, February 18, 2010 2:39 PM
  • A slight variation of this approach is to use persisted columns in the sub-tables, to avoid the need to declare and maintain CHECK and DEFAULT constraints on the type differentiator columns: http://blogs.msdn.com/dfurman/archive/2009/08/31/disjoint-subtyping-in-sql.aspx.

    Thursday, February 18, 2010 8:35 PM
  • Dimitri, Thanks for your reply, thats a similar example to the vehicle scenario. Product-> book, cd, dvd. Vehicle -> car, motorcycle, bicycle. I'm still trying to understand whats the most efficient way to query this. If you have identified a product or vehicle - how do you identify the subtype it is associated with? Must a left outer join be performed on every subtype table to establish this?? Thanks
    Paul
    Friday, February 19, 2010 8:51 AM
  • If you need to get a set of all vehicles of a specific type, then it's as simple as adding a filter in the WHERE clause for a specific type differentiator column:
    SELECT VehicleID, ...
    FROM Vehicle
    WHERE VehicleType = 'Car'
    No joins are necessary.

    If you need to do the same thing, but also include some columns in the result set that are specific to a particular sub-type, then you will need to join the corresponding sub-type table using inner join:
    SELECT v.VehicleID, ..., c.Col1
    FROM Vehicle AS v
    INNER JOIN Car AS c
    ON v.VehicleID = c.CarID
    There is no need for the filter in the WHERE clause, the inner join achieves the same purpose.

    If you need to get a set of vehicles of different types, and include columns in the result set that are specific to each type, then you need to use left joins to each sub-type table:

    SELECT v.VehicleID, ..., c.Col1 AS CarCol1, b.Col1 AS BikeCol1, m.Col1 AS MotorcycleCol1
    FROM Vehicle AS v
    LEFT JOIN Car AS c
    ON v.VehicleID = c.CarID
    LEFT JOIN Bike AS b
    ON v.VehicleID = b.BikeID
    LEFT JOIN Motorcycle AS m
    ON v.VehicleID = m.MotorcycleID

    All this on the assumption that your design enforces all applicable constraints, i.e. that there is one and only one related row in all sub-type tables for each row in the super-type table. The approach mentioned by Brian of including type differentiator columns in sub-type tables (either with CHECK and DEFAULT constraints, or using persisted columns) enforces that particular constraint.

    As far as performance of these queries, that will be affected primarily by the indexes that you define to support your specific workload. In my experience, most queries will perform better with this schema as opposed to a denormalized schema where you bundle all sub-types in one table (again, given proper indexing).

    Generally speaking, you want to come up with a correct logical model that accurately represents your given business rules first, and then translate it into a physical implementation. Performance considerations are addressed during that second phase of design.
    Friday, February 19, 2010 7:50 PM
  • My original thought was to de-normalise the database and combine the sub class tables into the vehicle table. Obviouslty this will lead to redundant columns depending on the type of vehicle. For example if it is of type car all motorcycle and bicyle columns will be null. Is this considered good practice? vehicle ---- id type col1 col2 col3 col4motorcycle col5motorcycle col6car etc..
    Paul

    Can you give a few examples of columns that are "motor cycle only" and "Bicycle only"?

    possible examples:
    Car
    tire_make1
    Tire_pressure1
    Tire_make2
    Tire_pressure2
    tire_make3
    tire_pressure3
    tire_make4
    tire_pressure4

    Motorcycle and cycle will only have two tires

    Similarly Car and Motorcyle tables will have fields for engines and cycle table will not





    A
    Friday, February 19, 2010 10:27 PM
  • I think Dimitry's suggestion assumes that a type field has been added to your vehicle table. That would take care of identifying the subtype to connect to.

    Overall I think merging the tables into the vehicle table is not a good idea in oltp scenarios particulary when your table is going to have a large number of records. This increases the complexity enormously. For a reporting scenario, merging may prove advantageous
    A
    Friday, February 19, 2010 11:07 PM

  • possible examples:
    Car
    tire_make1
    Tire_pressure1
    Tire_make2
    Tire_pressure2
    tire_make3
    tire_pressure3
    tire_make4
    tire_pressure4

    Motorcycle and cycle will only have two tires

    Similarly Car and Motorcyle tables will have fields for engines and cycle table will not


    Wouldn't you rather have a table for tires and a table for engines? Even if a bicycle has no engine, you're talking about a null value in the engine field of the bicycle record as opposed to a whole new table repeating a lot of columns (like tires) from the other tables, right?
    Monday, February 22, 2010 10:25 PM

  • possible examples:
    Car
    tire_make1
    Tire_pressure1
    Tire_make2
    Tire_pressure2
    tire_make3
    tire_pressure3
    tire_make4
    tire_pressure4

    Motorcycle and cycle will only have two tires

    Similarly Car and Motorcyle tables will have fields for engines and cycle table will not


    Wouldn't you rather have a table for tires and a table for engines? Even if a bicycle has no engine, you're talking about a null value in the engine field of the bicycle record as opposed to a whole new table repeating a lot of columns (like tires) from the other tables, right?

    Making a new entity for specific artifacts is indeed possible whenever we have a parallel (as you rightly point out). But my post was about providing an example of differences in the detail fields for the subtypes. these diffferences would show up as null fields in an aggregate table.

    Arun MCDBA, MCSE
    Tuesday, February 23, 2010 12:42 AM