locked
Self-referencing tables RRS feed

  • Question

  • Hello -

    I am trying to understand the purpose of self-referencing tables in a large database.

    Isn't that against the rules of Boyce-Codd?  

    Why not use a view to self-reference instead?

    On contract jobs, I have run into this a few times and would like to understand the reasoning behind this kind of architecture.

    Thanking you in advance for your input and advice!

    Juniormint

    Sunday, September 7, 2014 6:48 AM

Answers

  • Traversing a recursive table self reference can be a little confusing to code using T-SQL; especially if you don’t have a defined number of recursions.  Below is the approach we used to find all descendants of any particular role using T-SQL.  The first “insert into” primes the temp table by finding all child roles with a ParentRoleID matching the top level role that we’re interested in.  The second “insert into” populates the temp table with all remaining descendants by looping until no more records are selected.  The second query looks for all children with ParentRoleIDs that match any RoleID in the temp table.  It uses a correlated subquery to prevent previously selected roles from being selected again. 

    See :

    http://blogs.msdn.com/b/securitytools/archive/2009/06/23/traversing-a-recursive-table-self-reference-using-t-sql.aspx?wa=wsignin1.0


    Ahsan Kabir Please remember to click Mark as Answer and Vote as Helpful on posts that help you. This can be beneficial to other community members reading the thread. http://www.aktechforum.blogspot.com/

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Tuesday, September 9, 2014 4:56 AM
  • I have only seen such design in company's hierarchy workers .... I do not know what is the reason but performance of such design was not so good.. 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Sunday, September 7, 2014 6:58 AM
  • Let's say you have a persons table with a PK and another field that is referencing the PK that when it's negative means child(of) and when it's positive it means parent(of).

    I don't think you can do multi-level relationship keeping the 'laws'.

    It's also proven that there are situations when they can't be obeyed and this doesn't have to do with modelling.

    http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form#Achievability_of_BCNF

    So.. break the laws when it doesn't hurt !

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Sunday, September 7, 2014 7:06 AM
  • As Uri said, the most common scenario with self-referenced tables is supporting hierarchical structures. You can think about company org chart or article/product groups as the example. Self-referencing is not the only way to implement hierarchies; however, it is one of the most common approaches you can see in the wild.

    As with any other approaches, this one has its own set of pros and cons. This design simplifies hierarchy maintenance; however, it is not easy to query. For example, if you need to move one row to another parent; you can simply update ParentId column in the row and you do not need to update children of that parent. However, querying will often require you to use recursion with CTE.

    Another commonly encountered issue is missing index on referencing (ParentId) column in the schema. As with any foreign keys, SQL Server does not require you to create an index on referencing column. As result, any referential integrity support operation (deletion of referenced row) and queries often lead to the scans.

    Two other common approaches for storing hierarchical data are Nested Sets and Materialized Paths. With Nested Sets every row stores left and right bower values. Children bowers are within the parent ones. You can see DDLs for such approach below (one of the possible implementations):

    create table dbo.OrgChart
    (
    	ID int not null,
    	Name nvarchar(64) not null,
    	Title nvarchar(64) not null,
    	LeftBower float not null,
    	RightBower float not null,
    
    	constraint PK_OrgChart
    	primary key clustered(ID),
    );

    Diagram below represents how it look/work.

    Materialized path persist hierarchical path in every node by concatenating information about the parents up to the root of the hierarchy. HierarchyId CLR type uses such an approach persisting path in binary format. Script below shows you the example of DDL

    create table dbo.OrgChart
    (
    	ID int not null,
    	Name nvarchar(64) not null,
    	Title nvarchar(64) not null,
    	Path varchar(256) not null,
    
    	constraint PK_OrgChart
    	primary key clustered(ID),
    );

    Diagram below shows how it works.

    Both, nested sets and hierarchical paths simplifies querying of the data. You do not need recursion here and can use either BETWEEN operator or prefix search to get all children of the parent regardless of number of levels in the hierarchy. However, they add overhead during the maintenance. When you move parent node in the hierarchy, you should either update bower values or path in all subtree/children rows.

    Hope it helps :)


    Thank you!

    Dmitri V. Korotkevitch (MVP, MCM, MCPD)

    My blog: http://aboutsqlserver.com


    Sunday, September 7, 2014 2:04 PM
  • Can you please click on below link

    http://www.codeproject.com/Articles/37926/SQL-code-to-traverse-self-referencing-tables-with


    Ahsan Kabir Please remember to click Mark as Answer and Vote as Helpful on posts that help you. This can be beneficial to other community members reading the thread. http://www.aktechforum.blogspot.com/

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Monday, September 8, 2014 11:45 AM
  • SQL Server 2005 AdventureWorks database applies self-referencing Employee table to represent the organization chart for the fictional company:

    http://www.sqlusa.com/bestpractices2005/organizationtree/

    SQL Server 2008 AdventureWorks2008 database applies the new hierarchyid data type for the same:

    http://www.sqlusa.com/bestpractices2008/orgchart/

    The hierarchyid data type is a more sophisticated approach than using self-referencing tables.

    hierarchyid is beyond Boyce-Codd.


    Kalman Toth Database & OLAP Architect SQL Server 2014 Design & Programming
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012









    • Edited by Kalman Toth Monday, September 8, 2014 3:30 PM
    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Monday, September 8, 2014 12:09 PM
  • Why would the self referencing table not be boyce-codd? For example:

    EmployeeNumber -- natural primary key

    ManagerEmployeeNumber -- represents an attribute of the employee, notably who the manager is

    There is one key, and the one attribute represents a piece of information about the employee. This table could stand apart from the base Employee table, or could be part of the Employee table. As far a normalization if concerned, I can see no issue.

    This forms a tree hierarchy, aka an adjacency list, and can be processed using a breadth-first relational recursion algorithm, achieving a very good balanced performance for many use cases. Like Dimitri notes, there are other algorithms you can use, but all of them have their ups and downs in either complexity or performance in creating new rows. However, since your question was about normalization, I must note that all of the other methods will definitely fail the normalization test, as the data for each row isn't self contained.

    I have code for each of the types of hierarchy implementation on my website in my presentations page. http://www.drsql.org/Pages/Presentations.aspx. It is in the code download for the How to Optimize a Hierarchy In SQL Server presentation.

    For the adjacency list, the structure of the tree is solely managed by keys (unique and foreign keys), and as such it is the easiest to maintain, and is the most normalization friendly.


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Tuesday, September 9, 2014 3:07 AM

All replies

  • I have only seen such design in company's hierarchy workers .... I do not know what is the reason but performance of such design was not so good.. 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Sunday, September 7, 2014 6:58 AM
  • Let's say you have a persons table with a PK and another field that is referencing the PK that when it's negative means child(of) and when it's positive it means parent(of).

    I don't think you can do multi-level relationship keeping the 'laws'.

    It's also proven that there are situations when they can't be obeyed and this doesn't have to do with modelling.

    http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form#Achievability_of_BCNF

    So.. break the laws when it doesn't hurt !

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Sunday, September 7, 2014 7:06 AM
  • As Uri said, the most common scenario with self-referenced tables is supporting hierarchical structures. You can think about company org chart or article/product groups as the example. Self-referencing is not the only way to implement hierarchies; however, it is one of the most common approaches you can see in the wild.

    As with any other approaches, this one has its own set of pros and cons. This design simplifies hierarchy maintenance; however, it is not easy to query. For example, if you need to move one row to another parent; you can simply update ParentId column in the row and you do not need to update children of that parent. However, querying will often require you to use recursion with CTE.

    Another commonly encountered issue is missing index on referencing (ParentId) column in the schema. As with any foreign keys, SQL Server does not require you to create an index on referencing column. As result, any referential integrity support operation (deletion of referenced row) and queries often lead to the scans.

    Two other common approaches for storing hierarchical data are Nested Sets and Materialized Paths. With Nested Sets every row stores left and right bower values. Children bowers are within the parent ones. You can see DDLs for such approach below (one of the possible implementations):

    create table dbo.OrgChart
    (
    	ID int not null,
    	Name nvarchar(64) not null,
    	Title nvarchar(64) not null,
    	LeftBower float not null,
    	RightBower float not null,
    
    	constraint PK_OrgChart
    	primary key clustered(ID),
    );

    Diagram below represents how it look/work.

    Materialized path persist hierarchical path in every node by concatenating information about the parents up to the root of the hierarchy. HierarchyId CLR type uses such an approach persisting path in binary format. Script below shows you the example of DDL

    create table dbo.OrgChart
    (
    	ID int not null,
    	Name nvarchar(64) not null,
    	Title nvarchar(64) not null,
    	Path varchar(256) not null,
    
    	constraint PK_OrgChart
    	primary key clustered(ID),
    );

    Diagram below shows how it works.

    Both, nested sets and hierarchical paths simplifies querying of the data. You do not need recursion here and can use either BETWEEN operator or prefix search to get all children of the parent regardless of number of levels in the hierarchy. However, they add overhead during the maintenance. When you move parent node in the hierarchy, you should either update bower values or path in all subtree/children rows.

    Hope it helps :)


    Thank you!

    Dmitri V. Korotkevitch (MVP, MCM, MCPD)

    My blog: http://aboutsqlserver.com


    Sunday, September 7, 2014 2:04 PM
  • Can you please click on below link

    http://www.codeproject.com/Articles/37926/SQL-code-to-traverse-self-referencing-tables-with


    Ahsan Kabir Please remember to click Mark as Answer and Vote as Helpful on posts that help you. This can be beneficial to other community members reading the thread. http://www.aktechforum.blogspot.com/

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Monday, September 8, 2014 11:45 AM
  • SQL Server 2005 AdventureWorks database applies self-referencing Employee table to represent the organization chart for the fictional company:

    http://www.sqlusa.com/bestpractices2005/organizationtree/

    SQL Server 2008 AdventureWorks2008 database applies the new hierarchyid data type for the same:

    http://www.sqlusa.com/bestpractices2008/orgchart/

    The hierarchyid data type is a more sophisticated approach than using self-referencing tables.

    hierarchyid is beyond Boyce-Codd.


    Kalman Toth Database & OLAP Architect SQL Server 2014 Design & Programming
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012









    • Edited by Kalman Toth Monday, September 8, 2014 3:30 PM
    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Monday, September 8, 2014 12:09 PM
  • Why would the self referencing table not be boyce-codd? For example:

    EmployeeNumber -- natural primary key

    ManagerEmployeeNumber -- represents an attribute of the employee, notably who the manager is

    There is one key, and the one attribute represents a piece of information about the employee. This table could stand apart from the base Employee table, or could be part of the Employee table. As far a normalization if concerned, I can see no issue.

    This forms a tree hierarchy, aka an adjacency list, and can be processed using a breadth-first relational recursion algorithm, achieving a very good balanced performance for many use cases. Like Dimitri notes, there are other algorithms you can use, but all of them have their ups and downs in either complexity or performance in creating new rows. However, since your question was about normalization, I must note that all of the other methods will definitely fail the normalization test, as the data for each row isn't self contained.

    I have code for each of the types of hierarchy implementation on my website in my presentations page. http://www.drsql.org/Pages/Presentations.aspx. It is in the code download for the How to Optimize a Hierarchy In SQL Server presentation.

    For the adjacency list, the structure of the tree is solely managed by keys (unique and foreign keys), and as such it is the easiest to maintain, and is the most normalization friendly.


    Louis

    Without good requirements, my advice is only guesses. Please don't hold it against me if my answer answers my interpretation of your questions.

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Tuesday, September 9, 2014 3:07 AM
  • Traversing a recursive table self reference can be a little confusing to code using T-SQL; especially if you don’t have a defined number of recursions.  Below is the approach we used to find all descendants of any particular role using T-SQL.  The first “insert into” primes the temp table by finding all child roles with a ParentRoleID matching the top level role that we’re interested in.  The second “insert into” populates the temp table with all remaining descendants by looping until no more records are selected.  The second query looks for all children with ParentRoleIDs that match any RoleID in the temp table.  It uses a correlated subquery to prevent previously selected roles from being selected again. 

    See :

    http://blogs.msdn.com/b/securitytools/archive/2009/06/23/traversing-a-recursive-table-self-reference-using-t-sql.aspx?wa=wsignin1.0


    Ahsan Kabir Please remember to click Mark as Answer and Vote as Helpful on posts that help you. This can be beneficial to other community members reading the thread. http://www.aktechforum.blogspot.com/

    • Marked as answer by Juniormint10 Wednesday, September 10, 2014 2:51 PM
    Tuesday, September 9, 2014 4:56 AM
  • Thank you all for your enlightening responses!
    Wednesday, September 10, 2014 2:52 PM