Unique keys across an Access database RRS feed

  • Question

  • I am working on a genealogy database.  I have lots of things in various places in the database that need sources ("sources" as in "the source of this person's birthday", not as in "source code").  There can be several sources for any one item.  These items show up in all sorts of columns in all sorts of tables (marriage dates in the families table, birth date in the individuals table, etc.).  Normally, since I need a list of sources for each item, I'd put a indexed key back to the original record in the sources table, but since these items are scattered all over the place, I can't do that.  I decided that I'd make a proxy for any item that has sources (lots of them may not) and use that proxy to find all the sources in the source table that apply to that item.  My first inclination was to make a proxy table and use it's primary (autonum) key as the proxy.  This works and I'm using it for the moment, but having second thoughts.  Really, all I need is a number unique across the database that I can stick in some source proxy field in the original record and use later to locate the source list.  It feels like overkill to produce a whole new, ever expanding table and add new records to it just so I can get a unique autonumber.  I've thought of using guids, but there's a performance penalty there and the time killer will be searching the source table for the unique proxy number which will be faster with and int than with a guid.  I also thought about having a number in a single table which I could read at program start time, cache it in the program and rewrite at shutdown, but if it doesn't get rewritten, there are serious problems so that seems extremely brittle.  Since I have to read it at the start of the program and write it at the end, I can't use transactions to solve the problem.  I could make a table with one field and one record with the current proxy number and rewrite it with n+1 each time I used it.  I could use transactions to make sure that it didn't get changed until it was actually used properly.  I don't think this is an awful solution - probably better than the table I'm using right now - but there may be a better one.

    So my question is, is this the right way to do something like this?  I really hate the table idea I'm using now - it seems wrong to maintain a table just so you can get it's next primary key and have no data in it (that's not true - I couldn't create a table with an autonum key and nothing else so as a kludge had to add a dummy column - just points out more strongly to me how wrong this solution is).  Would guids be better?  The creation of these proxies is not terribly performance sensitive but the searches over them might be which is why I'm avoiding guids though they seem like they would solve the problem nicely from a development point of view.  I've tried finding something out on the web with similar concerns and ideas, but haven't been able to turn up anything.

    Thanks for any ideas!

    Tuesday, March 2, 2010 7:58 AM

All replies

  • Sounds like a classic case of needing a join table to create a many-to-many relationship:

    Source table - Primary Key - SourceID int

    Item table - Primary Key - ItemID int

    SourcesForItem table - Primary key - SourceID + ItemID

    Then to associate a Source to an Item, place a single record in SourcesForItem table.

    You then join all three tables to produce the list of sources for an item.
    Sunday, March 7, 2010 2:05 AM
  • Except that the "items" aren't in a single table.  They're in all sorts of columns in all sorts of tables.  An individual's birthday in the Individual table will have a source and so will the location a marriage takes place in the family table.  Notes in the Notes table will also have sources, etc., etc.. So I haven't got a set of unique IDs from a single table to identify the "items".  Thus, I need to come up with a unique ID which is not just unique within a table but across the database.  Once I have that, it's true that the rest is standard many-to-many relationship which I have working just fine using a link table as you've described.  The question is how best to come up with a unique value across the database.

    GUIDs are certainly unique across a database, but a bit slower than ints so that's one possibility.  I was using a "proxy" table with nothing in it to generatue an AutoNum key which I would then retrieve and use.  That worked but seemed crazy to generate a just table just to autogen keys.  Currently, I've got a table with one value in it - the next database wide source key and I just read it and rewrite n+1 to it each time I want such a key.  I'm looking to see if people more experienced than me at this sort of thing have better ideas.

    Sunday, March 7, 2010 5:36 AM
  • Ok, I see where you are heading.  Every bit of information in the database (every column value in every table) could potentially need a link back to some source details on where you got the information.

    Just shooting from the hip, but could you do something like this?  It would not be entirely relational however:

    Table: ItemSourceMapping
    Columns: ItemSourceMappingId int autonumber primary key, ItemTable text, ItemColumn text, ItemRowId int, SourceId int

    This would presume all of the other tables would use an integer primary key.   Then you could store off these sources but would need to use the table and column names as part of the lookup process which is not relational.  ItemRowId would store primary key from the item table to locate the row.

    Taking this a step further, you could encode ItemTable+ItemColumn into a single entity like so:

    Table: SourceableItems
    Columns: SourceableItemId int autonumber primary key, ItemTable text, ItemColumn text

    Then you have one id to store in ItemSourceMapping versus redundantly storing ItemTable and ItemColumn for each row.  Also you have a way of tracking which items can possible have a source.

    In general I see how you can simplify this by using a master unique guid for primary keys across all tables but this will be difficult to enforce.  Also, you pay a price for this in size and join efficiently in the non-common cases.  
    Sunday, March 7, 2010 4:58 PM
  • That's pretty much what my original idea was.  Eventually I decided that I really needed the many to many mapping, but I only needed to follow that mapping in one direction - from item to source.  So, while a source may serve as the source for several items,  I really don't need to locate those items given the source which eliminates the usefullness of the ItemTable, ItemColumn and ItemRowId columns.  Also, since I may have many sources for one item, I have to have a duplicated key column to use in the original record which identifies the "item".  So my records all have one (or more) "SourceProxy" columns - i.e., SourceProxyBirth, SourceProxyMarriageDate, etc. where I store this database wide unique key.  This allows me to locate all the sources for an item by looking up the sources in the link table using the source proxy ID stored in the record.  It also allows for multiple sources per item and is, I think, a pretty standard many-to-many relation.

    At that point, you've pretty much got exactly what you've come up with, sans the item identification columns - a link table with an IDSourceProxy column and an IDSource column.  The only difference is that IDSourceProxy has to be duplicable in that link table.  If I understand correctly, you were planning on the Autonum column of the ItemSourceMapping table to generate the source proxy IDs, but that can't work since they have be be duplicable in order to allow multiple sources for a single item.  Thus, the need to generate this database wide unique ID for the source proxies.  My original thought was to use a separate table for that similar to yours but without the SourceID.  This was when I really felt like I needed the equivalent of your ItemTable, ItemColumn and ItemRowId columns.  When I decided that I really had no use for them (i.e., I didn't need to locate items given a source) and eliminated them, I was left with a "dummy" table with no purpose other than generating autonums for these proxy IDs which I quickly decided was just dumb.  The table would get arbitrarily large with no purpose other than getting a single new number from it.  Then I thought about GUIDs which are intrinsically unique and don't require an autonum, but I didn't like the performance penalty.  Finally, I decided on the single number table thing which I'm not wild about but it seems to be the best thing my meager mind can come up with.  This doesn't have any effect on join efficiency as far as I can see.  It means I have to read/write this single number table whenever I generate a new proxy, but once the proxy ID is generated, it's standard joins for a many-to-many relationship from there on out.  Since I only generate these proxy IDs whenever the user specifies a new source, that happens at "user speed" so the speed really isn't a huge issue.  I'm dealing with a single file Access database so I'm not going back and forth across the internet to generate this.  I'm new to all this database stuff so I though maybe somebody out there has dealt with this sort of thing and would have a killer idea that I haven't thought of yet.

    Thanks for your thoughts!

    Sunday, March 7, 2010 6:11 PM