none
Inserts with sub-inserts...

    Question

  • Hi

    I'm working with SQLServer 2008R2, enterprise, and a colleague mentioned a (potentially rather complex) insert - type statement...

    Basically, there are three tables, two are lookups with a identity fields - the second is a main table, with fields referencing the lookup identity fields (there's no referential integrity however).

    Data already exists in both lookups and the main table.

    A staging table exists, that contains data that is pushed into the main table, and 'description' fields which are matched to the lookups.

    If we were assured that all the description fields were contained within the lookups, then the insertion into the main table would be a simple case of a join/sub-query to get the relevant IDs in the insert statement from staging to main, however, that is not guarenteed.

    What a colleague has said is that there is a variation of the insert statement that allows new records to be added to the lookup tables if they don't exist, and then the identity field to be returned... - is anyone familiar with that insert type statement, and if so, do you have examples?

    e.g. in psudo code:

    insert into main (ID1, ID2, T_DATE)

    select

          (insert into lookup1 (desc)  raw.desc1 where no records match lookup1.desc=raw.desc1,

           select lookup1.id1 from lookup1 where lookup1.desc=raw.desc1),

          (insert into lookup2 (desc)  raw.desc2 where no records match lookup2.desc=raw.desc2,

           select lookup2.id1 from lookup2 where lookup2.desc=raw.desc2),

          raw.t_date

    from raw

    Thanks


    Carl

    Wednesday, July 10, 2013 11:23 PM

Answers

All replies

  • An single insert statement can only insert into one table (there is an exception - you could use an OUTPUT clause and then your single insert statement can insert rows into two different tables, but that comes with some restrictions (including that the table inserted into by the OUTPUT clause cannot be involved in either side of a foreign key relationship).  But even that only gets you to two tables, not three and the foreign key restriction means it probably doesn't work for you anyway.

    You could do something like create a view, which joined all three tables, put a INSTEAD OF INSERT trigger on the view that examined the inserted pseudo table and did the required inserts to the base tables.  Then you could insert into the view, and let the trigger do the work.  I think that's unnecessarily complex and would not do it.

    So I would do this with three straight-forward insert statements.  That's easy to understand, and easy to maintain.  So the inserts into lookup1 and lookup2 would look like

    Insert Into lookup1(desc)
    Select Distinct r.desc1
    From Raw r
    Where Not Exists(Select * From lookup1 L2 
      Where r.desc1 = L2.desc);
    
    Insert Into lookup2(desc)
    Select Distinct r.desc2
    From Raw r
    Where Not Exists(Select * From lookup2 L2 
      Where r.desc2 = L2.desc);
    
    

    Now you know all the description values are now contained in the lookup tables, so the insert into main is simple.

    Add transaction control and error checking and you are done.

    One thing I would consider doing is putting the code into a stored procedure.  Then all the calling code needs to do is load data into the raw table and call the stored proc.  If the structure of the table ever changes (for example, you add a third lookup table the table uses), you know exactly what code you need to change and where that code lives (in this stored proc).

    Tom
    Thursday, July 11, 2013 12:08 AM
  • >> Basically, there are three tables, two are look-ups with an IDENTITY fields [sic] - the second is a main table, with fields [sic] referencing the look-up IDENTITY fields [sic] (there's no referential integrity however). <<

    Fields are not columns-- totally different concepts. Likewise, rows are not records. This is fundamental RDBMS! Worst of all, IDENTITY is a physical table property, NOT column, that counts insertion attempts (not success). This is a left-over from UNIX and magnetic tape files. It has nothing whatsoever to do with RDBMS and correct SQL. 

    >> Data already exists in both look-ups and the main table. A staging table exists, that contains data that is pushed into the main table, and 'description' fields [sic] which are matched to the look-ups. <<

    It would be nice if we had DDL instead vague narratives. This DDL is required by minimal Netiquette. You also do not understand what a look-up table idiom; look at this skeleton: 

    CREATE TABLE <Something>_Codes
    (<something>_code CHAR(n) NOT NULL PRIMARY KEY
     CHECK (<something>_code LIKE <pattern>),
     <something>_description VARCHAR(n) NOT NULL);

    The code being looked up is usually a fixed length string with a CHECK() constraint for the regular expression and/or check digit. Validation and verification are important and your silly IDENTITY has neither. It is a tag number that mimics the random order of things written on a piece of paper. 

    Notice that no two look-up tables can have the same key by definition. Your “id” is a Kabbalah number; it magically changed from an automobile to a squid, to Lady Gaga as it moves from table to table. Key can be the same pattern, tho. Dewey Decimal and ICD codes look alike, but a library category and a disease are not the same kind of thing. 

    >> If we were assured that all the description fields [sic] were contained within the look-ups, then the insertion into the main table would be a simple case of a join/sub-query to get the relevant IDs [sic] in the INSERT INTO statement from staging to main, however, that is not guaranteed. <<

    Why not? Wouldn't you find it strange if the USPS could not look up a ZIP code? 

    >> What a colleague has said is that there is a variation of the INSERT INTO statement that allows new records [sic] to be added to the look-up tables if they do not exist, and then the IDENTITY field [sic] to be returned... - is anyone familiar with that INSERT INTO type statement, and if so, do you have examples? <<

    Nope; you have to write code for this. SQL and the RDBMS model is set-oriented; a table is set; operations are done all at once because that is how sets work (remember college math?). 

    A stored procedure with two insertions would do it, but it might be better to use one for each of the two different look-up codes. This will let you do the check digits and other validations that are appropriate for that particular code on the raw data. 

    You can find codes that are not in a look-up with a set operation or  MERGE statement. 

    SELECT foobar_code, foobar_description 
    FROM (SELECT foobar_code
            FROM Raw_Something
         EXCEPT
          SELECT foobar_code, foobar_description 
            FROM Foobar_Codes) 
         AS X(foobar_code); 

    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    Thursday, July 11, 2013 2:45 PM
  • you can also use Sub-Select or projection queries to workout.
    Thursday, July 11, 2013 3:52 PM
  • I believe he is refering to example J in OUTPUT.  http://msdn.microsoft.com/en-us/library/ms177564.aspx

    However, I would do what Tom recommends and simply insert the missing rows into the lookup tables first, then simply inner join the lookup tables.

    Thursday, July 11, 2013 3:57 PM
    Moderator
  • Many thanks for the response

    This is crux of what is happening in the procedure at the moment - I was wondering if there's a better way of working it (which is what a colleague mentioned to me)...

    My thoughts were that you would have difficulty inserting into more than one table - I had spotted the 'output' clause but I was wondering if there was a potential wrinkle that could do the job - and that I may have missed it...

    The problem with these (and the code I'm looking at) is that this is a system with data being inserted all the time - and if you're not careful about how you delete data from the staging table, you can delete unprocessed records - in this case, the procedure is inserting the records into lookups as you describe here, does an inner join back between the lookups and raw tables to insert the final IDs into the processed table.  Which means that if you have a new record, with a new lookup value inserted while processing takes place, you can lose records because of the inner join and delete... (admittedly there's a low probability, but it can happen)

    Cheers
    Carl.

    Thursday, July 11, 2013 11:54 PM
  • >>Why not? Wouldn't you find it strange if the USPS could not look up a ZIP code? <<

    Because the procedure is dynamically maintaining the lookups - they arn't a set value - zip codes are a known domain, whereas this table is dealing with an unknown domain.

    Cheers

    Carl.

    Thursday, July 11, 2013 11:56 PM
  • I've been using the merge syntax to handle a situation somewhat like this. the merge statement lets you do inserts, or updates as needed in one pass so you don't have to have if-then logic in your procedure.  You would still need multiple statements for multiple tables but its cleaner and should be faster with merge.

    http://msdn.microsoft.com/en-us/library/bb510625.aspx

    Friday, July 12, 2013 12:15 AM