passing tables and views as parameters to a function

  • Question

  • I have a generic table-valued function that operates on tables and views. I have several alternative tables and views which can serve as input to the function. I'd like to pass input table T1 and input view V1 to the function, and then later use the same function on input table T2 and input view V2. T1 and T2 have the same structure but contain different data; the same goes for V1 and V2.
    I saw there was a way to pass tables as parameters to functions and SPs, but I didn't see a similar option for views.
    So what's the best way of doing this?
    Monday, January 18, 2010 3:45 PM

Answers


  • If you're looking for similar situations in other fields - how about keeping 2009 sales data in one table and 2010's sales data in another. I'm sure there are many other examples and I don't think it's necessarily bad DB design.

    Actually it may be bad design. If the data is kept in one table, then you can just process it with a single stored procedure without the need to pass huge tables as parameters. If scaling is an issue, you can partition the table.

    However, in your case it may be OK, since these are the kind of staging/reporting tables generated by software.

    Since you are dealing with millions of rows, the issue of performance will matter.

    I would just build a hard-wired stored procedure first and optimize it. Then try it on other tables by changing the table names and recompiling. If the process is successful, just continue on this path. In fact, the process can be automated: automatically generate a stored procedure for each table.
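
    A minimal sketch of that generate-a-procedure-per-table idea (the table name pattern, the procedure names, and the COUNT(*) body are invented for the example; the real body would be your actual processing logic):
    DECLARE @Table sysname, @SQL nvarchar(4000);
    DECLARE TableCursor CURSOR FOR
        SELECT name FROM sys.tables WHERE name LIKE 'ExperimentRun%';
    OPEN TableCursor;
    FETCH NEXT FROM TableCursor INTO @Table;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Generate one stored procedure per matching table
        SET @SQL = N'CREATE PROC ' + QUOTENAME(N'Process_' + @Table) + N'
    AS SELECT COUNT(*) FROM ' + QUOTENAME(@Table) + N';';
        EXEC (@SQL);
        FETCH NEXT FROM TableCursor INTO @Table;
    END;
    CLOSE TableCursor;
    DEALLOCATE TableCursor;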
    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    • Marked as answer by LShatzman Tuesday, January 19, 2010 10:05 AM
    Tuesday, January 19, 2010 8:14 AM

All replies

  • Hi LShatzman,

    What you saw about passing "tables as parameters" is actually about passing "table variables". A table variable is a variable, but its type is not varchar or smallint but a table type, and hence its value is not a scalar but a set of rows and columns. You cannot use this technique to pass the name of a table and expect that table to be used.
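
    For illustration, here is a minimal sketch of that technique (SQL Server 2008 table-valued parameter syntax; all names, including SomeExperimentTable, are invented for the example). Note that the caller copies rows into a variable - it never passes a reference to an existing table:
    -- A user-defined table type (SQL Server 2008 and later)
    CREATE TYPE ExperimentRowType AS TABLE
       (MeasurementID int   NOT NULL PRIMARY KEY,
        Reading       float NOT NULL);
    go
    -- A function that receives a set of rows, not a table name
    CREATE FUNCTION dbo.AverageReading (@Rows ExperimentRowType READONLY)
    RETURNS float
    AS
    BEGIN
       RETURN (SELECT AVG(Reading) FROM @Rows);
    END;
    go
    -- The caller fills a variable of the table type and passes that
    -- (copying millions of rows this way is expensive, of course)
    DECLARE @Input ExperimentRowType;
    INSERT INTO @Input (MeasurementID, Reading)
    SELECT MeasurementID, Reading FROM SomeExperimentTable;
    SELECT dbo.AverageReading(@Input) AS AvgReading;
    go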

    Standard SQL really does not cater for using variables as table or view names in a query or other statement (though there are workarounds - see below). And with good reason - in at least 9 out of 10 cases where people use this (or want to use this), the actual problem is incorrect database design. In a well-designed database, each table or view represents information about one entity or one relationship. And (with the exception of administrative tasks), there are very few cases for a generic function that has to perform the same task for customers as it does for payments or for product details. So there's a big chance that you have come to the right forum with the wrong question. Can you expand a bit on your tables and views and explain what this generic function does?

    The workaround I mentioned before is to use dynamic SQL. However, this will not work in user-defined functions, only in stored procedures. And you should be very aware of the risks involved with using dynamic SQL. I suggest you google for "SQL injection" first before continuing down this road.
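
    As a rough illustration of that stored procedure workaround (procedure and table names are invented; the OBJECT_ID check and QUOTENAME are the usual precautions against the injection risk mentioned above):
    CREATE PROC dbo.ProcessExperimentTable
        @TableName sysname
    AS
    BEGIN
        -- Refuse anything that is not an existing user table
        IF OBJECT_ID(@TableName, 'U') IS NULL
        BEGIN
            RAISERROR ('Unknown table: %s', 16, 1, @TableName);
            RETURN;
        END;
        DECLARE @SQL nvarchar(4000);
        -- QUOTENAME brackets the name, so it is treated as an identifier
        SET @SQL = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName) + N';';
        EXEC (@SQL);
    END;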
    -- Hugo Kornelis, SQL Server MVP
    Monday, January 18, 2010 4:16 PM
  • Hi Hugo,

    I'm running physical experiments, with the output of each experiment comprising several million rows of data. I then vary some of the experiment input parameters and rerun it, and so on. I thus get many tables of output data with identical structure and different data - each to be operated on by my function.
    If you're looking for similar situations in other fields - how about keeping 2009 sales data in one table and 2010's sales data in another. I'm sure there are many other examples and I don't think it's necessarily bad DB design.
    In my case SQL injection is not relevant because the DB does not contain sensitive material and is not exposed to outside users - and thus there are no security issues.
    Tuesday, January 19, 2010 6:20 AM

  • If you're looking for similar situations in other fields - how about keeping 2009 sales data in one table and 2010's sales data in another. I'm sure there are many other examples and I don't think it's necessarily bad DB design.
    I know too little about your actual data to comment on that, but the example you give is great, because it's so simple that everyone immediately understands it.

    And though it is, unfortunately, quite common to see separate tables for 2009 sales and 2010 sales, it is in fact a bad, very bad design. For lots of reasons.

    1. The core of the relational model is to record all data in a single, uniform way - in cells in a tabular row/column format. Having separate tables for separate years violates that basic principle, since you now have one specific piece of data (the year a sale was made) stored in a completely different way: encoded in the choice of table to store it in.

    2. Another core principle of relational modeling is the correspondence between entities and relationships in the design and tables in the implementation. A sale made in 2010 is not a different entity from a sale made in 2009. They are both occurrences of the same entity type: sales. So they should be in a single table.

    3. Yet another core principle for relational databases is that implementation and optimization details are handled at a low level, preferably by the engine itself and, in the real world of 2010, also very often by a DBA or a developer - but in those cases, still at a low level. As Kalman Toth already indicates, performance might be a reason to split data over several tables. If you have millions of sales each year, with an obligation to keep sales data for at least 15 years, and 99% of all searches are for sales made in the last 3 months, then you can often gain performance by splitting out the data over several tables. However, and this is a key point that Kalman unfortunately doesn't stress enough, it should still be done as a low-level optimization that does not affect the logical model. In technical terms, this means that you would partition the Sales table. This requires the DBA to run some complex commands - and the effect will be that the database still shows you one single Sales table, but the data is actually stored in several separate tables; and the query optimizer will use the date passed in a WHERE clause to determine which of these tables have to be accessed. So if you have data for 15 years, in 15 separate tables (one for each year), then a search for all sales in the last two months (that is, from November 19 2009 up until today, January 19 2010) will automatically only access the tables for 2009 and 2010 - without the developer having to write the logic to determine which tables to access, and how to combine the results.

    A partitioned table (or, if you are on SQL Server 2000 or older, a partitioned view) will give you all the performance benefits of using separate tables for separate years without any of the hassle. And if at any time sales grow so high that you need separate tables for each quarter, the DBA can make the change by running some commands in the maintenance window, and you have to change nothing in your actual code!
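
    For reference, the setup looks roughly like this (SQL Server 2005+ syntax; the names, columns, and single filegroup are simplified for the example):
    -- Boundary values put each year of sales in its own partition
    CREATE PARTITION FUNCTION pfSaleYear (datetime)
    AS RANGE RIGHT FOR VALUES ('20090101', '20100101');
    go
    -- Map every partition to a filegroup (all to PRIMARY here, for simplicity)
    CREATE PARTITION SCHEME psSaleYear
    AS PARTITION pfSaleYear ALL TO ([PRIMARY]);
    go
    -- Looks and behaves like one table, but is stored in three partitions;
    -- note that the partitioning column must be part of the primary key
    CREATE TABLE dbo.Sales
       (SaleID   int           NOT NULL,
        SaleDate datetime      NOT NULL,
        Amount   decimal(10,2) NOT NULL,
        PRIMARY KEY (SaleID, SaleDate))
    ON psSaleYear (SaleDate);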

    In my case SQL injection is not relevant because the DB does not contain sensitive material and is not exposed to outside users - and thus there are no security issues.
    Many injection attacks do not come from the outside, but from the inside - employees! (I think I once read that they actually form the majority of the attackers.) And many of those attacks are not of the "let's drop a table and ruin the data in another one bwuhahaha" type, but more of the "let's give my user administrative rights and then raise my salary a bit without anyone noticing" type.
    If you build dynamic SQL from table and column names, then anyone who has the right to create or alter tables (and those people are, indeed, employees) can attempt to create weirdly named tables to inject code. Here is an example where I first create a table with a very weird name, then show what happens if I use a dynamic SELECT to fetch data from it.
    CREATE TABLE [(select null a)a;print'be glad I did not use shutdown or worse!';--]
       (DemoColumn int NOT NULL PRIMARY KEY);
    go
    INSERT INTO [(select null a)a;print'be glad I did not use shutdown or worse!';--](DemoColumn)
    VALUES (1);
    go
    -- Regular SELECT works:
    SELECT * FROM [(select null a)a;print'be glad I did not use shutdown or worse!';--];
    go
    -- Dynamic SELECT executes the injected code:
    DECLARE @SQL nvarchar(4000);
    DECLARE @Table sysname;
    -- Assume this comes from a cursor over sys.objects or similar
    SET @Table = (SELECT name FROM sys.objects WHERE name LIKE '%;print%');
    SET @SQL = 'SELECT * FROM ' + @Table + ';'
    PRINT @SQL;
    EXEC (@SQL);
    go
    DROP TABLE [(select null a)a;print'be glad I did not use shutdown or worse!';--];
    go
    
    A real attack would probably include some GRANT or sp_addsrvrolemember call instead of a print. And then, once this has run, the attacker suddenly has administrative rights given to his or her account!
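
    (For what it's worth, the usual defence when a table name really must be concatenated is to wrap it in QUOTENAME, which brackets the name and escapes any embedded brackets, so in the demo above the injected text would stay part of the identifier instead of being executed:)
    SET @SQL = N'SELECT * FROM ' + QUOTENAME(@Table) + N';';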

    -- Hugo Kornelis, SQL Server MVP
    Tuesday, January 19, 2010 8:40 PM

  • A partitioned table (or, if you are on SQL Server 2000 or older, a partitioned view) will give you all the performance benefits of using separate tables for separate years without any of the hassle. And if at any time sales grow so high that you need separate tables for each quarter, the DBA can make the change by running some commands in the maintenance window, and you have to change nothing in your actual code!
    Hugo,

    Apparently, professional senior DBAs are in short supply; there is an abundance of developers who were turned into casual or reluctant DBAs by their superiors.

    Here is an informal statistic: 90% of the time when I post "Ask your DBA", the response is "We don't have a DBA".

    I agree with you as far as the design approach is concerned. However, given the millions of rows of data spilled out by the device used in the experiments, I would go with the simple approach unless a senior database expert is available.

    Kalman Toth, SQL Server & BI Training, SSAS, SSIS, SSRS; http://www.SQLUSA.com
    Wednesday, January 20, 2010 4:17 PM
  • Apparently, professional senior DBAs are in short supply; there is an abundance of developers who were turned into casual or reluctant DBAs by their superiors.

    Here is an informal statistic: 90% of the time when I post "Ask your DBA", the response is "We don't have a DBA".

    I agree with you as far as the design approach is concerned. However, given the millions of rows of data spilled out by the device used in the experiments, I would go with the simple approach unless a senior database expert is available.
    Hi Kalman,

    I agree with the shortage of DBAs. However, whether professional, casual, involuntary, or reluctant, someone IS doing the job of a DBA. That someone should be able to set up a partitioned table. Maybe not directly, but an hour or two of reading and experimenting, and maybe a question to one of these forums, should suffice to gain sufficient understanding to set up a partitioned table. The commands are not completely self-explanatory, but they are not rocket science either. And there are wizards that will help you through setting it up the first time (and if you script the result and save that script, you can probably get by without the wizards the next time).

    Once the person functioning as DBA has invested the relatively small amount of time to get the partitioning going, you'll find that the partitioned table is EASIER rather than harder than using multiple tables. With a partitioned table, you can use a stored procedure like this:
    CREATE PROC SelectPeriod
        @StartOfPeriod datetime,
        @EndOfPeriod datetime
    AS BEGIN;
        SELECT FirstColumn, SecondColumn, ThirdColumn
        FROM YourTable
        WHERE TheDate >= @StartOfPeriod
          AND TheDate <= @EndOfPeriod
          AND StatusColumn IN ('A1', 'B2', 'D3');
    END;
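    For example, a call for the two-month window mentioned earlier would look like this (dates made up):
    EXEC SelectPeriod @StartOfPeriod = '20091119', @EndOfPeriod = '20100119';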
    And it will always work, with no limitations at all on the period passed into the procedure, and it will never access more tables than required.
    Just try to imagine how many lines of code you'd need to do this with separate tables for each year of data, and how many of them would have to be dynamic.

    -- Hugo Kornelis, SQL Server MVP
    Friday, January 22, 2010 7:58 PM