질문 Can static variables do this

  • Friday, June 08, 2012 3:49 PM
     
     

    I have a clr function thats doing some processing against relational data. Lets say I am passing PKA, PKB, ColumnC, ColumnD, ... to the clr function. PKA is the primary key of table a and the data set contains a join to table B which is the child of table A therefore PKB has a double column primary key PKA and PKB. CLR function call will look like this (it always returns multicolumns and 1 and only 1 row)

    select clr

    from TableA a inner join Table B a

    outer apply dbo.CLRFunc(PKA, PKB, ColumnC, ColumnD ... )clr

    Assuming this processes can run in parallel etc can I accomplish this. Store PKA value in a static variable and if PKA changes reset the static variable SV1 and if PKA does not change recalculate the existing SV1 with the input of clr (like SV1 = SV1 + ColumnC * ColumnD) and again when PKA value changes reset SV1.

    The output of clr will be some kind of cumulative that each child of PKA will accumulate and the last child record of PKA will have the result I need what the parent table needs to be modified later.

    All these things I am trying to do since the process I am writing is right now 10+ steps, has to be transactionally safe and quick and I am trying to do it in lesser steps.

    My alternative if this question has no answer is put the results of the query in to a temp table and write a recursive cte against it to accomplish the same task in a second step.

    Thank you


    Gokhan Varol

All Replies

  • Friday, June 08, 2012 6:12 PM
     
      Has Code

    Basically you are performing aggragation within the CLR. In this way, the number of record in output will be the same as the input.

    The static variables are shared between sessions.  one session updating static variables might affect the result in another session and vice versa. So, You will have to consider isolating by using some "session cache" built by yourself to make the data in one session inaccessible to another session.

    few more considerations:

    1. record order in this case is important

    2. if record order could not be guaranteed, you would have to save PKs in memory for potential matches, this might cause application domain unloaded without any errors.

    public class SessionCache<T>
        {
            T[] cache;
            public SessionCache()
            {
                cache = new T[0];
            }
            void Resize(int spid)
            {
                Array.Resize<T>(ref cache, spid + 100);
            }
            public T SetCache(int spid, T data)
            {
                lock (this)
                {
                    Resize(spid);
                    cache[spid] = data;
                    return data;
                }
            }
            
            public T GetCache(int spid)
            {
                lock (this)
                {
                    Resize(spid);
                    return cache[spid];
                }
            }
            public void RemoveCache(int spid)
            {
                Resize(spid);
                cache[spid] = default(T);
            }
        }


    Regards

    John Huang, MVP-SQL, MCM-SQL, http://www.sqlnotes.info

  • Friday, June 08, 2012 6:45 PM
     
     

    Thank you John

    Record order will be absolutely guaranteed. Regular aggregate does not work since I need multiple inputs and multiple outputs unless I do casting multiple variables into a binary max or so to pass into the clr, but I rather not go there.

    Question, will I have a problem if my query does parallel processing in that case would my data split in a way that will break my logic?


    Gokhan Varol

  • Friday, June 08, 2012 7:08 PM
     
     
    good question. Parallel processing will affect the order of the records sending to the CLR but the clr function will NOT be joining to the parallel process. To prove that, you can make your query run in parallel process mode. The function call will(or might) be run as a Calculate Scalar operator, you will find out that the operator will not be appear in any threads ...

    Regards

    John Huang, MVP-SQL, MCM-SQL, http://www.sqlnotes.info

  • Friday, June 08, 2012 7:22 PM
    Moderator
     
     
    Hi Gokhan,
     
    .NET Static variables in SQLCLR are per-appdomain, your assembly is loaded once. Using static variables in SQLCLR requires UNSAFE permission set. The reason for this is, in a multi-user situation, you need to protect your static variables with mutexes, semephores, etc. If the DBA issues a KILL command against your session, any locks taken by you and unresolved will cause a appdomain unload.
     
    Also, I’m not sure what you mean by “record order is guaranteed” by SQL Serve engine does not guarantee row order into a SQLCLR aggregate (or a T-SQL function used as an aggregate or a T-SQL statement, for that matter). The SqlAggregate property that covers the guarantee exists, but it’s marked as “not currently implemented” because of that engine limitation.
     
    Hope this helps,
    Cheers,
    Bob
  • Friday, June 08, 2012 7:26 PM
     
     

    The reason I am saying the record order is guaranteed is I do an top x order by during the clr call

    select * from table a 

    outer apply(select top 10000 *

    from tableb b

    cross apply dbo.clrfunc(b......) cr

    where a.pka = b.pka order by b.pkb ) b

    since b.pka, pkb is primary key clustered order by does not have a cost)


    Gokhan Varol

  • Friday, June 08, 2012 11:37 PM
    Moderator
     
     
    There’s no guaranteed ordering into the clrfunc even if you have an ORDER BY in the statement. And I don’t believe that ordering an intermediate expression in a SQL statement guarantees ordering into the outer (outer apply) statement, depending on how the query optimizer constructs the overall plan. This may appear to work most times but not work other times, especially when parallelism is involved. A primary key doesn’t guarantee order within a SQL statement that uses the table either.
     
    Cheers,
    Bob
  • Friday, June 08, 2012 11:58 PM
     
     

    Thank you bob.

    I thought when you use top or clr there won't be any paralelism, am I off?

    Thank you


    Gokhan Varol

  • Saturday, June 09, 2012 12:02 AM
     
     
    Also I think I'll back out on using static variable therefore unsafe assemblies , mutexes etc, getting complicated for me.

    Gokhan Varol

  • Saturday, June 09, 2012 3:28 AM
    Moderator
     
     
    I see the note in BOL here http://msdn.microsoft.com/en-us/library/ms131046(v=sql.105).aspx for SQLCLR functions that access data. But I thought you’re functions didn’t. Regardless, ordering of intermediate sets isn’t guaranteed.
     
    Cheers,
    Bob
  • Saturday, June 09, 2012 3:37 AM
     
      Has Code

    Thank you Bob

    I was only asking, the functions do not access data. What about TOP clause, I swear I read somewhere the query with a top clause won't run in paralled but google ing it I cannot find the proof to that.

    Actually googleing further I noticed Paul White's blog the query below using top clause does parallel executing, maybe I was wrong 

    http://sqlblog.com/blogs/paul_white/

    WITH
        N1 (n) AS
        (
            -- 10 rows
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1
        ),
        N2 AS (SELECT L.n FROM N1 AS L CROSS JOIN N1 AS R), -- 100 rows
        N3 AS (SELECT L.n FROM N2 AS L CROSS JOIN N2 AS R), -- 10,000 rows
        N4 AS (SELECT L.n FROM N3 AS L CROSS JOIN N3 AS R), -- 100,000,000 rows
        N5 AS (SELECT L.n FROM N4 AS L CROSS JOIN N1 AS R), -- 1,000,000,000 rows
        N6 AS 
        (
            SELECT TOP (1000000) n
            FROM N5
        )
    SELECT 
        COUNT_BIG(*)
    FROM N6
    OPTION (RECOMPILE, MAXDOP 2);


    Gokhan Varol

  • Wednesday, June 20, 2012 6:39 PM
     
     
    TOP forces a serial zone in the plan (notice the TOP iterator is outside the parallel zone in the that query plan).  Nevertheless, Bob was absolutely right to say that the order of rows presented to a SQLCLR component is definitely not guaranteed, whatever one might seek to infer from the execution plan.  Parallelism or not, we cannot depend on ordering to SQLCLR until Microsoft implement the IsInvariantToOrder property, or do something else that provides the guarantee.