none
Joining a column containing different data types (filtering before joining) RRS feed

  • Question

  • I have a table storing the identifiers of my users in several different applications:

    user_id
    application_id
    foreignkey (varchar)

    All foreign keys are stored in a varchar column, but their actual content can be of different types (string, int, guid, etc. but always the same type per application_id).

    When joining this table, i often need to convert the foreign key to its original type. Although i specify a single application_id in the WHERE clause, MSSQL tries to apply the conversion to all rows in the table and therefore gives me an error.

    Can i somehow achieve that filtering occurs before joining?

    I already tried to do the filtering in a view, but it gives me the same effect.

    Wednesday, March 28, 2012 8:38 AM

Answers

  • You need to use a CASE expression to be sure that you don't get any error:

      CASE WHEN application_id = @appid THEN CASE(foreignkey AS ...) END = otherbl.othercol

    Logically the FROM-JOIN clause is computed first and then the WHERE clause filters the result. In practice, SQL Server recasts computation order, so you don't know when a certain expression is evaluated. The only way to avoid conversion errors is to use CASE, since you are guaranteed that THEN is only created if WHEN is true.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, March 28, 2012 8:56 AM
  • These queries are working for me but I can't say for sure for you.

    CREATE TABLE #MyTable1(a INT, b VARCHAR(10))
    CREATE TABLE #MyTable2(a INT, b VARCHAR(10))
    
    INSERT INTO #MyTable1 VALUES (1, '12')
    INSERT INTO #MyTable1 VALUES (1, '45')
    INSERT INTO #MyTable1 VALUES (2, 'qw')
    INSERT INTO #MyTable1 VALUES (2, 'er')
    INSERT INTO #MyTable2 VALUES (1, '12')
    INSERT INTO #MyTable2 VALUES (1, '451')
    INSERT INTO #MyTable2 VALUES (2, 'qw')
    INSERT INTO #MyTable2 VALUES (2, 'erp')
    
    --DROP TABLE #MyTable1
    --DROP TABLE #MyTable2
    
    SELECT * FROM #MyTable1 mt INNER JOIN #MyTable2 mt2 ON mt.a = mt2.a WHERE mt.a = 1 AND CAST(mt2.b AS INT) = 12
    
    SELECT * FROM #MyTable1 mt INNER JOIN #MyTable2 mt2 ON mt.a = mt2.a and CAST(mt.b AS INT) = CAST(mt2.b AS INT) WHERE mt.a = 1
    
    WITH myCTE1 AS (SELECT mt.a, CAST(mt.b AS INT) AS b FROM #MyTable1 mt  WHERE mt.a = 1),
    		myCTE2 AS (SELECT mt.a, CAST(mt.b AS INT) AS b FROM #MyTable2 mt  WHERE mt.a = 1)
    		SELECT * FROM myCTE1, myCTE2

    Provide your script so that we can help you further. There are other methods like staging the application_id's data in a temp table and then joining the data. It's very important to know what is the size of the table as by this method you are losing the ability to use indexes which may cost you high in performance.


    Welcome to MSDN Forums.
    Feel free to ask your questions and Please Note to Vote all helpful topics and Mark all answering posts.

    You Value Community's efforts and Community values your faith.
    - Arun Kumar Allu

    Wednesday, March 28, 2012 9:05 AM

All replies

  • Try creating a derived query or a CTE with the filtered records, then apply the JOIN with the other table.

    ~manoj | email: http://scr.im/m22g
    http://sqlwithmanoj.wordpress.com
    MCCA 2011

    Wednesday, March 28, 2012 8:46 AM
  • You need to use a CASE expression to be sure that you don't get any error:

      CASE WHEN application_id = @appid THEN CASE(foreignkey AS ...) END = otherbl.othercol

    Logically the FROM-JOIN clause is computed first and then the WHERE clause filters the result. In practice, SQL Server recasts computation order, so you don't know when a certain expression is evaluated. The only way to avoid conversion errors is to use CASE, since you are guaranteed that THEN is only created if WHEN is true.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, March 28, 2012 8:56 AM
  • These queries are working for me but I can't say for sure for you.

    CREATE TABLE #MyTable1(a INT, b VARCHAR(10))
    CREATE TABLE #MyTable2(a INT, b VARCHAR(10))
    
    INSERT INTO #MyTable1 VALUES (1, '12')
    INSERT INTO #MyTable1 VALUES (1, '45')
    INSERT INTO #MyTable1 VALUES (2, 'qw')
    INSERT INTO #MyTable1 VALUES (2, 'er')
    INSERT INTO #MyTable2 VALUES (1, '12')
    INSERT INTO #MyTable2 VALUES (1, '451')
    INSERT INTO #MyTable2 VALUES (2, 'qw')
    INSERT INTO #MyTable2 VALUES (2, 'erp')
    
    --DROP TABLE #MyTable1
    --DROP TABLE #MyTable2
    
    SELECT * FROM #MyTable1 mt INNER JOIN #MyTable2 mt2 ON mt.a = mt2.a WHERE mt.a = 1 AND CAST(mt2.b AS INT) = 12
    
    SELECT * FROM #MyTable1 mt INNER JOIN #MyTable2 mt2 ON mt.a = mt2.a and CAST(mt.b AS INT) = CAST(mt2.b AS INT) WHERE mt.a = 1
    
    WITH myCTE1 AS (SELECT mt.a, CAST(mt.b AS INT) AS b FROM #MyTable1 mt  WHERE mt.a = 1),
    		myCTE2 AS (SELECT mt.a, CAST(mt.b AS INT) AS b FROM #MyTable2 mt  WHERE mt.a = 1)
    		SELECT * FROM myCTE1, myCTE2

    Provide your script so that we can help you further. There are other methods like staging the application_id's data in a temp table and then joining the data. It's very important to know what is the size of the table as by this method you are losing the ability to use indexes which may cost you high in performance.


    Welcome to MSDN Forums.
    Feel free to ask your questions and Please Note to Vote all helpful topics and Mark all answering posts.

    You Value Community's efforts and Community values your faith.
    - Arun Kumar Allu

    Wednesday, March 28, 2012 9:05 AM
  • thank you everybody for the quick responses, i'll give these a try!
    i expect the table to grow up to 10 - 20,000 records
    Wednesday, March 28, 2012 9:32 AM
  • Yes, sql server optimizer is free to move WHERE condition before JOIN or after (evaluate CAST before applying WHERE)

    http://rusanu.com/2011/08/


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/

    Wednesday, March 28, 2012 9:40 AM
    Answerer
  • Good one, Arun.  However, you forgot the ';' before the With.

    ;WITH

    Wednesday, March 28, 2012 5:30 PM
  • If you are willing, I think pivoting out the application_id column and values into a column for each application for which you are keeping track of user IDs might be easier for the optimizer to swallow than having to join to other tables via a CASE statement.  Having them in different columns will also allow you to place referential integrity constraints that not only improve data integrity but also give the optimizer a lot of help in figuring out what you want to do.

    Wednesday, March 28, 2012 5:37 PM
  • This sounds like the easiest solution, but it didn't work for me.

    I'm encountering the problem when using a combination of a subselect and TOP. This is the most simplified reproduction i can offer (the query may not make sense, but it should demonstrate the situation):

    SELECT
    	(SELECT TOP 1 accounts.name
    	FROM ForeignKeys b LEFT JOIN accounts ON b.foreignkey=accounts.id
    	WHERE b.user_id=a.user_id AND b.application_id=a.application_id)
    FROM ForeignKeys a
    WHERE application_id=6
    Replacing the ON condition with a CASE clause didn't change my results.

    • Edited by Doctor Sid Thursday, April 5, 2012 11:29 AM
    Thursday, April 5, 2012 11:25 AM
  • How about first filtering ForeignKeys table into a temp table for the passed application id and then use that temp table in the query? 

    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Thursday, April 5, 2012 6:35 PM
    Moderator
  • i agree that this would work, but consider it a rather 'dirty' (inefficient) solution.
    i solved the problem for now by moving the subquery into a function.
    Friday, April 6, 2012 1:12 PM