none
Tsql query to Lookup Prefix based best match help

    Question

  • Hi All

    I have some raw data to be processed and I have no idea how to do look up based on the prefix to find the best match.

    There are 2 tables Area and Customer

    I need to match the Customer ID with the leading Prefix on the area Table to find the best match.

    Eg: mark on Customer table has Id: 1234908974 which has the only possible match for 1234 - London on the Area table.

    I need an output table like below.

    Thanks a lot in advance.

    create  TABLE Customer (Name Varchar(20), CustomerID int)
    GO
    -- Inserting Data into Table
    INSERT INTO Customer (Name,CustomerID) VALUES('John','1234501111')
    INSERT INTO Customer (Name,CustomerID) VALUES('Mat','1234109874')
    INSERT INTO Customer (Name,CustomerID) VALUES('Sam','1234207485')
    INSERT INTO Customer (Name,CustomerID) VALUES('Jake','1234205241')
    INSERT INTO Customer (Name,CustomerID) VALUES('Daniel','1234309632')
    INSERT INTO Customer (Name,CustomerID) VALUES('Mark','1234908974')
    Go
    
    
    
    create  TABLE Area (Prefix int, Area Varchar(20))
    GO
    INSERT INTO Area (Prefix, Area ) VALUES('1234','London')
    INSERT INTO Area (Prefix, Area ) VALUES('123410','Manchester')
    INSERT INTO Area (Prefix, Area ) VALUES('123420','York')
    INSERT INTO Area (Prefix, Area ) VALUES('123430','Cardiff')
    INSERT INTO Area (Prefix, Area ) VALUES('123440','Bath')
    INSERT INTO Area (Prefix, Area ) VALUES('123450','Berby')
    Go

    Thursday, February 13, 2014 3:53 PM

Answers

  • SQL Server 2000 is not supported by MS, I believe.

    In any case, you may re-write this query this way:

    SELECT C.* ,P.* FROM Customer C INNER JOIN Area P ON CAST(C.CustomerID AS VARCHAR(20)) LIKE CAST(P.Prefix AS VARCHAR(20)) + '%' WHERE LEN(cast(P.Prefix as varchar(20))) = (select max(LEN(cast(A.Prefix as varchar(20)))) from Area A

    where exists (select 1 from Customer C1

    where C1.CustomerID = C.CustomerID AND CAST(C1.CustomerID AS VARCHAR(20))

    LIKE CAST(A.Prefix AS VARCHAR(20)) + '%'));



    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Proposed as answer by Gert-Jan Strik Thursday, February 13, 2014 6:49 PM
    • Marked as answer by xXShanXx Friday, February 14, 2014 2:52 PM
    Thursday, February 13, 2014 6:24 PM
    Moderator

All replies

  • I am sure there must be better ways but bellow may works:

    Select name,customerid,prefix,area
    from (
    Select *,ROW_NUMBER() over (partition by Name Order by len(replace(c.CustomerID,a.Prefix,''))) row
    From Customer C cross apply
         Area a ) data
    where row=1

    Thursday, February 13, 2014 4:17 PM
  • Hi xxshanxx try this...

    SELECT a.Name, a.CustomerID, ISNULL(b.Area, c.Area) AS Area
    FROM Customer a
    LEFT JOIN Area b ON CONVERT(VARCHAR(6), b.Prefix) = LEFT(CONVERT(VARCHAR(10),a.CustomerID),6) 
    LEFT JOIN Area c ON CONVERT(VARCHAR(6), c.Prefix) = LEFT(CONVERT(VARCHAR(10),a.CustomerID),4) AND b.AREA IS NULL

    Please mark as answer, if this has helped you solve the issue.

    Good Luck :) .. visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.


    • Edited by sqlsaga Thursday, February 13, 2014 5:08 PM
    Thursday, February 13, 2014 5:07 PM
  • Hi Teherul673

    Thanks - these 2 tables reside in different DBs but on the same server.

    It throws out error  : The multi-part identifier "x.name" could not be bound.

    Thursday, February 13, 2014 5:42 PM
  • Try

    USE tempdb
    
    CREATE TABLE Customer (
    	NAME VARCHAR(20)
    	,CustomerID INT
    	)
    GO
    
    -- Inserting Data into Table
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'John'
    	,'1234501111'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Mat'
    	,'1234109874'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Sam'
    	,'1234207485'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Jake'
    	,'1234205241'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Daniel'
    	,'1234309632'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Mark'
    	,'1234908974'
    	)
    GO
    
    CREATE TABLE Area (
    	Prefix INT
    	,Area VARCHAR(20)
    	)
    GO
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'1234'
    	,'London'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123410'
    	,'Manchester'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123420'
    	,'York'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123430'
    	,'Cardiff'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123440'
    	,'Bath'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123450'
    	,'Derby'
    	)
    GO
    
    ;
    
    WITH cte
    AS (
    	SELECT C.*
    		,P.*
    		,ROW_NUMBER() OVER (
    			PARTITION BY C.CustomerID ORDER BY len(cast(P.Prefix AS VARCHAR(20))) DESC
    			) AS Rn
    	FROM Customer C
    	INNER JOIN Area P ON CAST(C.CustomerID AS VARCHAR(20)) LIKE CAST(P.Prefix AS VARCHAR(20)) + '%'
    	)
    SELECT *
    FROM cte
    WHERE Rn = 1
    ORDER BY CustomerID;

    BTW, I would suggest to use varchar columns instead of integer to avoid conversions.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Proposed as answer by Taherul673 Thursday, February 13, 2014 5:45 PM
    Thursday, February 13, 2014 5:43 PM
    Moderator
  • Hi Teherul673

    Thanks - these 2 tables reside in different DBs but on the same server.

    It throws out error  : The multi-part identifier "x.name" could not be bound.

    To access Different DB tables in same instance:

    DATABASENAME..TABLENAME

    Thursday, February 13, 2014 5:46 PM
  • Try

    USE tempdb
    
    CREATE TABLE Customer (
    	NAME VARCHAR(20)
    	,CustomerID INT
    	)
    GO
    
    -- Inserting Data into Table
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'John'
    	,'1234501111'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Mat'
    	,'1234109874'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Sam'
    	,'1234207485'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Jake'
    	,'1234205241'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Daniel'
    	,'1234309632'
    	)
    
    INSERT INTO Customer (
    	NAME
    	,CustomerID
    	)
    VALUES (
    	'Mark'
    	,'1234908974'
    	)
    GO
    
    CREATE TABLE Area (
    	Prefix INT
    	,Area VARCHAR(20)
    	)
    GO
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'1234'
    	,'London'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123410'
    	,'Manchester'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123420'
    	,'York'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123430'
    	,'Cardiff'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123440'
    	,'Bath'
    	)
    
    INSERT INTO Area (
    	Prefix
    	,Area
    	)
    VALUES (
    	'123450'
    	,'Derby'
    	)
    GO
    
    ;
    
    WITH cte
    AS (
    	SELECT C.*
    		,P.*
    		,ROW_NUMBER() OVER (
    			PARTITION BY C.CustomerID ORDER BY len(cast(P.Prefix AS VARCHAR(20))) DESC
    			) AS Rn
    	FROM Customer C
    	INNER JOIN Area P ON CAST(C.CustomerID AS VARCHAR(20)) LIKE CAST(P.Prefix AS VARCHAR(20)) + '%'
    	)
    SELECT *
    FROM cte
    WHERE Rn = 1
    ORDER BY CustomerID;

    BTW, I would suggest to use varchar columns instead of integer to avoid conversions.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    This is better solution so far!
    Thursday, February 13, 2014 5:47 PM
  • Hi sqlsaga

    Thanks its not complete - it fine for half the records the rest it pop up NULL.

    for example

    Customer ID      Area

    50230044413    Comcel
    50230902808    NULL

    but on the Area table I have like shown below.

    So I expect for customer ID : 50230902808   should pick the least best match : 50230.

    Thanks a lot in advance.

    Thursday, February 13, 2014 5:50 PM
  • Hi Taherul673

    Im using SQL Server version 2000 and it does not support the ROW_NUMBER function :(

    Thursday, February 13, 2014 5:59 PM
  • I didn't read the other replies but how far can you go to check the best match... like how many characters deep? till the first? I just considered till 4th... try this...

    SELECT a.Name, a.CustomerID, COALESCE(b.Area, c.Area, d.Area) AS Area
    FROM Customer a
    LEFT JOIN Area b ON CONVERT(VARCHAR(6), b.Prefix) = LEFT(CONVERT(VARCHAR(10),a.CustomerID),6) 
    LEFT JOIN Area c ON CONVERT(VARCHAR(6), c.Prefix) = LEFT(CONVERT(VARCHAR(10),a.CustomerID),5) AND b.AREA IS NULL
    LEFT JOIN Area d ON CONVERT(VARCHAR(6), d.Prefix) = LEFT(CONVERT(VARCHAR(10),a.CustomerID),4) AND b.AREA IS NULL

    Please mark as answer, if this has helped you solve the issue.

    Good Luck :)... visit www.sqlsaga.com for more t-sql code snippets and BI related how to articles.

    Thursday, February 13, 2014 6:06 PM
  • SQL Server 2000 is not supported by MS, I believe.

    In any case, you may re-write this query this way:

    SELECT C.* ,P.* FROM Customer C INNER JOIN Area P ON CAST(C.CustomerID AS VARCHAR(20)) LIKE CAST(P.Prefix AS VARCHAR(20)) + '%' WHERE LEN(cast(P.Prefix as varchar(20))) = (select max(LEN(cast(A.Prefix as varchar(20)))) from Area A

    where exists (select 1 from Customer C1

    where C1.CustomerID = C.CustomerID AND CAST(C1.CustomerID AS VARCHAR(20))

    LIKE CAST(A.Prefix AS VARCHAR(20)) + '%'));



    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    • Proposed as answer by Gert-Jan Strik Thursday, February 13, 2014 6:49 PM
    • Marked as answer by xXShanXx Friday, February 14, 2014 2:52 PM
    Thursday, February 13, 2014 6:24 PM
    Moderator
  • Thanks Naomi

    I need to run this on a million record temp table which takes big time to map the prefix around 20 min.

    Is there any other way to make this quicker. I have indexed the Prefix and CustomerID on both the tables.

    Tuesday, February 18, 2014 12:38 PM
  • Do you have to run it on SQL 2000 instance? I can not find a better way in SQL 2000 and this version is outdated. I suggest to upgrade your SQL Server and use the other version I posted.

    In SQL 2000 try outputting the first select statement (without where condition) into a temp table first and then select the best match.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    My TechNet articles

    Tuesday, February 18, 2014 1:46 PM
    Moderator