locked
How to find similar numeric or alphanumeric in millions of record RRS feed

  • Question

  •  

     

     I have a issue , i have a packet_no of 10 to 12 digits (Number may be numeric or alphanumeric ) ,    this   packet when reaches its destination then due to bad scan one or two digits of original  is changed . 

     So my issue is that, how I should find mismatched packet_no in millions of record. 

     My matching condition is date range and store_no where the packet is going. 

    Any help is highly appreciated 

    Thanks & regards

    Rohit 

    Thursday, June 13, 2013 6:51 AM

Answers

  • Try this:

    create table #t(id varchar(12))
    insert into #t values ('110040402811')
    insert into #t values ('110040402820')
    insert into #t values ('110040402841')
    insert into #t values ('110040402856')
    insert into #t values ('110040402857')
    insert into #t values ('110040402858')
    insert into #t values ('110040402859')
    insert into #t values ('110040402863')
    
    Declare @original varchar(12)
    Set @original = '110040402710'
    
    ;WITH C1 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,1,1) = SUBSTRING(@original,1,1)
    ), C2 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,2,1) = SUBSTRING(@original,2,1)
    ), C3 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,3,1) = SUBSTRING(@original,3,1)
    ), C4 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,4,1) = SUBSTRING(@original,4,1)
    ), C5 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,5,1) = SUBSTRING(@original,5,1)
    ), C6 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,6,1) = SUBSTRING(@original,6,1)
    ), C7 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,7,1) = SUBSTRING(@original,7,1)
    ), C8 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,8,1) = SUBSTRING(@original,8,1)
    ), C9 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,9,1) = SUBSTRING(@original,9,1)
    ), C10 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,10,1) = SUBSTRING(@original,10,1)
    ), C11 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,11,1) = SUBSTRING(@original,11,1)
    ), C12 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,12,1) = SUBSTRING(@original,12,1)
    ), combined AS (
      SELECT * FROM C1 UNION ALL
      SELECT * FROM C2 UNION ALL
      SELECT * FROM C3 UNION ALL
      SELECT * FROM C4 UNION ALL
      SELECT * FROM C5 UNION ALL
      SELECT * FROM C6 UNION ALL
      SELECT * FROM C7 UNION ALL
      SELECT * FROM C8 UNION ALL
      SELECT * FROM C9 UNION ALL
      SELECT * FROM C10 UNION ALL
      SELECT * FROM C11 UNION ALL
      SELECT * FROM C12
    ), at_least_10 AS (
      SELECT id, COUNT(*) AS cnt
      FROM combined
      GROUP BY id
      HAVING COUNT(*) >= 10
    )
    SELECT *
    FROM at_least_10
    
    drop table #t
    


    Gert-Jan

    • Proposed as answer by Allen Li - MSFT Saturday, June 22, 2013 8:49 AM
    • Marked as answer by Kalman Toth Saturday, June 29, 2013 6:58 AM
    Thursday, June 13, 2013 1:13 PM

All replies

  • Provide DDL, Sample Data and Expected Output.

    Regards, RSingh

    Thursday, June 13, 2013 6:57 AM
  • These are the certain packet going from A to B

    110040402811

    110040402820

    110040402841

    110040402856

    110040402857

    110040402858

    110040402859

    110040402863

    When they arrive at B, scanning is done some times due to bad scanning number get changed.

     

    As date and store# are same for a packet#. So I am finding changed number using

    Original number

     

    I am using like statement

    select @packet_no  = replace (@packet_no ,substring(@packet_no ,2 ,1) ,'%')

    select @packet_no  = replace (@packet_no ,substring(@packet_no ,5 ,1) ,'%')

    select @packet_no  = replace (@packet_no ,substring(@packet_no ,8 ,1) ,'%')

    and   finding the changed number , but it is not working fine.

     

    Thanks & regards

    Rohit 

    Thursday, June 13, 2013 7:32 AM
  • Hi,

    Check if the bultin function SOUNDEX() is useful to solve your issue.

    E.g. WHERE SOUNDEX(Source_packet_no) = SOUNDEX(Target_packet_no)

    Thanks,

    Asharaf


    MCTS, http://asharafaliptb.wordpress.com

    Thursday, June 13, 2013 7:38 AM
  • Try this:

    create table #t(id varchar(12))
    insert into #t values ('110040402811')
    insert into #t values ('110040402820')
    insert into #t values ('110040402841')
    insert into #t values ('110040402856')
    insert into #t values ('110040402857')
    insert into #t values ('110040402858')
    insert into #t values ('110040402859')
    insert into #t values ('110040402863')
    
    Declare @original varchar(12)
    Set @original = '110040402710'
    
    ;WITH C1 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,1,1) = SUBSTRING(@original,1,1)
    ), C2 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,2,1) = SUBSTRING(@original,2,1)
    ), C3 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,3,1) = SUBSTRING(@original,3,1)
    ), C4 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,4,1) = SUBSTRING(@original,4,1)
    ), C5 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,5,1) = SUBSTRING(@original,5,1)
    ), C6 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,6,1) = SUBSTRING(@original,6,1)
    ), C7 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,7,1) = SUBSTRING(@original,7,1)
    ), C8 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,8,1) = SUBSTRING(@original,8,1)
    ), C9 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,9,1) = SUBSTRING(@original,9,1)
    ), C10 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,10,1) = SUBSTRING(@original,10,1)
    ), C11 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,11,1) = SUBSTRING(@original,11,1)
    ), C12 AS (
      SELECT id
      FROM #t
      WHERE SUBSTRING(id,12,1) = SUBSTRING(@original,12,1)
    ), combined AS (
      SELECT * FROM C1 UNION ALL
      SELECT * FROM C2 UNION ALL
      SELECT * FROM C3 UNION ALL
      SELECT * FROM C4 UNION ALL
      SELECT * FROM C5 UNION ALL
      SELECT * FROM C6 UNION ALL
      SELECT * FROM C7 UNION ALL
      SELECT * FROM C8 UNION ALL
      SELECT * FROM C9 UNION ALL
      SELECT * FROM C10 UNION ALL
      SELECT * FROM C11 UNION ALL
      SELECT * FROM C12
    ), at_least_10 AS (
      SELECT id, COUNT(*) AS cnt
      FROM combined
      GROUP BY id
      HAVING COUNT(*) >= 10
    )
    SELECT *
    FROM at_least_10
    
    drop table #t
    


    Gert-Jan

    • Proposed as answer by Allen Li - MSFT Saturday, June 22, 2013 8:49 AM
    • Marked as answer by Kalman Toth Saturday, June 29, 2013 6:58 AM
    Thursday, June 13, 2013 1:13 PM
  • You need to send along a CHECKSUM value with each string:

    http://msdn.microsoft.com/en-us/library/ms189788.aspx

    At destination the CHECKSUM can be calculated again and compared.


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

    Saturday, June 22, 2013 8:46 AM
  • Hm, Kalman, how is your thought here? That they should change their bar codes (or how these packed numbers are printed), so that it includes a checksum/digit?

    I agree that this is a good idea, and maybe Rohit should investigate whether there are already check digits in place, because that could help him to reconstruct the correct number from failed scan, or at least limit the space of possible numbers.

    But if there are no check digits in place, this require a bigger change in the process than a mere programmer like Rohit may be able to enforce.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Saturday, June 22, 2013 10:12 AM
  • This sounds like a generic string match and error correction question.

    Which means there is not likely to be just a single answer based on the digits, since any one or more digits might be randomly changed.

    These are both highly specialized areas and not really a database issue.

    Are you sure there isn't some kind of error detection or correction built into the numbers already?  Because it sounds like there should be!

    Josh

    Saturday, June 22, 2013 5:12 PM