locked
extremely slow: WHERE not exists (select 1 from... RRS feed

  • Question

  • I upload daily a file we receive on .csv (first I upload it completely to MemberCity_w)

     

    Then the final table is MemberCity.

     

    My issue is that the below query takes several minutes to run due to the ‘where’ clause.

     

    I added that because we usually receive duplicate files and I don’t want to have duplicates in the case I insert the same file twice, is there any other way to achieve this?

     

     

    SELECT date,City,MemberID

           FROM MemberCity_w mw

           WHERE not exists (select 1 from [MemberCity] m where m.date=mw.date)

     ps: execution plan gives me cost 86% for the scan of the table... (I am using sql 2016)

    Friday, March 23, 2018 4:45 PM

All replies

  • Do you have the execution plan ?

    Does this bring any difference ?

       SELECT mw.date,mw.City,mw.MemberID
    	   FROM MemberCity_w mw
    	   left join [MemberCity] m 
    	   on m.date=mw.date
    	   where m.date is null

    is date is only column common between those two tables  to filter duplicates? 


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, March 23, 2018 4:51 PM
  • Yes.

    The code you wrote requires SQL Server to check MemberCity for every row in MemberCity_w, row by row.

    The codes that Sarat wrote should look up MemberCity once.

    Friday, March 23, 2018 5:11 PM
  • Is there an index on MemeberCity.date?

    Keep in mind that the 86% you see is only the estimate - the real bottleneck in the query may be elsewhere.

    Friday, March 23, 2018 5:15 PM
  • If your files are incremental i.e. all records have a greater date than the date in last loaded file then you can use below:

    SELECT date,City,MemberID
    FROM MemberCity_w mw
    WHERE mw.date > (select max(date) from [MemberCity])

    If that's not true then try using below:

    SELECT date,City,MemberID
    FROM MemberCity_w mw
    WHERE mw.date NOT IN (select distinct date from [MemberCity])
    Also, create an index on date column of MemberCity table if its not already there.


    HTH,
    Cheers!!
    Ashish
    Please mark it as Answered if it answered your question or mark it as Helpful if it helped you solve your problem.

    Friday, March 23, 2018 5:15 PM
  • I upload daily a file we receive on .csv (first I upload it completely to MemberCity_w)

     

    Then the final table is MemberCity.

     

    My issue is that the below query takes several minutes to run due to the ‘where’ clause.

     

    I added that because we usually receive duplicate files and I don’t want to have duplicates in the case I insert the same file twice, is there any other way to achieve this?

     

     

    SELECT date,City,MemberID

           FROM MemberCity_w mw

           WHERE not exists (select 1 from [MemberCity] m where m.date=mw.date)

     ps: execution plan gives me cost 86% for the scan of the table... (I am using sql 2016)

    Clearly you are missing other columns in filter condition, I doubt date is the only column you want to compare

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Friday, March 23, 2018 5:23 PM
  • Do you have an index on date or any index?
    Saturday, March 24, 2018 5:41 AM
  • Does the above given query gives you the required output?

    What about below query?

    select distinct date,City,MemberID from test

    How many records you have in the MemberCity_w table?

    Saturday, March 24, 2018 7:10 AM