none
Query is executing forever .... RRS feed

  • Question

  • Hi All,

    Need some help in query tuning. There is a query which is provided by app team. They says it gets timed out after 2 hours as they set query timeout = 2 hours. Now they reached out for help from DBA team.

    We tried to run the query in Plan explorer to get the actual execution plan. The query is running over 9 hours but still doesn't finishes its execution. There is blocking involved. We have updated statistics with full scan. Still it runs forever ..... don't really know whats happening.

    Status of the spid always showing "runnable". what does it mean?

    When I open up the Windows task mgr and see the cpu utilization, it is 40-55% and not more than that which is normal.

    From sql server configuration side, they are 12 logical cpu on the prod server but MAXDOP is set to 1. just wanted to mention this. Not sure how much it is related here in this context.We are running SQL Server 2012 sp4 Enterprise Edition.

    I really don't understand whats the problem with the query can any help here.


    dropbox link

    https://www.dropbox.com/s/5ky0761iv6pmf36/QueryTuning.zip?dl=0

    Collected below information so far.

    - Query executing for more than 9 hours and still NOT YET finished !! - "Executing forever.PNG"
    - sp_whoisactive output - "sp_WIA.PNG"
    - Table and Index defintions - ddl.sql
    - Query text - "src_qry.txt"
    - Estimated Execution Plan ( collected using SQL Sentry Plan Explorer) -"EstimatedPlan.pesession"
    - Table sizes and index fragmentation details "Tablesizes n Index Fragmentation details.PNG"
    - When was stats got updated for those tables - "statistics update date.PNG"
    - sp_configue settings -"sp_configure settings.PNG"
    - cpu info - "cpu_info.PNG"
    - memory info - "memory_info.PNG"

    Thanks,

    Sam

    Saturday, December 7, 2019 12:31 PM

Answers

  • With the table sizes you have given, I certainly do expect this to run for nine hours and still counting. That is quite a, well, feat.

    It is always difficult to say things for sure from an estimated query plan alone. Had you been on SQL 2014 or later, you could have used the Live Query Plan to get actual values so far, which could have indicated where the estimates are off. Then again, I spotted one thing that I don't like and that is as Joe points out the Nested Loops operator just to the right of the SELECT. There is a CI scan of DM_NOTF_CHG_ATTR on the inner part of it. If the estimate for the Loop Join is correct, this is not a problem, because the estimate is 1. But I suspect that the actual count is a lot higher, and thus the CI scan is repeated again, again, again and again.

    It was great that you posted the DDL, but there are no CREATE INDEX statements in the file. Is this because you did not include any, or because there are no other indexes beyond the primary keys?

    From the query text, I would be inclined to add these indexes

    CREATE INDEX job_id_ix ON DM_NOTF_BTCH_DTL (NOTF_PRCS_JOB_ID, NOTF_BTCH_DTL_ID)
    
    CREATE INDEX chd_id_ix ON DM_NOTF_CHG_EVNT_ATTR(NOTF_CHG_EVNT_ID) 
       INLUCDE(CHG_TYP_ID, ELMNT_XPATH) 

    I am not going promise that the query will fly after this, as are many things I don't know about, for instance the data distribution. One thing I'm a little worried over is that all join conditions are not over keys, so what if there are multiple hits that results in a row explosion? A query that produces millions and millions of rows will never run fast. Then again the table DM_LKP_LANDING_TABLE_ATTR where the the most suspicious join condition appears only has 47 rows, so I don't want to make too much alarm about this.

    By the way, here is a reformatted version of the query, which at least I find a little easier to read. It was quite a mechanical transformation, so I believe the query is the same:

    SELECT CHG.ELMNT_NAME              ELMNT_NAME, 
           CHG.NOTF_PRCS_JOB_ID        NOTF_PRCS_JOB_ID, 
           SMRY.DUNS_NBR               DUNS_NBR, 
           LNDG_ATTR.DB_COL_NAME       DB_COL_NAME, 
           LNDG_ATTR.PARALLEL_COL_TYPE PARALLEL_COL_TYPE, 
           LNDG_ATTR.COL_SEQ_NO        COL_SEQ_NO, 
           CHG.CHANGE_FLAG             CHANGE_FLAG, 
           CHANGE_DETECTION_TIMESTAMP 
    FROM  (SELECT REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID, 
                  MAX(CHANGE_DETECTION_TIMESTAMP) CHANGE_DETECTION_TIMESTAMP, 
                  MAX(CHG_FLG) CHANGE_FLAG 
            FROM  (SELECT EVNT.REG_ID               REG_ID, 
                          EVNT.ELMNT_NAME           ELMNT_NAME, 
                          ATTR.ELMNT_XPATH          ELMNT_XPATH, 
                          BTCH_DTL.NOTF_PRCS_JOB_ID NOTF_PRCS_JOB_ID, 
                          EVNT.CHG_DETCT_TIME       CHANGE_DETECTION_TIMESTAMP, 
                          ( CASE WHEN LKP.ATTR_VAL IN('ValueChanged','NewlyAdded','Dropped') THEN 'Y' 
                                 WHEN LKP.ATTR_VAL = 'NoChange' THEN 'N' END) CHG_FLG 
                   FROM DM_NOTF_CHG_EVNT      EVNT, 
                   JOIN DM_NOTF_CHG_EVNT_ATTR ATTR ON EVNT.NOTF_CHG_EVNT_ID = ATTR.NOTF_CHG_EVNT_ID
                   JOIN DM_LKP_ATTR           LKP ON LKP.ATTR_ID = ATTR.CHG_TYP_ID
                   JOIN DM_NOTF_BTCH_DTL      BTCH_DTL ON EVNT.NOTF_BTCH_DTL_ID = BTCH_DTL.NOTF_BTCH_DTL_ID 
              WHERE BTCH_DTL.NOTF_PRCS_JOB_ID = 167 ) ATTR_FLAG 
              GROUP BY REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID) CHG, 
    JOIN    DM_REG_SMRY               SMRY       ON CHG.REG_ID = SMRY.REG_ID 
    JOIN    DM_MNTR_PROFILE           PROFILE,   ON SMRY.MP_ID = PROFILE.MP_ID 
    JOIN    DM_LKP_LANDING_TABLE_ATTR LNDG_ATTR  ON LNDG_ATTR.PRODUCT_ID = PROFILE.MP_DNB_PRODUCT_ID 
                                                AND CHG.ELMNT_XPATH = LNDG_ATTR.XPATH_VALUES
    


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Marked as answer by Samantha v Monday, December 9, 2019 8:57 AM
    Saturday, December 7, 2019 7:44 PM

All replies

  • I am not in front of a proper development environment right now,

    still, could you provide the estimated execution plan as a .sqlplan file? (from SSMS)

    look at the plan (Display Estimated Execution Plan):

    are there any warnings? red or yellow? as in missing join predicate

    are there any Nested Loops Join in which the access to the lower left (inner source) is a scan? this would be bad

    powering through few joins on a 6GB table with 2.7M should not be difficult,

    however, it should not be done at DOP 1,

    stick this at the end of the query OPTION(MAXDOP 4), then look at the plan again,

    do you see a Bitmap operator?


    jchang

    nevermind, just downloaded Plan Explorer, I see the loop joins

    in any case, please generate the plan at MAXDOP = 4, see if the loop join goes away

    try to execute if you like



    • Edited by jchang61 Saturday, December 7, 2019 6:59 PM
    Saturday, December 7, 2019 6:49 PM
  • it is the Nested Loops join to the right of the SELECT (top left) that bothers me.

    I do not know why this is the plan, it seems pretty stupid to me.

    for test, take the inner query, SELECT INTO a temp table, then execute the remaining query (I think it is this, but check as I maybe going senile)

     SELECT REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID,
     MAX(CHANGE_DETECTION_TIMESTAMP) CHANGE_DETECTION_TIMESTAMP,
     MAX(CHG_FLG) CHANGE_FLAG
     INTO #CHG
     FROM (
      SELECT EVNT.REG_ID REG_ID,
      EVNT.ELMNT_NAME ELMNT_NAME,
      ATTR.ELMNT_XPATH ELMNT_XPATH,
      BTCH_DTL.NOTF_PRCS_JOB_ID NOTF_PRCS_JOB_ID,
      EVNT.CHG_DETCT_TIME CHANGE_DETECTION_TIMESTAMP,
      ( CASE WHEN LKP.ATTR_VAL IN('ValueChanged','NewlyAdded','Dropped')
      THEN 'Y' WHEN LKP.ATTR_VAL = 'NoChange' THEN 'N' END) CHG_FLG
      FROM DM_NOTF_CHG_EVNT EVNT, DM_NOTF_CHG_EVNT_ATTR ATTR, DM_LKP_ATTR LKP
      , DM_NOTF_BTCH_DTL BTCH_DTL
      WHERE EVNT.NOTF_CHG_EVNT_ID = ATTR.NOTF_CHG_EVNT_ID AND LKP.ATTR_ID = ATTR.CHG_TYP_ID
      AND EVNT.NOTF_BTCH_DTL_ID = BTCH_DTL.NOTF_BTCH_DTL_ID AND BTCH_DTL.NOTF_PRCS_JOB_ID = 167
     ) ATTR_FLAG
     GROUP BY REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID
     OPTION(MAXDOP = 4)
    SELECT
    CHG.ELMNT_NAME ELMNT_NAME,
    CHG.NOTF_PRCS_JOB_ID NOTF_PRCS_JOB_ID,
    SMRY.DUNS_NBR DUNS_NBR,
    LNDG_ATTR.DB_COL_NAME DB_COL_NAME,
    LNDG_ATTR.PARALLEL_COL_TYPE PARALLEL_COL_TYPE,
    LNDG_ATTR.COL_SEQ_NO COL_SEQ_NO,
    CHG.CHANGE_FLAG CHANGE_FLAG,
    CHANGE_DETECTION_TIMESTAMP
    FROM #CHG CHG
    , DM_REG_SMRY SMRY, DM_MNTR_PROFILE PROFILE, DM_LKP_LANDING_TABLE_ATTR LNDG_ATTR
    WHERE CHG.REG_ID = SMRY.REG_ID AND SMRY.MP_ID = PROFILE.MP_ID
    AND LNDG_ATTR.PRODUCT_ID = PROFILE.MP_DNB_PRODUCT_ID
    AND CHG.ELMNT_XPATH = LNDG_ATTR.XPATH_VALUES
    OPTION(MAXDOP = 4)


    jchang

    Saturday, December 7, 2019 7:18 PM
  • With the table sizes you have given, I certainly do expect this to run for nine hours and still counting. That is quite a, well, feat.

    It is always difficult to say things for sure from an estimated query plan alone. Had you been on SQL 2014 or later, you could have used the Live Query Plan to get actual values so far, which could have indicated where the estimates are off. Then again, I spotted one thing that I don't like and that is as Joe points out the Nested Loops operator just to the right of the SELECT. There is a CI scan of DM_NOTF_CHG_ATTR on the inner part of it. If the estimate for the Loop Join is correct, this is not a problem, because the estimate is 1. But I suspect that the actual count is a lot higher, and thus the CI scan is repeated again, again, again and again.

    It was great that you posted the DDL, but there are no CREATE INDEX statements in the file. Is this because you did not include any, or because there are no other indexes beyond the primary keys?

    From the query text, I would be inclined to add these indexes

    CREATE INDEX job_id_ix ON DM_NOTF_BTCH_DTL (NOTF_PRCS_JOB_ID, NOTF_BTCH_DTL_ID)
    
    CREATE INDEX chd_id_ix ON DM_NOTF_CHG_EVNT_ATTR(NOTF_CHG_EVNT_ID) 
       INLUCDE(CHG_TYP_ID, ELMNT_XPATH) 

    I am not going promise that the query will fly after this, as are many things I don't know about, for instance the data distribution. One thing I'm a little worried over is that all join conditions are not over keys, so what if there are multiple hits that results in a row explosion? A query that produces millions and millions of rows will never run fast. Then again the table DM_LKP_LANDING_TABLE_ATTR where the the most suspicious join condition appears only has 47 rows, so I don't want to make too much alarm about this.

    By the way, here is a reformatted version of the query, which at least I find a little easier to read. It was quite a mechanical transformation, so I believe the query is the same:

    SELECT CHG.ELMNT_NAME              ELMNT_NAME, 
           CHG.NOTF_PRCS_JOB_ID        NOTF_PRCS_JOB_ID, 
           SMRY.DUNS_NBR               DUNS_NBR, 
           LNDG_ATTR.DB_COL_NAME       DB_COL_NAME, 
           LNDG_ATTR.PARALLEL_COL_TYPE PARALLEL_COL_TYPE, 
           LNDG_ATTR.COL_SEQ_NO        COL_SEQ_NO, 
           CHG.CHANGE_FLAG             CHANGE_FLAG, 
           CHANGE_DETECTION_TIMESTAMP 
    FROM  (SELECT REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID, 
                  MAX(CHANGE_DETECTION_TIMESTAMP) CHANGE_DETECTION_TIMESTAMP, 
                  MAX(CHG_FLG) CHANGE_FLAG 
            FROM  (SELECT EVNT.REG_ID               REG_ID, 
                          EVNT.ELMNT_NAME           ELMNT_NAME, 
                          ATTR.ELMNT_XPATH          ELMNT_XPATH, 
                          BTCH_DTL.NOTF_PRCS_JOB_ID NOTF_PRCS_JOB_ID, 
                          EVNT.CHG_DETCT_TIME       CHANGE_DETECTION_TIMESTAMP, 
                          ( CASE WHEN LKP.ATTR_VAL IN('ValueChanged','NewlyAdded','Dropped') THEN 'Y' 
                                 WHEN LKP.ATTR_VAL = 'NoChange' THEN 'N' END) CHG_FLG 
                   FROM DM_NOTF_CHG_EVNT      EVNT, 
                   JOIN DM_NOTF_CHG_EVNT_ATTR ATTR ON EVNT.NOTF_CHG_EVNT_ID = ATTR.NOTF_CHG_EVNT_ID
                   JOIN DM_LKP_ATTR           LKP ON LKP.ATTR_ID = ATTR.CHG_TYP_ID
                   JOIN DM_NOTF_BTCH_DTL      BTCH_DTL ON EVNT.NOTF_BTCH_DTL_ID = BTCH_DTL.NOTF_BTCH_DTL_ID 
              WHERE BTCH_DTL.NOTF_PRCS_JOB_ID = 167 ) ATTR_FLAG 
              GROUP BY REG_ID, ELMNT_NAME, ELMNT_XPATH, NOTF_PRCS_JOB_ID) CHG, 
    JOIN    DM_REG_SMRY               SMRY       ON CHG.REG_ID = SMRY.REG_ID 
    JOIN    DM_MNTR_PROFILE           PROFILE,   ON SMRY.MP_ID = PROFILE.MP_ID 
    JOIN    DM_LKP_LANDING_TABLE_ATTR LNDG_ATTR  ON LNDG_ATTR.PRODUCT_ID = PROFILE.MP_DNB_PRODUCT_ID 
                                                AND CHG.ELMNT_XPATH = LNDG_ATTR.XPATH_VALUES
    


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Marked as answer by Samantha v Monday, December 9, 2019 8:57 AM
    Saturday, December 7, 2019 7:44 PM
  • Hi Erland, there are no additional indexes on these tables.only clustered indexes as far as I know.

    Sunday, December 8, 2019 2:30 PM
  • Hi Erland, there are no additional indexes on these tables.only clustered indexes as far as I know.

    OK, then I think my suggested index can be helpful to you. Give them a spin!

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Sunday, December 8, 2019 3:37 PM
  • Erland and Chang , Any reason why I am seeing status="runnable" all the time?Any theory behind it?

    Sunday, December 8, 2019 4:26 PM
  • The query plan you showed will take forever to run, hence it will be in runnable state forever. Kcura relativity generates sql that result in plans that take forever so I recognize these quickly

    jchang

    Sunday, December 8, 2019 5:02 PM
  • Erland and Chang , Any reason why I am seeing status="runnable" all the time?Any theory behind it?

    "Runnable" means that the process is able to execute. That is, it is not waiting for I/O, latches, network or anything else - with one exception. A runnable process could be waiting to get a CPU to execute on, but since you said that CPU is only 40-40% busy this is not happening here.

    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Sunday, December 8, 2019 6:30 PM
  • Erland and Chang , Any reason why I am seeing status="runnable" all the time?Any theory behind it?

    Hi Sam,

    Runnable – The session with this status is actually have been assigned a thread but waiting for the CPU cycle to be available. Please refer to Different Status of a SPID in SQL Server and What do they mean to get more information.

    Best regards,
    Cathy 

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to  MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Monday, December 9, 2019 8:48 AM