Using CTEs twice in query produces wrong results
-
2012年4月16日 下午 01:39
I have been working on this for days and I ended up going with the solution in part one but this has really bothering me. I am new to CTEs but I see power in the tool. Here is my query
With cteDist (
[dstmID]
,[dstmDateAdd]
,[dstmUserAdd]
,[dstmDateMod]
,[dstmUserMod]
,[dstmClmID]
,[dstmCheckNum]
,[dstmCheckAmt]
,[dstmCheckDate]
,[dstmCheckMailDate]
,[dstmDistNum]
,[dstmOriginalCheck]
,[dstmAcctNum]
)
aS
(
SELECT [dstmID]
,[dstmDateAdd]
,[dstmUserAdd]
,[dstmDateMod]
,[dstmUserMod]
,[dstmClmID]
,[dstmCheckNum]
,[dstmCheckAmt]
,[dstmCheckDate]
,[dstmCheckMailDate]
,[dstmDistNum]
,[dstmOriginalCheck]
,[dstmAcctNum]
FROM [FCCProdRep].[dbo].[Distribution]
where DateDiff(d,DstmCheckDate,Getdate())>120--There are 7884678 records in this result set
)
, cteDistAct (
[dsamID]
,[dsamDateAdd]
,[dsamUserAdd]
,[dsamDateMod]
,[dsamUserMod]
,[dsamDstmID]
,[dsamDsacID]
,[dsamAmt1]
,[dsamDate1]
,[dsamNotes1]
)
as
(
SELECT [dsamID]
,[dsamDateAdd]
,[dsamUserAdd]
,[dsamDateMod]
,[dsamUserMod]
,[dsamDstmID]
,[dsamDsacID]
,[dsamAmt1]
,[dsamDate1]
,[dsamNotes1]
FROM [FCCProdRep].[dbo].[DistActionMain] dMC
join cteDist e
on
e.dstmID = dmC.dsamDstmID
)
select d.[dstmClmID] ClaimID
,d.[dstmCheckNum] CheckNum
,d.[dstmCheckAmt] Amt
,d.[dstmCheckDate] CheckDate
,d.[dstmCheckMailDate] MailDate
,d.[dstmDistNum] Dist
,d.[dstmOriginalCheck] Original
,d.dstmAcctNum
,DateDiff(d,d.DstmCheckDate,Getdate()) DaysOver
,Case
when dsamCleared= 1 Then 'Cleared'
when dsamVoid= 2 Then 'Voided'
when dsamstop= 3 Then 'Stopped'
Else
'Outstanding' End Status
from cteDist d
join BankAccounts b on d.dstmAcctNum = b.ID
join (select dstmClmID, max(dstmDistNum) dstmDistNum from DISTRIBUTION group by dstmClmID) dx
on d.dstmClmID = dx.dstmClmID and d.dstmDistNum = dx.dstmDistNum--This step identifies the most recent distribution and when joined to the cte gives 7814929 records
left join (select dsamDstmID, dsamDsacID dsamCleared from distActionMain where dsamDsacID = 1) dmC on d.dstmID = dmC.dsamDstmID
left join (select dsamDstmID, dsamDsacID dsamVoid from distActionMain where dsamDsacID = 2) dmV on d.dstmID = dmV.dsamDstmID
left join (select dsamDstmID, dsamDsacID dsamStop from distActionMain where dsamDsacID = 3) dmS on d.dstmID = dmS.dsamDstmID
--where DateDiff(d,d.DstmCheckDate,Getdate())>120---This works and gives the accruate number of records 7814929 and takes about 2:53 to runHowever If I use ONLY CTEs and not the Distribution table in the derived table section ie
The code above becomes
select d.[dstmClmID] ClaimID
,d.[dstmCheckNum] CheckNum
,d.[dstmCheckAmt] Amt
,d.[dstmCheckDate] CheckDate
,d.[dstmCheckMailDate] MailDate
,d.[dstmDistNum] Dist
,d.[dstmOriginalCheck] Original
,d.dstmAcctNum
,DateDiff(d,d.DstmCheckDate,Getdate()) DaysOver
,Case
when dsamCleared= 1 Then 'Cleared'
when dsamVoid= 2 Then 'Voided'
when dsamstop= 3 Then 'Stopped'
Else
'Outstanding' End Status
from cteDist d
join BankAccounts b on d.dstmAcctNum = b.ID
join (select dstmClmID, max(dstmDistNum) dstmDistNum from cteDist group by dstmClmID) dx
on d.dstmClmID = dx.dstmClmID and d.dstmDistNum = dx.dstmDistNum--This step identifies the most recent distribution and when joined to the cte NOW gives 7884678 records
left join (select dsamDstmID, dsamDsacID dsamCleared from distActionMain where dsamDsacID = 1) dmC on d.dstmID = dmC.dsamDstmID
left join (select dsamDstmID, dsamDsacID dsamVoid from distActionMain where dsamDsacID = 2) dmV on d.dstmID = dmV.dsamDstmID
left join (select dsamDstmID, dsamDsacID dsamStop from distActionMain where dsamDsacID = 3) dmS on d.dstmID = dmS.dsamDstmID
--where DateDiff(d,d.DstmCheckDate,Getdate())>120---This Does not work because of the derived table and gives the inaccruate number of records 7884678 and takes about less than a minute to run.I think this is a CTE issue and not a data issue but so I am not providing any data. Here are the tables structures if need be:
CREATE TABLE [dbo].[Distribution](
[dstmID] [int] IDENTITY(1,1) NOT NULL,
[dstmDateAdd] [datetime] NULL,
[dstmUserAdd] [varchar](50) NULL,
[dstmDateMod] [datetime] NULL,
[dstmUserMod] [varchar](50) NULL,
[dstmClmID] [int] NULL,
[dstmCheckNum] [int] NULL,
[dstmCheckAmt] [money] NULL,
[dstmCheckDate] [datetime] NULL,
[dstmCheckMailDate] [datetime] NULL,
[dstmDistNum] [int] NULL,
[dstmOriginalCheck] [int] NULL,
[dstmAcctNum] [int] NULL,
CONSTRAINT [PK_DistributionMain] PRIMARY KEY CLUSTERED
(
[dstmID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]----------
CREATE TABLE [dbo].[DistActionMain](
[dsamID] [int] IDENTITY(1,1) NOT NULL,
[dsamDateAdd] [datetime] NULL,
[dsamUserAdd] [varchar](50) NULL,
[dsamDateMod] [datetime] NULL,
[dsamUserMod] [varchar](50) NULL,
[dsamDstmID] [int] NULL,
[dsamDsacID] [int] NULL,
[dsamAmt1] [money] NULL,
[dsamDate1] [datetime] NULL,
[dsamNotes1] [varchar](2000) NULL,
CONSTRAINT [PK_DistActionMain] PRIMARY KEY CLUSTERED
(
[dsamID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]-------
CREATE TABLE [dbo].[BankDownload](
[recon] [nchar](2) NULL,
[acct] [nchar](12) NULL,
[fill] [nchar](5) NULL,
[rec] [nchar](1) NULL,
[amount] [nchar](10) NULL,
[serialnum] [nchar](10) NULL,
[date] [nchar](8) NULL,
[fill2] [nchar](2) NULL,
[clmnum] [varchar](32) NULL,
[bkdDateAdd] [datetime] NULL,
[ID] [int] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]Let me know where the issue lies
所有回覆
-
2012年4月16日 下午 01:57
The problem is that you have overseen a further predicate you're using:
JOIN ( SELECT dstmClmID , MAX(dstmDistNum) dstmDistNum FROM DISTRIBUTION GROUP BY dstmClmID ) dx ON d.dstmClmID = dx.dstmClmID AND d.dstmDistNum = dx.dstmDistNum
The MAX(dstmDistNum) value may be different when using the CTE as it is a filtered set. So the JOIN condition may be more restrictive when using the CTE. btw, in your case you should test also using a temporary table to store the values of cteDist. A (temporary) table may perform better in such cases and your DATEDIFF() predicate is not sargeable, if an index on DstmCheckDate exists.
- 已提議為解答 Naomi NMicrosoft Community Contributor, Moderator 2012年4月16日 下午 04:15
- 已標示為解答 Garrett_II 2012年4月16日 下午 04:19
-
2012年4月16日 下午 04:10版主
Little known: CTE is like a macro, evaluated multiple times for multiple placements, so the results may vary.
CTE articles: http://www.bing.com/search?q=cte+site%3A+sqlusa.com
SARGable predicate article: http://www.sqlusa.com/bestpractices/sargable/
Kalman Toth SQL SERVER & BI TRAINING
- 已編輯 Kalman TothMicrosoft Community Contributor, Moderator 2012年4月16日 下午 04:12

