DQS-performance
-
lundi 19 décembre 2011 04:20
Hi,
The DQS-cleansing is taking time,is there any way to increse the performance of the activity.
s dhwani
Toutes les réponses
-
lundi 19 décembre 2011 05:37Modérateur
Few questions to better understand.
1. Are you on CTP3 or RC0 build of SQL Server 2012? What is the version number if in doubt?
RC0 was supposed to have enhancement to performance.
2. How many rows are in the source data that you are cleansing? Ballpark number is fine.
SQL table, or Excel file?
3. Did you do Data Discovery or import values into the Domains in the KB that you are using before doing the cleansing?
Are there any fancy domain rules or composite domain rules? Sometimes a complex rule could affect perf.
Do you have the domains pointed to any Azure Datamarket Reference Data Service providers? RDS conversations may add to the duration since its across the internet.
4. What are the general hardware specs - CPUs, RAM, Disks. Are the DQS databases on the C drive, or on a separate data drive? What about tempdb database files?
Thanks, Jason
Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance -
lundi 19 décembre 2011 07:31
Hi Jason,
I am working with DQS-cleansing component for that it is taking 5-6 minutes to execute the package.The source component is having just 36 rows and using the KB which I have with 1 domain and 40 domain values for that domain.If it takes this much time to cleanse just 36 rows then this will be not a choice for millions of records right.
For the first time when I run the package it took nearly 43 minutes for the same scenario,now it takes 4-5 minutes.
1. well I am using the RC0.
2. I just have 36 rows in my source i,e the SQL table.
3. yeah I have done knowledge Discovery after creating the domain rules for my domains in the knowledge base,no fancy and complex rules for my domain.I am not using the RDS too.
4. The DQS databases are on the C drive.
hardware specifications:
3GB RAM.
Processor AMD athlon x2 245.
disk space 320 GB.
s dhwani -
lundi 19 décembre 2011 07:41
Hi jason,
I have one more thing to ask, suppose knowledge base is having 1 million values and every time cleansing is done ten thousand new values are added to the KB.
when cleansing is done the DQS has to compare the source values with all the domain values in our KB right ,the size of KB increases and there is increase in processing time as we have to compare the values with the domain values.
Is there any way to optimize the performance of DQS?
s dhwani -
mardi 20 décembre 2011 07:14Modérateur
Can you please compare running the same Cleansing activity interactively from DQS Client to SSIS Package just to see if its an SSIS only problem. I suspect is it just SSIS related somehow. I heard that SSIS performance is being tuned still for some problems, but I don't have the details yet. Will check on it further.
Just 36 rows, that doesn't sound right at all. I have seen it much faster than that.
Do you see any error in the SQL Errorlog (MSSQL\LOG\ERRORLOG file or exec sp_readerrorlog). Sometimes app domains get unloaded when memory is low that can cause failures.
Regarding your last point, its important to note that when you run a cleansing project, the KB doesn't improve automatically, so the recently cleansed values are not necessarily retained into the KB. There is a manual way to run add correct cleansed data into the KB as imported values, or run data discovery, to improve the KB.
This below information might help us narrow down which event is slow... if you can collect one of the three, or all three and attach, that will help use understand where the sluggishness is happening.
A. Flow Audit trail will show the duration of each flow (internal action) inside of the Cleansing. {
SELECT DATEDIFF(MILLISECOND, F.START_TIME, FA.END_TIME) AS DurationMilliSec,
F.START_TIME, FA.END_TIME, KB.NAME as KB_NAME,
F.*, FA.*
FROM DQS_MAIN.dbo.A_FLOW F WITH (NOLOCK)
LEFT OUTER JOIN DQS_MAIN.dbo.A_FLOW_ANSWER FA WITH (NOLOCK)
ON F.ID=FA.FLOW_ID
LEFT OUTER JOIN DQS_MAIN.dbo.A_KNOWLEDGEBASE KB WITH (NOLOCK)
ON KB.ID=F.KNOWLEDGEBASE_ID
ORDER BY DurationMilliSec DESC;
}Pivot by Process (an activity like running the SSIS Project)
{
DECLARE @Process_UID varchar(255)
DECLARE processes CURSOR
FOR SELECT UNIQUE_ID FROM DQS_MAIN.dbo.A_PROCESS ORDER BY CREATION_TIMEOPEN processes
FETCH NEXT FROM processes INTO @Process_UIDWHILE @@FETCH_STATUS=0
BEGIN
SELECT * FROM DQS_MAIN.dbo.A_PROCESS WHERE UNIQUE_ID=@Process_UID
SELECT * from DQS_MAIN.dbo.V_A_FLOW where UNIQUE_ID=@Process_UID ORDER BY FLOW_ID
FETCH NEXT FROM processes INTO @Process_UID
END
CLOSE processes
DEALLOCATE processes
}B. Check DMV output for any obvious slow DQS queries (results only show for things run since SQL was last restarted.
{
SELECT
LEFT (p.cacheobjtype + ' (' + p.objtype + ')', 35) AS cacheobjtype,
p.usecounts, p.size_in_bytes / 1024 AS size_in_kb,
PlanStats.total_worker_time/1000 AS tot_cpu_ms, PlanStats.total_elapsed_time/1000 AS tot_duration_ms,
PlanStats.total_physical_reads, PlanStats.total_logical_writes, PlanStats.total_logical_reads,
PlanStats.CpuRank, PlanStats.PhysicalReadsRank, PlanStats.DurationRank,
LEFT (CASE
WHEN pa.value=32767 THEN 'ResourceDb'
ELSE ISNULL (DB_NAME (CONVERT (sysname, pa.value)), CONVERT (sysname,pa.value))
END, 40) AS dbname,
sql.objectid,
CONVERT (nvarchar(50), CASE
WHEN sql.objectid IS NULL THEN NULL
ELSE REPLACE (REPLACE (sql.[text],CHAR(13), ' '), CHAR(10), ' ')
END) AS procname,
REPLACE (REPLACE (SUBSTRING (sql.[text], PlanStats.statement_start_offset/2 + 1,
CASE WHEN PlanStats.statement_end_offset = -1 THEN LEN (CONVERT(nvarchar(max), sql.[text]))
ELSE PlanStats.statement_end_offset/2 - PlanStts.statement_start_offset/2 + 1
END), CHAR(13), ' '), CHAR(10), ' ') AS stmt_text
FROM
(
SELECT
stat.plan_handle, statement_start_offset, statement_end_offset,
stat.total_worker_time, stat.total_elapsed_time, stat.total_physical_reads,
stat.total_logical_writes, stat.total_logical_reads,
ROW_NUMBER() OVER (ORDER BY stat.total_worker_time DESC) AS CpuRank,
ROW_NUMBER() OVER (ORDER BY stat.total_physical_reads DESC) AS PhysicalReadsRank,
ROW_NUMBER() OVER (ORDER BY stat.total_elapsed_time DESC) AS DurationRank
FROM sys.dm_exec_query_stats stat
) AS PlanStats
INNER JOIN sys.dm_exec_cached_plans p ON p.plan_handle = PlanStats.plan_handle
OUTER APPLY sys.dm_exec_plan_attributes (p.plan_handle) pa
OUTER APPLY sys.dm_exec_sql_text (p.plan_handle) AS sql
WHERE (PlanStats.CpuRank < 50 OR PlanStats.PhysicalReadsRank < 50 OR PlanStats.DurationRank < 50)
AND pa.attribute = 'dbid'
-- filter the result to only DB's for DQS
AND pa.value IN (SELECT database_id FROM Master.SYS.databases where name like 'DQS%')
ORDER BY tot_cpu_ms DESC}
C. There is an SSIS log that can be produced by checking the folder %appdata%/SSDQS/<guid>DQSSISLog.log That log will help show where the time is passing if you can configure it and attach it.
{
1. Locate the following two files. Depending on which bitness is used to run or debug the package, the corresponding .xml configuration file is selected.
SSIS is usually installed to the C drive, but could have been relocated.
C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\DQLog.SSIS.xml
C:\Program Files\Microsoft SQL Server\110\DTS\Binn\DQLog.SSIS.xml3. Make a safe copy of the above files as the originals.
4. Open notepad or similar text editor, explicitly Run as Administrator (UAC elevation required)
5. Control+H to find and replace the Information value with the All value:
Find what: <add switchValue="Information"
Replace with: <add switchValue="All"6. Save the edit to the .xml file
7. Repeat the trial by running the SSIS package
8. Review the log output to see where the majority of the time was spent.
}
Thanks, Jason
Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance -
vendredi 23 décembre 2011 00:44
I've raised a connect relating to this: https://connect.microsoft.com/SQLServer/feedback/details/713837/data-quality-services-dqs-cleansing-component-performance-too-slow
Based on my analysis here: http://www.bimonkey.com/2011/12/sql-server-data-quality-services-in-sql2012-rc0-part-1/
Essentially even for for a simple set of rules, DQS / SSIS do not perform at a level that would allow production use in a DW scale system.
Cheers, James
James Beresford @ www.bimonkey.com & @BI_Monkey
SSIS / MSBI Consultant in Sydney, Australia
SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex -
vendredi 23 décembre 2011 08:14Modérateur
@James - But the order of magnitude is different here. At the best rate of 5-6 minutes for 36 rows, 1 domain. Something else is going on. The first time 43 minutes is very high.
@Dhwani- do queries I posted help isolate where the time is spent?
Thanks, Jason
Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance -
vendredi 23 décembre 2011 09:23
@Jason, yes, scale is different but the problem remains - SSIS & DQS do not play together fast enough. Was hoping you could provide some insight.
Cheers, James
James Beresford @ www.bimonkey.com & @BI_Monkey
SSIS / MSBI Consultant in Sydney, Australia
SSIS ETL Execution Control and Management Framework @ SSIS ETL Framework on Codeplex -
jeudi 12 avril 2012 21:13Modérateur
Just cross posting this information on related perf threads on the forum...
Cumulative Update 1 for SQL Server 2012 RTM has some pretty big performance tuning efforts for DQS. For DQS users concerned about performance, please give it a try and see if it helps your workloads.
FIX: Slow performance when you use DQS to perform a data cleansing against a large data source in SQL Server 2012
http://support.microsoft.com/kb/2669955Download the CU1 package at the top of this article: http://support.microsoft.com/kb/2679368
Don't forget you have to run DQSInstaller.exe -Upgrade after each major CU to patch the DQS schema. Directions for Upgrade here
http://social.technet.microsoft.com/wiki/contents/articles/8442.upgrade-dqs-installing-cumulative-updates-or-hotfix-patches-on-data-quality-services-en-us.aspxThanks, Jason
Didn't get enough help here? Submit a case with the Microsoft Customer Support team for deeper investigation - http://support.microsoft.com/select/default.aspx?target=assistance
- Proposé comme réponse Koen VerbeeckMicrosoft Community Contributor vendredi 13 avril 2012 06:18

