Jawab Microsft SCD task issues

  • 09 Maret 2012 13:22
     
     

    Hi

    I am using Microsft SCD task to load a patient table iwth 500000 rows of distinct patient data i used the primary key of that table as business key and the

    the patientID,NationalID,DistrictNum as Fixed attirbutes and rest are as changing and address as historical the problem is the SCD task taking longer time to insert data it takes around 65 mins to laod the table i used kimball method to load it was unable to laod it gave me system out of memeory exception as an error ,

    The Microsoft SCD was performing good earlier   it took just 6 min to laod the same data but not now,I am trying to find an alternate way of merging  both the source and dimension and use a conditional split and insert and update the table can any one plaese suggest me way to sort out this do i need to update my sql server ,

    any other way to sort out  this

    Thanks


    Sri.Tummala

Semua Balasan

  • 09 Maret 2012 14:20
     
     

    There is nothing too much difference on the performance by using Conditional Split + OLE DB Command or SCD. The performance of the SCD is really depends on most of the method to process the new records (update, insert or add). For example it used 6 min to add all new records, but it could spend 60 min to update the whole table.

    So the better way to identify the SSIS performance issue is to open the SQL profiler or System Monitor to watch the process and then decide your next step.

  • 09 Maret 2012 15:24
    Moderator
     
     

    An often used alternative for the SCD component is this (depending on the dimension type):

    Where you check only the businesskey at 1 and optionally check all other attributes at 2.


    Please mark the post as answered if it answers your question | My SSIS Blog: http://microsoft-ssis.blogspot.com | Twitter

  • 09 Maret 2012 15:56
     
     

    Hi I used Kimball Dimension merge SCD for this task it worked fine to insert data while I tried to run again it gave me the following errors

    Information: The buffer manager has allocated 130944 bytes, even though the memory pressure has been detected and repeated attempts to swap buffers have failed.

    Information: The buffer manager has allocated 130944 bytes, even though the memory pressure has been detected and repeated attempts to swap buffers have failed.

    Information: The buffer manager failed a memory allocation call for 130944 bytes, but was unable to swap out any buffers to relieve memory pressure. 873 buffers were considered and 775 were locked. Either not enough memory is available to the pipeline because not enough are installed, other processes were using it, or too many buffers are locked.

    any way to sort this out  please


    Sri.Tummala

  • 09 Maret 2012 16:59
    Moderator
     
     

    The Dimension Merge SCD component uses more memory than the standard SCD Wizard to trade off for performance.  Doesn't seem like you have enough to spare.

    But Patrick is correct - the slow part is probably not the SCD component itself, it's probably the OLE DB Command that's performing updates.  That's a row by row operation.  You should read Matt Masson's blog about improving the SCD data flow.


    Todd McDermid's Blog Talk to me now on

  • 09 Maret 2012 17:21
     
     

    Hi Todd,

    I used the Merge SCD kimball methos it inserted the data quickly the issues was when i tried to run again it processed 300000 rows and after that it throwing this error

    [Dimension Merge Slowly Changing Dimension] Error: Internal error (Exception of type 'System.OutOfMemoryException' was thrown.) in ProcessInput adding rows to the cache.

    [Dimension Merge Slowly Changing Dimension] Error: Internal error (Error building work units: Error getting next work unit - queueing up keys: Internal error in RuntimeInputCache retrieving current key list.) in ProcessCache_Thread_MatchKeys.

    Information: The buffer manager has allocated 130944 bytes, even though the memory pressure has been detected and repeated attempts to swap buffers have failed.

    if my system is out of memeory than what the wqay to process thsi package how should test ,I am running on local machine we dont have any testing server to test how should i test the updates or new inserts

    I need your suggestion Todd


    Sri.Tummala

  • 09 Maret 2012 17:53
     
     
    Ok, it seems to be still a memory issue. There is no too much space to adjust your design to fix your issue by using your current memory to do the SCD. General speaking, the Kimball SCD should be better than the build-in SCD transformation in the performance perspective. But in your case, it seems to be not a just performance issue.
    300k rows is not too big at all, if you want to save the money on RAM, you have to play with the SSIS in depth, for example you can consider to use Execute SQL task, or Lookup in the Full catch mode (use the catch file).
  • 09 Maret 2012 18:41
    Moderator
     
     

    If you want to use the DImension Merge component, the best way to make sure it uses the least amount of memory is to:

    • Pass each input sorted by the business key.  I suggest avoiding the Sort component and reading the data from the actual source in sorted order.
    • If you're managing surrogate keys with the component, and you're not using an identity column in SQL to manage surrogate keys, then make sure you supply the component with the next available surrogate key value via a variable.

    Otherwise, go read Matt's article.


    Todd McDermid's Blog Talk to me now on

  • 12 Maret 2012 8:48
     
     

    Hi Todd,

    As you mentioned above 2 point the first point was done I sorted out the data in the advanced editor, I am coudnt able to understand the second point

    • If you're managing surrogate keys with the component, and you're not using an identity column in SQL to manage surrogate keys, then make sure you supply the component with the next available surrogate key value via a variable.
    • ?
    • Can you please explain it
    • and where is Matt's Article

    Sri.Tummala

  • 12 Maret 2012 16:39
    Moderator
     
     Saran Jawaban

    Make sure the data is ACTUALLY SORTED.  Changing properties in the Advanced Editor DOES NOT sort the data.  It's a commitment by YOU to SSIS that the data is sorted.  Make sure your OLE DB Source (or wherever the data is coming from) specifies an ORDER BY or other actual sort command.

    As for point #2 - please tell me how the rows in your dimension table get their surrogate keys assigned.

    Matt's article is linked to in my first post.


    Todd McDermid's Blog Talk to me now on

  • 13 Maret 2012 12:25
     
     

    Hi Todd,

    Point 1) Yes its been sorted on PatinetID,AttDate on both the dimension and source its fine

    Point 2)Surrogate keys of dimension gets populated when the row gets inserted in to the table data type INT IDENTITY PRIMARY KEY CLUSTERED

    I am still unable to update or delete the Dimension ie.. I am unable to handle changes see I am still on development box this is not server I know it doenst have enough memeory to handle but it handled before so that sthe reason I am behind this package.

    what would be the reason ?


    Sri.Tummala

  • 13 Maret 2012 12:26
     
     

    Hi Todd,

    Presnetly I am truncating the dimension table and inserting it again

    I would like to update or delete the changes

    Thank You


    Sri.Tummala

  • 13 Maret 2012 15:32
    Moderator
     
     

    AAH.  Don't truncate the dimension table! (I hope you're only doing that because you haven't got this process running... not because that's a step you WANT to do with the DMSCD.)

    If you've got the inputs sorted, and you're not managing surrogate keys in the component (you're not, because you have identities)... then your options are:

    • Reduce the size of the data stream - remove any unnecessary columns, reduce widths of columns, etc...  Probably not something you can do with a dimension table load, but you have to try.
    • Architect the data flow such that the Source system data is pushed completely into the DMSCD first.  No, I can't tell you how to arrange that.  At best, it's complicated.
    • Get more memory.  Is this a 32-bit machine?

    I'd like to see a screenshot of your data flow, just to see if there are any other suggestions I could make.


    Todd McDermid's Blog Talk to me now on

  • 13 Maret 2012 16:33
     
     

    Hi Todd,

    Thanks for your reply My system is 64 bit ,I am truncating because I am unable to run the SCD,

    I need all the columns from patient which has 500000 rows ,I ran the same SCd with 200 paitents it was fine and these patients go for a look up with

    Case note that has 900000 rows and bring in the columns which they want i can go for no cache or partial cache but it runs for ages and another look up with GP which has 40000 rows and than it comes to Kimball SCD

    age group just adds an expression to it

    (AGE <= 16 ? "0-16" : AGE >= 76 ? "75 +" : "Over 16 Under 75")

     


    Sri.Tummala

  • 13 Maret 2012 22:48
    Moderator
     
     Jawab

    OK - this isn't ideal, but can you split this flow in two?  Take your PMI source, up to and including the GP1 Lookup and put them in another flow.  Send their rowset to a Raw File Destination.

    Use a Raw File Source to read in that "source" data.

    If that doesn't work, my friend, you'll just have to get more RAM for your Dev machine...


    Todd McDermid's Blog Talk to me now on

    • Ditandai sebagai Jawaban oleh kali786 14 Maret 2012 15:41
    •  
  • 14 Maret 2012 15:43
     
     

    Hi Todd,

    Even the raw file osurce didnt worked but improved a bit performance and after 300000 rows of data i got the same error system out of memory only way is

    If that doesn't work, my friend, you'll just have to get more RAM for your Dev machine...

    Thank you

    Todd



    Sri.Tummala