How to design (.NET Application & SQL Database) Fraud/Suspicious Activity Detection Application for Financial Organization

  • Question

  • Hello Guys,

     

    I know this is basic and different people do it in different ways according to their needs, but I would like to know the best practices for designing a highly scalable, real-time enterprise database/application for financial institutions.

     

    Here is the scenario. This application would import customers' transactions as well as updated profiles from the bank's core banking solution or data warehouse daily in batch mode.

    Now our application has to detect suspicious activities by running rule sets on the transactions. Let me give you some examples of rule sets:

     

    1. Generate a suspicious activity alert if the sum of a customer's debit/credit transaction amounts exceeds 1 million dollars within a month.

    2. Generate an alert if a customer's debit transaction amount > X% of A (where A = the maximum debit transaction amount over period y1).

     

    This application should be able to find suspicious activity for at least 5 million customers, and 1 million transactions would be imported into our engine DB daily.

     

    I would appreciate it if you could suggest how this application should be designed on the (VS 2005, SQL Server 2005) platform. Should we have two DBs: one OLTP DB for keeping customer profiles and their transactions, and another OLAP DB where Analysis Services, business intelligence and data mining would store customer history in fact tables, so that at rule-processing time those results can be queried from there? Should we build the rule engine in VS.NET, in SQL Server 2005 Analysis Services, in BizTalk Server, in web services, or with data mining? And how can the analysis services, business intelligence and data mining features of SQL/.NET be used to build this application?

     

    Thanks & Regards,

     

    Sameer Gautam

     

    Wednesday, July 25, 2007 4:42 AM


All replies

  • Wow, that's a question and a half. :)

     

    I think you need to do some analysis work around your data and the business problem you are trying to solve before thinking about implementation. Some questions I have:

     

      1. Is there a finite number of rules?

      2. Are new rules identified often, and if so, how quickly are they to be implemented?

      3. Do existing rules ever need modification?

      4. How accurate do the rules have to be? That is, if an account is identified as suspicious and it is not, what will the effect be?

      5. Are there any performance constraints, or is it purely that you have to process all new data daily?

      6. Are the rules run daily to produce reports, or are they designed to be run ad hoc?

      7. Are all the rules customer-centric, or are there ever rules to be run across several accounts or across transactions?

      8. Are you modifying any of the data?

      9. Is all the data loaded daily, as in the system is 'flushed' and then reloaded before running?

      10. How much historical data are you required to store?

     

    My initial thought is that you require a read-only data repository that allows you to efficiently search and read as required, and then run a set of known rules over the data - which makes me think you need some kind of data warehouse, with the rules possibly implemented outside the database in a dedicated application developed in .NET, but that depends on the questions above.

     

    HTH

     

    Ollie Riches

    Wednesday, July 25, 2007 9:01 AM
  • Thanks, Ollie, for your reply. Well, we have already invested 6-7 months in analysing this business problem. You might have got that feeling because I could not explain the entire application in the thread in a summarised manner. My answers are given against each of your questions below.

     

     

      1. Is there a finite number of rules?

         Ans: No. Rules would be added to the system as per the country regulator's requirements as well as the bank's compliance department. Some rules may be based on simple calculations and some on data mining, where the customer's current transactional behaviour is compared against previous behaviour. For example, if a customer's average withdrawal is 10K dollars and he withdraws 90K dollars, the system should generate an alert. These rules would be implemented by us as an update pack, not buildable by the client in the application.

      2. Are new rules identified often, and if so, how quickly are they to be implemented?

         Ans: New rules are identified once or twice a year, and they must be implemented within 3-4 months.

      3. Do existing rules ever need modification?

         Ans: No, a rule's logic would not be modified. All rules would be parameter based, so input parameters would be defined by the application admin.

      4. How accurate do the rules have to be? That is, if an account is identified as suspicious and it is not, what will the effect be?

         Ans: 100%, otherwise the bank would be penalised. The system must raise an alert as soon as any suspicious activity happens in the system.

      5. Are there any performance constraints, or is it purely that you have to process all new data daily?

         Ans: Processing would be done daily, in the evening, after importing all the data. The system must produce its output by the next morning, because 3 days are given to the bank for reporting.

      6. Are the rules run daily to produce reports, or are they designed to be run ad hoc?

         Ans: Daily.

      7. Are all the rules customer-centric, or are there ever rules to be run across several accounts or across transactions?

         Ans: Both types of rule set would be in the system. For example, if a large remittance is transferred into account A and on the same day it moves to accounts B, C, D and so on, then the system should be able to track these interrelated transactions.

      8. Are you modifying any of the data?

         Ans: No.

      9. Is all the data loaded daily, as in the system is 'flushed' and then reloaded before running?

         Ans: Yes. I am wondering, if the data is flushed, how Analysis Services would store customer behaviour in the OLAP fact tables.

      10. How much historical data are you required to store?

          Ans: Up to a maximum of 1 year.

     

    Thanks

    S.Gautam

    Wednesday, July 25, 2007 10:40 AM
  • So the end client will not be responsible for building new rules; your company will do this and provide an update\hotfix to the client application.

    If this is the case, I would start thinking about implementing a data warehouse application with a custom .NET application on top that has the ability to check for updates on a per-client basis; obviously the 'things' being updated are the 'rules'.

    Now, whether a rule is a SQL script, a compiled .NET assembly or part of a BizTalk orchestration I am unsure about, but all implementations would have advantages and disadvantages - they should be obvious.

    Another area of concern is how the rules will be 'hosted' - as in what application runs the rules: SQL Server, BizTalk or a .NET application. The reason I ask is that I presume some rules will take a long time to run, greater than 2 minutes let's say, in which case you want to start the process asynchronously and then allow the end user to check the status of the process as and when required.

     

    You mentioned the use of a web front end, so this would indicate to me something like an ASP.NET UI that communicates with the host application. This highlights the fact that you need some kind of database to persist state (i.e. results and client settings) so the front end is decoupled from the application running the rules.

     

    HTH

     

    Ollie Riches

     

    Wednesday, July 25, 2007 11:47 AM
  • Currently we have an ASP.NET application with an OLTP database already installed at the client's premises. This rule engine would be a module of that application; the rule engine would be a background process, and whatever alerts are generated would be integrated (by web service or database integration) into this web UI for the user's review.

    Now I am in a fix about how we should design this component. What would be the ideal option (SQL Server jobs / .NET application / BizTalk, etc.) to achieve the performance and scalability? Do we need to create an OLTP DB along with an OLAP DB, or can we do this with a data warehouse only? I do not have prior experience in handling such large applications.

    I would appreciate it if you could suggest how it should be designed.

     

    Thanks & Regards,

    Sameer Gautam

    Wednesday, July 25, 2007 12:48 PM
  • I suggest you should have 2 databases:

    1. An OLTP database for writing data (transactions).

    2. An OLAP database for data analysis and rules application.

     

    Database synchronization should be done using SQL 2005 mirroring asynchronously so as not to affect performance.

     

    Since the rules are applied as a batch at night, the rules only change about twice a year, and a report is sent the next day, I would create a scheduled job to run at night.

    The job could either run a SQL stored procedure that checks the data and creates a report, or you could get BizTalk to apply the rules as the transactions are recorded to the database (OLAP).
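    To make the stored-procedure option concrete, a nightly check for rule 1 from the original post (monthly debit/credit total over 1 million) could look roughly like the sketch below. The table, column and alert-table names are assumptions for illustration, not an actual schema.

    -- Hypothetical rule 1: flag customers whose debit/credit total for the
    -- trailing month exceeds the threshold (object names are assumed).
    CREATE PROCEDURE dbo.usp_CheckMonthlyVolumeRule
        @ThresholdAmount MONEY = 1000000,
        @RunDate DATETIME
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.SuspiciousActivityAlert (CustomerId, RuleCode, AlertDate, Details)
        SELECT
            t.CustomerId,
            'MONTHLY_VOLUME',
            @RunDate,
            'Debit/credit total ' + CONVERT(VARCHAR(30), SUM(t.Amount)) + ' exceeds threshold'
        FROM dbo.CustomerTransaction AS t
        WHERE t.TransactionDate >= DATEADD(MONTH, -1, @RunDate)
          AND t.TransactionDate <  @RunDate
        GROUP BY t.CustomerId
        HAVING SUM(t.Amount) > @ThresholdAmount;
    END

    A SQL Server Agent job scheduled in the nightly batch window would simply EXEC one such procedure per rule, which also keeps the rule parameters configurable, as discussed earlier in the thread.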

     

    I hope this helps you, please provide me with your feedback and comments.
    Wednesday, July 25, 2007 7:20 PM
  •  

    It would seem immediately that the choice of a data warehouse of some sort would be a good idea, due to the highly transactional nature of the problem.

     

    How to implement these rules?  Well, it depends on the frequency of change, maintenance requirements, etc.  You could set up cubes in OLAP for the rules, but this probably isn't the most maintainable way to do things and it tightly couples the rules to the technology; on the other hand, it would be quick.  There are trade-offs.  You need to be more specific about the order of importance, the performance requirements, etc.

     

    Difficult question to answer exactly without more information, but I hope this starts the thought process.

     

    Cheers,

     

    Martin.

    Wednesday, July 25, 2007 10:58 PM
  •  

    Thanks for your suggestions. I do think this architecture should work.

     

    As per the suggestions, we would design this in the following way (correct me if something is better):

     

    1. Suppose we create an OLTP database & an OLAP database.

    2. A daily SQL integration services (DTS) package would load data into the OLTP database.

    3. At the time of rule engine execution, if a particular rule requires some historical figures (like the maximum debit in the last six months), they would be fetched from the OLAP fact tables, where all required figures are kept.

    4. After rule engine execution, all this transactional data would be transferred from the OLTP to the OLAP DB for analysis (data mining); after analysis, Analysis Services would update the fact tables if required.

    5. OLTP database would be flushed.

    6. If we are to implement the rule engine, which is the better option in terms of performance and scalability: SQL jobs (stored procedures), BizTalk Server or a .NET application?

     

    Once again, thanks to all of you for sharing your experiences; I hope we will be able to achieve an architecture that is highly scalable and high performance.

     

    Thanks & Regards,

    S.Gautam

     

     

     

    Thursday, July 26, 2007 4:54 AM
  • Sameer,

    After consulting with some friends in the BI dept. we came to some conclusions:

     

    1. Your data should be moved with SSIS (not DTS) .

    2. Your rules are applied by defining the appropriate measures (Date/Time, amount/transaction, account number).

    3. The reporting uses these measures to implement the rules you need (and not a stored proc or Biztalk).

    So if the rule is to flag transactions of > 1 million in 1 day, it will work; if over a period of 1 week, it will work also. This will be done by drilling down in the report.

    4. When you need to analyse patterns in client transactions, that is where data mining comes in, and this is done using the cube with the measures you have already created.
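    For intuition, the rollup such a measure represents is roughly the following relational query (the table and column names are assumptions); in the cube the same aggregation is precomputed so the report can drill from daily to weekly to monthly buckets.

    -- Hypothetical relational equivalent of a "total amount per customer per day" measure.
    SELECT
        t.CustomerId,
        CONVERT(CHAR(10), t.TransactionDate, 120) AS TransactionDay,   -- yyyy-mm-dd bucket
        SUM(t.Amount) AS TotalAmount
    FROM dbo.CustomerTransaction AS t
    GROUP BY t.CustomerId, CONVERT(CHAR(10), t.TransactionDate, 120)
    HAVING SUM(t.Amount) > 1000000;   -- surface only the buckets that exceed the threshold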

     

    I hope this answers your question. If you still have concerns, please communicate them and I  will be glad to help.

    Thursday, July 26, 2007 11:05 AM
  • Regarding point number 6, BizTalk Server should give the best results in terms of performance and scalability, as its rule engine is based on the Rete algorithm and is very scalable. Also, using BizTalk, you can keep the rules nicely decoupled.

    Regarding point number 2, I have a query. Are you planning to load the OLTP from the OLAP, or load the OLAP from the OLTP? As per my understanding, we put transactions in the OLTP and load them into the OLAP for processing. Correct me if I am wrong.

     

    Cheers

    Thursday, July 26, 2007 12:10 PM
  •  

    Many thanks for sparing some time with the BI team to find out the right solution.

     

    As per your conclusions, the following are my comments against each of them:

     

    1. I do agree with SSIS. It must be used for ETL purposes.

    2. A transaction record contains TransactionId, CustomerId, DateTime, Amount and Type (Debit/Credit) fields.

    3. Do you mean to say we should go for SQL Reporting Services to find the suspicious transactions?

    4. Some of the rule sets require data mining results, which are stored in fact tables; can Reporting Services fetch this data for rule execution?

     

    Now that you are almost fully aware of what we want to achieve, I would be thankful if you could suggest a final design on the basis of our discussion and your team's input.

     

    Thanks & Regards,

    S.Gautam

     

    Thursday, July 26, 2007 12:14 PM
  •  

    Thanks for your participation in this interesting thread.

     

    Reg point no.6

     

    I do not have much knowledge about BizTalk Server. Can these types of rule sets, both simple and mining based, be implemented with BizTalk?

     

    Reg point no.2

     

    You are right; we are looking to transfer all transactional data from the OLTP database to the OLAP for analysis.

     

    Regards,

     

    S.Gautam

    Thursday, July 26, 2007 12:37 PM
  •  Sameer Gautam wrote:

     

    Many thanks for sparing some time with the BI team to find out the right solution.

     

    As per your conclusions, the following are my comments against each of them:

     

    1. I do agree with SSIS. It must be used for ETL purposes.

    2. A transaction record contains TransactionId, CustomerId, DateTime, Amount and Type (Debit/Credit) fields.

    3. Do you mean to say we should go for SQL Reporting Services to find the suspicious transactions?

    4. Some of the rule sets require data mining results, which are stored in fact tables; can Reporting Services fetch this data for rule execution?

     

    Now that you are almost fully aware of what we want to achieve, I would be thankful if you could suggest a final design on the basis of our discussion and your team's input.

     

    Thanks & Regards,

    S.Gautam

     

     

    Sameer,

    here are my answers:

     

    1. Agreed.

    2. Agreed.

    3. You could go with SQL Reporting Services or Excel or any client for MS SQL; you find the suspicious transactions by drilling into the report and defining your aggregates.

    4. Data mining is done on top of the cubes; there is a data mining service that comes with MS SQL Enterprise, and it will be used for the rule execution. For more details, here is an excellent resource:

    http://www.sqlserverdatamining.com/DMCommunity/

     

     

    Please tell me if you need any more answers, otherwise please mark as answered if we have answered your question.

    Thursday, July 26, 2007 1:04 PM
  • Sorry, but BizTalk is never going to give the best performance (out of the box), especially since it is based around a single repository for messages - the 'message box', which is a SQL Server DB. Yes, it can scale when processing orchestrations, but it suffers from the usual performance constraints of a guaranteed messaging solution.

     

    HTH

     

    Ollie Riches 

     

    Thursday, July 26, 2007 1:21 PM
  • Hi Ollie,

     

    So what do you think - how should it be designed to achieve performance and scalability? Can you suggest a design framework as per your understanding?

     

    Thanks,

    S.Gautam

     

    Friday, July 27, 2007 3:43 AM
  • I was describing BizTalk's performance in terms of its rule-processing capability.

    As per industry standards, Rete is the best; though some products are coming with Rete III, the BizTalk rule engine is still the best.

    If you are judging BizTalk on attributes other than rule processing, then I might be wrong, as I do not have much idea about OLAP tools.

    Do OLAP tools also have rule engines inside them for analysis, and are they based on the Rete algorithm?

     

     

    Friday, July 27, 2007 5:51 AM
  • Hey Sameer, Ollie

    I just want to say that the answer I finally suggested didn't involve BizTalk at all, because the rules would all be applied on the OLAP database, not on the OLTP database.

    The straightforward rules (deposits as a function of time) will be applied using the measures created in the cube; the report will let you drill down to the time limits you want (daily, weekly, monthly), and the aggregates will show which deposits exceed the allowed sum.

    As for analyzing behavioural patterns (deposit increases), this will be done with the data mining services.

     

    The solution above will be scalable because you write to one database (OLTP), and you analyse the OLAP cubes and apply the data mining analyses on the other database, possibly on another server, overnight.

     

    Please tell me if you have any issues with this solution, or if I missed something.

     

    Friday, July 27, 2007 3:55 PM
  •  

     

    Hello Dear,

     

    Your last statement has created some confusion, because we had reached the decision that there would be two DBs, one OLTP and one OLAP. Here you mentioned that data mining would be done on another database, on another server, overnight.

     

    Could you please elaborate more on this.

     

    Thanks

    S.Gautam

    Monday, July 30, 2007 4:20 AM
  • Sameer,

    You misunderstood my last post; I meant that data mining would be performed on the same server as the OLAP database. What I said was to have 2 DBs: one for OLAP (data warehousing and data mining), the other for OLTP.

    So the data mining and the warehousing will both be on the same OLAP database.

     

    I hope this is clear.

    Monday, July 30, 2007 10:22 AM
  •  

    Thanks

     

    It's Clear now.

     

    What do you think about the current framework's performance and scalability? I know it depends on the rule sets' logic and design. Supposing the rules are as simple as those mentioned in the initial threads, how much time might it take to process 1 million transactions daily for suspicious activity detection, using 100 rule sets, along with the fact table updates after analysis?

     

    Thanks

    S.Gautam

    Monday, July 30, 2007 11:14 AM
  •  

    If the rules are as simple as in the initial threads:

    1. You would just have SSIS pull out the relevant data (e.g. just the deposits that are greater than a certain sum) - see the sketch after this list.

    2. You would then move this data to denormalized tables in a star schema in the OLAP database.

    3. The cubes would synchronise with the data.

    4. The reports built on the cubes would show the suspicious accounts depending on the measures specified.
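    As a rough illustration of step 1, the extraction query behind the SSIS data flow might look something like this (the table, column and threshold names are assumptions about the schema, not part of the suggestion above):

    -- Hypothetical SSIS source query: pull only large deposits from the OLTP database
    -- for the current load window; everything else is filtered out before the OLAP load.
    SELECT
        t.TransactionId,
        t.CustomerId,
        t.TransactionDate,
        t.Amount
    FROM dbo.CustomerTransaction AS t
    WHERE t.[Type] = 'Credit'                                 -- deposits only
      AND t.Amount >= 100000                                  -- assumed 'large deposit' cut-off
      AND t.TransactionDate >= DATEADD(DAY, -1, GETDATE());   -- yesterday's batch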

     

    The time taken would be mostly for Step 1 and Step 3. Even if we start with 1 million transactions, the data pulled from the OLTP, if properly filtered, will be greatly reduced in number (because you will filter out withdrawals and small deposits).

    The actual amount of time it will take to do the analysis will depend on the server specs and table indexing. But I think the solution mentioned is about as optimized as possible. If this is all done overnight, and you have a run-of-the-mill server doing the analysis, I do not see a problem.

    Monday, July 30, 2007 2:22 PM
  • Hello Guys,

     

    I would be thankful if you could respond to my following queries/doubts:

     

    1. What should be the ideal data type for a primary key column?

       Should we go for an auto-increment int or a uniqueidentifier (GUID) data type for our OLTP database, considering performance, clustering support, database mirroring and online indexing?

     

    2. Is an OLTP database just like Northwind or any other DB that we create from SQL Server Management Studio / Enterprise Manager?

     

    3. What's the difference between a data warehouse and an OLAP DB? Are they the same?

        Is it true that when SQL Analysis Services is applied on a data warehouse, it is called an OLAP DB?

       

    4. What's the procedure to create a data warehouse and an OLAP DB?

     

    Thanks

    S.Gautam  

    Wednesday, August 1, 2007 11:42 AM
  •  Sameer Gautam wrote:

    Hello Guys,

     

    I would be thankful if you could respond to my following queries/doubts:

     

    1. What should be the ideal data type for a primary key column?

       Should we go for an auto-increment int or a uniqueidentifier (GUID) data type for our OLTP database, considering performance, clustering support, database mirroring and online indexing?

     

    2. Is an OLTP database just like Northwind or any other DB that we create from SQL Server Management Studio / Enterprise Manager?

     

    3. What's the difference between a data warehouse and an OLAP DB? Are they the same?

        Is it true that when SQL Analysis Services is applied on a data warehouse, it is called an OLAP DB?

       

    4. What's the procedure to create a data warehouse and an OLAP DB?

     

    Thanks

    S.Gautam  

     

    Question1:

    I think a GUID would be great if possible (especially if the user will not view this ID).

    If you need to make the ID human readable yet unique, then an int with auto-increment is the way to go, but for mirroring purposes you will need to make it a composite key of 2 things:

    1. Server ID

    2. Autoincrement ID
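    For example, such a composite key could be declared roughly as follows (a sketch only; the table and column names are assumptions):

    -- Hypothetical transaction table keyed by (server, auto-increment) so rows
    -- generated on different mirrored/partnered servers can never collide.
    CREATE TABLE dbo.CustomerTransaction
    (
        ServerId        SMALLINT NOT NULL,                 -- assigned per server/instance
        TransactionId   BIGINT   IDENTITY(1,1) NOT NULL,   -- local auto-increment
        CustomerId      INT      NOT NULL,
        TransactionDate DATETIME NOT NULL,
        Amount          MONEY    NOT NULL,
        [Type]          CHAR(6)  NOT NULL,                 -- 'Debit' / 'Credit'
        CONSTRAINT PK_CustomerTransaction PRIMARY KEY CLUSTERED (ServerId, TransactionId)
    );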

     

    Question 2:

    Yes that is true.

     

    Question 3 and 4:

    Here is a great link that explains the difference in schema between normal relational and star schema needed for OLAP. This will explain the very basic principles

    http://www.ciobriefings.com/whitepapers/StarSchema.asp

     

    Also listen to this web cast by Ron Jacobs:

    http://channel9.msdn.com/ShowPost.aspx?PostID=164350

     

    I hope this was helpful, please tell me if you need more info

    Thursday, August 2, 2007 5:24 PM
  • If you are looking for info on GUIDs versus integers for primary keys, check out the following article; it applies to SQL Server 2000 but provides some good info about the topic.

     

    http://www.informit.com/articles/article.asp?p=25862&rl=1

     

    HTH

     

    Ollie Riches

    Thursday, August 2, 2007 6:59 PM
  • Hi All,

     

    As per our discussion so far, you may have noticed that most of our tasks are analysis intensive.

    ETL processes would import customer data as well as their transactional data from various sources like the banking solution, credit card and insurance applications. So we need to create a data warehouse for that.

     

    I am a bit confused about a data warehouse versus an OLTP database in SQL Server. Correct me if I am wrong: if we design a database in denormalized form for analysis purposes, is that called a data warehouse? I ask because in SQL 2005 I have not found any interface where a data warehouse can be created.

     

    I think it is created just like Northwind in SQL Server Management Studio's database engine, and if its tables are designed in denormalized form to cater to analytics, that is called a data warehouse.

     

    I would appreciate it if you could let me know how a data warehouse is created in SQL Server 2005.

     

    Thanks

    S.Gautam

    Monday, August 20, 2007 4:43 AM
  • Sameer,

    What you wrote is exactly how I have seen it done by the BI guys. A data warehouse is just like any other database, but denormalized and set out in a star schema. So creating one in SQL 2005 is the same as creating any other database in SQL 2005; you just need to denormalize your tables and design them in a star schema.
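    To make the star-schema idea concrete, a minimal warehouse for this scenario might look roughly like the following (the table and column names are illustrative assumptions, not a prescribed design):

    -- Hypothetical star schema: one fact table referencing denormalized dimension tables.
    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
        CustomerId    INT           NOT NULL,                  -- business key from the OLTP system
        CustomerName  NVARCHAR(100) NOT NULL,
        RiskCategory  VARCHAR(20)   NULL,
        City          NVARCHAR(50)  NULL,
        Country       NVARCHAR(50)  NULL
    );

    CREATE TABLE dbo.DimDate
    (
        DateKey   INT      NOT NULL PRIMARY KEY,   -- e.g. 20070725
        FullDate  DATETIME NOT NULL,
        [Month]   TINYINT  NOT NULL,
        [Year]    SMALLINT NOT NULL
    );

    CREATE TABLE dbo.FactTransaction
    (
        CustomerKey     INT     NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
        DateKey         INT     NOT NULL REFERENCES dbo.DimDate (DateKey),
        Amount          MONEY   NOT NULL,
        TransactionType CHAR(6) NOT NULL   -- 'Debit' / 'Credit'
    );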

    Monday, August 20, 2007 6:46 AM
  •  

    Thanks for your reply.

     

    One thing I would also like to clarify is about the Analysis Services project.

    In a SQL 2005 OLAP project's data sources, we build a connection to the data warehouse DB to do some analysis work.

    We create different cubes there according to the analysis needs.

     

    Does a cube store data physically, or does it only show data in a dimensional view after retrieving it from the data warehouse?

    If it fetches data from the data warehouse, then whenever the data warehouse is updated or new data is loaded, would the cube also be refreshed?

     

    Can we build our application to use only the data warehouse and not an OLTP, since most of the work is analytics/rule based? We need an ASP.NET presentation layer to provide the alerts raised by the rule engine to the users who investigate them.

     

    Thanks

    S.Gautam

    Monday, August 20, 2007 7:25 AM
  •  

     

     Sameer Gautam wrote:

    Does a cube store data physically, or does it only show data in a dimensional view after retrieving it from the data warehouse?

    If it fetches data from the data warehouse, then whenever the data warehouse is updated or new data is loaded, would the cube also be refreshed?

    Yes they do for performance reasons, and so you can perform powerful analysis without crippling your database.

     

     Sameer Gautam wrote:

    Can we build our application to use only the data warehouse and not an OLTP, since most of the work is analytics/rule based? We need an ASP.NET presentation layer to provide the alerts raised by the rule engine to the users who investigate them.

     

    Absolutely not; OLTP is optimized for writing data, while OLAP is optimized for reading and analysis.

    I cannot think of any case where you can skip writing to the OLTP first; that is a major recipe for bad design.

    Monday, August 20, 2007 10:28 AM
  • I do agree that fast data writing is possible only through OLTP. But in our application all the required transactions would be imported by an ETL package from different sources, and the application would perform analytics on them; here the transactions would not be updated or modified by the application, they are only analyzed.

     

    We would provide insert/update functionality in our application only for the customer master, where it is required. Otherwise that too would be imported from the other financial applications.

     

    So to manage application login/access permissions, as well as alert presentation to the user, we can create a small OLTP DB, but all customer data and transactions would be kept in the data warehouse. Please suggest the right approach to achieve this efficiently.

     

     

    Regards,

    S.Gautam

     

     

    Monday, August 20, 2007 11:15 AM
  • I now understand. I see nothing wrong with having one table in the database containing the cubes for very light data entry/updates. A good idea would be to put the customers table in a separate schema if you are using SQL 2005.

    So it would be OLTP.Customers.

     

    Please tell me if you need any more answers.

    Monday, August 20, 2007 10:51 PM
  • Thanks for your reply.

     

    I could not understand what you mean by a separate schema; could you please elaborate on it a bit more?

     

    We are using SQL Server 2005 for our application. Whenever customer data is updated, do we need to process the cube again to get the updated values, or are cubes just like database views that show live data every time?

     

    Thanks

    S.Gautam

    Tuesday, August 21, 2007 3:58 AM
  • By a separate schema I mean taking advantage of the new feature in SQL 2005 where you can separate your tables into schemas, which can make management easier.

    This way, instead of dbo.Customers it would be OLTP.Customers. A better example is having 2 tables called "Customers" in the same database, but one owned by the Sales department (Sales.Customers) and the other owned by the Marketing department (Marketing.Customers). This way you could manage the 2 tables separately.

     

    You can find more information on schemas here.

    By the way, schemas are not imperative for the solution; I just suggested using them to separate the OLTP tables from the OLAP tables so they can be managed separately.
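    In SQL Server 2005 that separation can be expressed directly in T-SQL, for example (the schema, table and role names here are just illustrative):

    -- Create separate schemas so operational and analytical tables can be
    -- secured and managed independently within the same database.
    CREATE SCHEMA OLTP AUTHORIZATION dbo;
    GO
    CREATE SCHEMA OLAP AUTHORIZATION dbo;
    GO

    CREATE TABLE OLTP.Customers
    (
        CustomerId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerName NVARCHAR(100) NOT NULL
    );
    GO

    -- Give an assumed reporting/analysis role read access to the OLAP schema only.
    CREATE ROLE ReportingRole;
    GRANT SELECT ON SCHEMA::OLAP TO ReportingRole;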

     

    I hope this was helpful.

     

    Tuesday, August 21, 2007 1:40 PM
  •  

    Should we have two tables for the customer: one of an OLTP nature to present data in the application, and another one in denormalized form for analytics?

     

    So if any customer profile is changed, the latest value would be in the OLTP.Customer table, and a new customer record with a datetime stamp would be inserted into the OLAP.Customer table.

     

     

     

    Wednesday, August 22, 2007 4:00 AM
  • I definitely think this is the way to go, and synchronize as needed e.g. nightly. This will make your life a lot easier, because you can easily insert/update data in the OLTP.Customers table, and have your OLAP.Customers dedicated to analysis. 

     

     

    Wednesday, August 22, 2007 5:59 PM
  • Thanks for your continued replies.

     

    So we will have an OLTP.Customer table for the customer master and OLTP.Addresses for handling multiple addresses.

    For analytics we will have an OLAP.Customer table in denormalized form, as well as OLAP.Address.

     

    Suppose a customer profile is updated; it would be changed in the OLTP.Customer table instantly to reflect the change in the front-end application. What would happen when synchronization takes place in the OLAP.Customer table: would a new record be inserted with a datetime key and the updated values, or would the existing record be modified? Our application is analysis centric, so we cannot afford to lose any change history.

     

    If new records are inserted in the OLAP.Customer table, how can we identify the main (current) record on which we are to do the analytics? For analytics we would require the up-to-date record.

    There is another approach I have read about for DWs: suppose the attribute Designation in OLAP.Customer is changed to 'Sr Developer'; then it would be inserted in another fact table, Designation, with a datetime key. As per this methodology, for each changing attribute there would be a fact table.

     

     

    Thanks & Regards,

    S.Gautam

     

     

    Thursday, August 23, 2007 4:15 AM
  • I just want to clarify some things: when you denormalize, you would not normally still have an Address table and a Customers table; rather, you would have a single table containing both.

     

    Moving the data between the OLTP and the OLAP databases is most probably done with a scheduled job using SSIS; this can be scheduled as needed (daily, weekly, monthly).

     

    If this doesn't answer your question, please clarify more. 

     

    Thursday, August 23, 2007 4:08 PM
  • Thanks very much for your reply.

     

    In our application an address is defined in 6 fields (Street, City, State, Country, Pincode, ContactNo), so if we are to maintain 3 types of addresses, like Office, Communication and Residence, then we have to add 18 fields to our customer master dimension table in our data warehouse.

     

    Suppose that after some time the customer's office address is changed; what should be done in the customer master table at the time of synchronization? Should the ETL process insert a new record into the dimension table with the updated address in the office address fields, or should it update the existing record's fields without adding a new one?

     

    If for changing dimensions we are adding a new record, then we have to maintain a flag field where the status of the record would be kept, like 'current record' or 'old record'; this would help us to identify the main record. Every time a new customer record is added, the old record's status is changed to 'old record'.

     

    Should we add a new record for changing fields like address, designation, employer, annual income, etc., or update the existing record only? In our analysis all these fields play a major role, because after each profile update a risk category is assigned to the customer. So we cannot afford to overwrite any field information, because at any time the old information can be requested by the regulator.

     

    If a new record is inserted for every field update, then the customer master data warehouse table would grow 5 times larger if each customer record is updated five times during the year.

     

    Please suggest an efficient approach to handle changing dimensions.

     

    Regards,

    S.Gautam 

      

    Friday, August 24, 2007 4:23 AM
  • Sameer,

    I really want to help you solve this problem, but unfortunately this is beyond my personal experience, and I believe that this question would be better asked in the SQL data warehousing forum here.

    I think this is the best place to ask about de-normalization and keeping history data.

     

    I hope this helps, and please do tell us how they helped you with this specific problem; I am personally very interested and eager to know.

    Saturday, August 25, 2007 7:28 PM
  •  Dear Sameer,

    After consulting with my friends in BI, we think the best solution for your problem is what they call a Slowly Changing Dimension.

     

    There are 3 types of SCDs:

    1. Type 1 SCDs overwrite historical information by updating the changed column, so you basically lose your history.

     

    2. Type 2 SCDs are the most common form, and they are used to track historical attribute values.

    An example given to me was as follows:

    Address information that is used to aggregate sales measures by geography is changed by a customer.

     

    The solution to this would be:

    Join the dimension table to the fact table by using a surrogate key, and add "Start Date" and "End Date" information to the rows.

     

    3. Type 3 SCDs store only the current version and the original version, so nothing is stored in between. This will save you a lot of disk space, but you lose the information in between, which I think does not fit your requirement.

    How it is done: There are new attributes for the original value and the effective date in each row.

     

     

    A fourth type is called Rapidly Changing Dimensions (but you would only need that if the changes are more than once a day)

     

    How this is done:

     Create a table that stores the possible values for the changing attribute and that has a unique-key column. You then use this key value to relate the RCD values table to both the original dimension table and the fact table. This works particularly well if there is a finite list of possible values for the changing attribute

     

     

    In the end I think you should use either Type 2 or the fourth type (rapidly changing dimensions), depending on your requirements.
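    A minimal sketch of how a Type 2 customer dimension (extending the kind of DimCustomer table sketched earlier) and its nightly synchronization might look in T-SQL is shown below. The column names, the staging table and the set of tracked attributes are assumptions for illustration, and NULL handling is omitted for brevity:

    -- Hypothetical Type 2 dimension: each change inserts a new row; StartDate/EndDate
    -- plus IsCurrent track history, and CustomerKey is the surrogate key used by facts.
    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        CustomerId    INT           NOT NULL,   -- business key from the OLTP system
        OfficeAddress NVARCHAR(200) NULL,
        RiskCategory  VARCHAR(20)   NULL,
        StartDate     DATETIME      NOT NULL,
        EndDate       DATETIME      NULL,       -- NULL = still the current row
        IsCurrent     BIT           NOT NULL DEFAULT 1
    );

    -- Nightly sync (SQL 2005 style, no MERGE): expire the current row when a tracked
    -- attribute has changed, then insert the new version from an assumed staging table.
    UPDATE d
    SET    d.EndDate = GETDATE(), d.IsCurrent = 0
    FROM   dbo.DimCustomer AS d
    JOIN   staging.Customer AS s ON s.CustomerId = d.CustomerId
    WHERE  d.IsCurrent = 1
      AND (d.OfficeAddress <> s.OfficeAddress OR d.RiskCategory <> s.RiskCategory);

    INSERT INTO dbo.DimCustomer (CustomerId, OfficeAddress, RiskCategory, StartDate, EndDate, IsCurrent)
    SELECT s.CustomerId, s.OfficeAddress, s.RiskCategory, GETDATE(), NULL, 1
    FROM   staging.Customer AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                       WHERE d.CustomerId = s.CustomerId AND d.IsCurrent = 1);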

     

     

    Please keep me updated, and I definitely hope this answers your question.

     

    Sunday, August 26, 2007 4:41 PM
  • Hello Dear,

     

    Thanks very much for your detailed reply. I did some reading on SCDs last week and realized Type 2 is best suited to our requirements.

     

    I will escalate this thread to the suggested data warehousing forum for further inputs and clarifications.

     

    You are the only one who is still showing interest in this thread; once again, thanks very much for your great efforts and valuable inputs.

     

    Regards,

    S.Gautam

     

     

    Wednesday, August 29, 2007 4:11 AM
  • Hi,

     

    As per your suggestion, I escalated this thread to the Data Warehouse forum.

     

    Following is the link for the same

     

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2069049&SiteID=1&mode=1

     

    You are most welcome to put your comments and suggestions there.

     

    Thanks & Regards,

    S.Gautam

     

    Wednesday, August 29, 2007 10:31 AM
  •  

    Hi All,

     

    We are in the process of designing the customer master dimension table for our data warehouse. According to the customer's occupational status, different attributes are captured in our application, for example:

     

    Attributes if Occupational Status is Salaried:
    Type of Employer Organization
    Nature of Duties
    Designation
    Department
    Employer Business Nature
    Employer Industry
    Employer Address
    Employer Size of Organization
    Employment Type
    No of Yrs in Current Employment
    Name of Previous Employer
    Previous Employment Type
    No of Yrs in Previous Employment

    Attributes if Occupational Status is Businessmen:
    Business Firm Type
    Nature of Business
    Industry of Business
    Date of Incorporation
    No. of Employees
    No of Yrs in Current Business
    Previous Business Nature
    Previous Business Industry
    No of Yrs in Previous Business

    Attributes if Occupational Status is Professional:
    Industry of Profession
    Nature of Profession
    No of Yrs in Current Profession
    Previous Profession Nature
    Previous Profession Industry
    No of Yrs in Previous Profession

     

    Should I create columns for each attribute in my customer dimension table, or should they be kept in three separate dimension tables, one for each occupational status?

     

    Please suggest me the ideal approach to handle this efficiently.

     

    Thanks

    S.Gautam

     

    Thursday, August 30, 2007 11:23 AM
  • I'm doing work on data mining for credit card fraud detection, in which I will lay out the fraudulent transactions. For that I have to use an algorithm that I can implement on the data; can anyone help me? Thanks.
    Tuesday, June 9, 2009 5:18 PM