Should data error validation be a part of the architecture of a software system?

  • Question

  • Hi guys,

     

    Software architecture is usually concerned with ensuring the architectural quality attributes.

     

    Scenario:

    A hypothetical system accepts data in various formats such as text, CSV, Excel, etc. The system's responsibility is to parse and store the data, run it through a calc engine, and display reports.

     

    The system runs on the assumption that the input data is correct.

     

    There can be any number of problems in the data. Is it the responsibility of the application architecture to provide an infrastructure for validating the data, or is that an implementation issue?

     

    Please suggest any standard architecture or design that should be followed to handle this issue.

     

    hoping to get a reply soon.

     

    thanks & regards

     

    Tuesday, May 8, 2007 7:16 AM

Answers

  • I think that it's more than just pure "application" architecture.
    Currently you directly map your business requirements to the technical implementation, but one thing is missing: conceptual design.

    You should abstract away from the implementation for now and consider how your system manipulates incoming data, how it treats and uses that data, and how it interacts with other systems.
    You should work out the semantics of your data to bring consistency and the flexibility to add new formats in the future.

    To sum up what I've described: the best approach in your case will be to create a canonical data format, which the system will use for its internal tasks, and to convert all incoming formats to this system-canonical one.
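
    A minimal sketch of what enforcing such a canonical format could look like in .NET, assuming the canonical format is XML described by an XSD; the schema file name and target namespace below are placeholders, and every violation is collected so errors can be published upfront:

        using System.Collections.Generic;
        using System.Xml;
        using System.Xml.Schema;

        class CanonicalFormatValidator
        {
            // Validate a document that is already in the canonical XML format
            // against the canonical XSD and return every violation found.
            public IList<string> Validate(string canonicalXmlPath)
            {
                List<string> errors = new List<string>();

                XmlReaderSettings settings = new XmlReaderSettings();
                settings.ValidationType = ValidationType.Schema;
                settings.Schemas.Add("urn:example:canonical-feed", "CanonicalFeed.xsd");
                settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
                {
                    // Collect every violation instead of stopping at the first,
                    // so all data errors can be reported together.
                    errors.Add(e.Message);
                };

                using (XmlReader reader = XmlReader.Create(canonicalXmlPath, settings))
                {
                    while (reader.Read()) { /* reading the stream drives validation */ }
                }

                return errors;
            }
        }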

    Tuesday, May 8, 2007 8:42 AM

All replies

  • I think that it's more than just pure "application" architecture.
    Currently you directly map your business requirements to the technical implementation, but one thing is missing: conceptual design.

    You should abstract away from the implementation for now and consider how your system manipulates incoming data, how it treats and uses that data, and how it interacts with other systems.
    You should work out the semantics of your data to bring consistency and the flexibility to add new formats in the future.

    To sum up what I've described: the best approach in your case will be to create a canonical data format, which the system will use for its internal tasks, and to convert all incoming formats to this system-canonical one.

    Tuesday, May 8, 2007 8:42 AM

  • Is it the responsibility of the application architecture to provide an infrastructure for validating the data, or is that an implementation issue?

    You can't assume that the input data is correct when you have no control over the input. If the stakeholders of your envisioned architecture agree with this assumption, then the answer would be no: it's neither an infrastructure nor an implementation issue.

    On the other hand, if your system has a requirement that states 95% straight-through processing, then you have an architecturally significant requirement. This influences the envisioned architecture. It could mean that in the logical architecture you would define a validation component for the system. In the physical architecture this could be implemented with a technical component (for instance the Enterprise Library Validation Block).
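
    As a rough, hand-rolled sketch of what such a logical validation component might look like (the Enterprise Library Validation Block mentioned above could fill this role physically, but its API is not shown here; all names below are invented for illustration):

        using System.Collections.Generic;

        // Accumulates the findings for one record so invalid data can be
        // reported rather than silently assumed correct.
        class ValidationResult
        {
            public readonly List<string> Errors = new List<string>();
            public bool IsValid { get { return Errors.Count == 0; } }
        }

        // Each rule inspects one parsed record (field name -> value).
        interface IRecordValidator
        {
            void Validate(IDictionary<string, string> record, ValidationResult result);
        }

        class MandatoryFieldValidator : IRecordValidator
        {
            private readonly string fieldName;
            public MandatoryFieldValidator(string fieldName) { this.fieldName = fieldName; }

            public void Validate(IDictionary<string, string> record, ValidationResult result)
            {
                string value;
                if (!record.TryGetValue(fieldName, out value) || value.Trim().Length == 0)
                    result.Errors.Add("Missing mandatory field: " + fieldName);
            }
        }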

    Tuesday, May 8, 2007 11:05 AM
  •  

    Thanks for your reply.

     

    Please correct me if I am wrong:

     

    Say I define an XML format and an XSD for the input data, and my system will understand that format.

     

    But I cannot force the user to provide data that conforms to that XML format.

    Should the architecture be limited to defining the canonical format, i.e. XML, or must the architecture also provide the mechanism for transforming the flat-file data (which could be in any form) to XML?

    Tuesday, May 8, 2007 1:08 PM
  •  

    Paul,

     

    This is a very data-centric application, so I guess "correctness of data" in a feed can be considered an architecturally significant requirement.

    Tuesday, May 8, 2007 1:11 PM
  •  nsgashis wrote:

    Thanks for your reply.

    Please correct me if I am wrong:

    Say I define an XML format and an XSD for the input data, and my system will understand that format.

    But I cannot force the user to provide data that conforms to that XML format.

    Should the architecture be limited to defining the canonical format, i.e. XML, or must the architecture also provide the mechanism for transforming the flat-file data (which could be in any form) to XML?



    Yep, you need to provide "service containers", also known as "adapters", for your input data, which convert incoming data to your system's format. This way you abstract your input files from your system.
    In general, these adapters/service containers are just services (or a set of services, one per incoming type) built with XPath + XSLT, which know how to (a sketch follows below):
    1) detect the desired format
    2) select the specific XSLT
    3) transform the incoming file with the selected XSLT to your system format
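
    A rough sketch of such an adapter, assuming the incoming files are already XML in one of several source schemas (flat files would first be parsed into a simple intermediate XML); the stylesheet names and the detection rule are illustrative only:

        using System.Collections.Generic;
        using System.Xml;
        using System.Xml.Xsl;

        class IncomingFileAdapter
        {
            // Map each detected source schema (keyed by root element name) to
            // the XSLT that converts it to the canonical system format.
            private static readonly Dictionary<string, string> Stylesheets = new Dictionary<string, string>();

            static IncomingFileAdapter()
            {
                Stylesheets.Add("VendorAFeed", "VendorAToCanonical.xslt");
                Stylesheets.Add("VendorBFeed", "VendorBToCanonical.xslt");
            }

            public void Convert(string inputPath, string outputPath)
            {
                // 1) detect the format by inspecting the document's root element
                XmlDocument probe = new XmlDocument();
                probe.Load(inputPath);
                string rootName = probe.DocumentElement.LocalName;

                // 2) select the specific XSLT for that format
                XslCompiledTransform transform = new XslCompiledTransform();
                transform.Load(Stylesheets[rootName]);

                // 3) transform the incoming file to the canonical system format
                transform.Transform(inputPath, outputPath);
            }
        }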
    Tuesday, May 8, 2007 1:24 PM
  • Can you tell us a bit more about the technology choices you have made for this system? For instance, is the system exposed as a web service, which requires the input to be expressed in messages? Is the input structured data?
    Tuesday, May 8, 2007 3:32 PM
  • Paul,

     

    The technology is pretty much pre-defined in this project, i.e. Microsoft .NET.

    (I don't have much idea about the basis for technology selection for a project... it's a separate issue; you can throw some light on that in case you have done that sort of thing.)

     

    Inputs are text files with various separators (the data feed can also be another database, files in other formats, etc.). The requirement doesn't talk about any other external system or service expecting the data; basically the single application consumes the data and provides the facility to present the data in various meaningful ways.

     

     

    Wednesday, May 9, 2007 6:31 AM
  • I've architected two systems that take flat files in various formats as input and transformed them into XML for further processing using (as Michael Nemtsev said earlier) XSLT. For one solution that was required to handle extreme volumes of input in the form of pension prolongations, we wrote a custom parser, validation, message enricher and transformation pipeline tied together through a workflow engine. The technology choices were .NET/C# and Windows Workflow Foundation, which turned out great.
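
    A minimal, hypothetical illustration of that parse/validate/enrich/transform pipeline idea in plain C# (the actual solution used Windows Workflow Foundation; the stage and type names below are invented for the sketch):

        using System.Collections.Generic;

        // Carries one batch of feed data plus the errors found along the way,
        // so problems can be published upfront.
        class FeedBatch
        {
            public readonly List<string> RawLines = new List<string>();
            public readonly List<string[]> Records = new List<string[]>();
            public readonly List<string> Errors = new List<string>();
        }

        interface IPipelineStage
        {
            void Process(FeedBatch batch);
        }

        class ParseStage : IPipelineStage
        {
            private readonly char separator;
            private readonly int expectedFields;

            public ParseStage(char separator, int expectedFields)
            {
                this.separator = separator;
                this.expectedFields = expectedFields;
            }

            // Split each raw line into fields; lines with the wrong field count
            // are recorded as errors instead of flowing further down the pipeline.
            public void Process(FeedBatch batch)
            {
                foreach (string line in batch.RawLines)
                {
                    string[] fields = line.Split(separator);
                    if (fields.Length == expectedFields)
                        batch.Records.Add(fields);
                    else
                        batch.Errors.Add("Bad record (" + fields.Length + " fields): " + line);
                }
            }
        }

        class Pipeline
        {
            private readonly IPipelineStage[] stages;
            public Pipeline(params IPipelineStage[] stages) { this.stages = stages; }

            // Run each stage in order; validation, enrichment and transformation
            // stages would follow the same contract.
            public void Run(FeedBatch batch)
            {
                foreach (IPipelineStage stage in stages)
                    stage.Process(batch);
            }
        }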

    For the other system we had very different requirements. No high volumes and more emphasis on manageability and process transparency. For this scenario we used BizTalk 2004 to parse, validate, transform and route flat file input. Microsoft has published numerous articles explaining how to build this kind of solution with BizTalk 2004. I'd advise you to look at BizTalk 2006 since it features enhanced support for handling flat files (see this article explaining the flat file wizard).

    I'm sorry I can't provide you with a canned solution to your problem without diving deep into the requirements. My goal with this information is to give you some direction on what to look for.
    Wednesday, May 9, 2007 9:10 PM
  • Yep, as Paul notes, BizTalk encapsulates the first scenario (with XSLT), providing a number of adapters/service containers (the principles were described above) for different formats, which can be configured easily using a WYSIWYG editor.

    The only problem is that BizTalk is fat middleware with a large number of features, and if you only need small services to convert files, BizTalk may not be a good solution.

    Everything depends on the requirements.

    Wednesday, May 9, 2007 9:25 PM
  • Thank you guys for all the valuable information and the pointers.

     

    As for the volume of the data: around one lakh, i.e. 100 thousand, records come in, and each record has 50 fields. The frequency is once every month. It is important to validate the data and publish the errors upfront.

    Thursday, May 10, 2007 5:10 AM
  • What is the window for processing these messages? Preferably a breakdown to messages-per-second numbers.
    Thursday, May 10, 2007 5:34 AM
  •  

    I guess I am not getting your question.

    It's not an online system; processing can be done offline. All the data files can be received in a batch and processed later batch-wise.

    Thursday, May 10, 2007 9:13 AM
  • Perfect scenario for a BizTalk based solution.
    Thursday, May 10, 2007 10:28 AM
  • Alternatively, you can also take a look at SSIS (SQL Server Integration Services). It might suit your needs.
    Thursday, May 10, 2007 9:43 PM
  • Hadn't thought of this. For the earlier-mentioned projects, SSIS wasn't available at the time.

    SSIS makes it possible to load a SQL Server database from flat files and is optimized for bulk data loading. See these interesting customer case studies.

    There's an interesting aspect to the architecture when taking the SSIS direction. In your initial post you also mentioned the need for a calc engine and for displaying reports; also consider a SQL Server Reporting Services implementation for these two components. The book "The Microsoft Data Warehouse Toolkit" gives a nice overview of the mentioned SQL Server 2005 products. Don't fall into the trap of building a full-blown business intelligence platform without making absolutely sure that is what your business needs.

    I cannot stress enough that choosing the appropriate solution/direction depends entirely on your requirements and the governance within your company.
    Friday, May 11, 2007 7:36 AM
  • Cheers!

     

    I've read all the responses and I am pleased to see professionals here.

    Nonetheless, I've seen every attempt to break SOA rules, as well as many attempts to break the laws of ADL (Architecture Description Language) at the level of a plumber.

     

    First of all, nsgashis...
      You've stated "The system runs on the assumption that the input data is correct."

    That was a sacramental sentence that initially put you within the boundaries of a schema, meaning that it was an agreement between a source and a target for mutual understanding. [Sorry, as a true architect I mostly speak in pure abstraction.]

    There is no way around it other than a common set of metadata.

     

    It is YOUR responsibility as an architect to protect your application from failure, and therefore you have to define YOUR way for implementers to intercept erroneous data. Meaning that you have to reconsider your application as a larger transaction, and the other processes within it as nested/internal transactions, for which exception handling should be done in a uniform way regardless of whether a failure occurs at the macro or micro level.

     

    ....sorry gotta go now...

     

    These are the 4 canonical principles of SOA:

    - service boundaries are explicit

    - services are autonomous with specific functions of their own

    - presence and sharing of metadata and schemas/messages

    - compatibility policy (services use policies, such as publish-and-subscribe, plug-and-play, etc.)

     

    Adios, cheers, regards, etc...

     

     

     

    Saturday, May 12, 2007 6:34 AM
  • Thank you guys for all the information exchange.

     

    Importing data was one of the components of my application.

     

    I have investigated DTS, done DTS modeling for my requirements, and used the DTS object model.

     

    BizTalk Server also provides the same facility, and even more, but there were various constraints and parameters in choosing the solution for this project.

     

    I chose DTS.

     

    Thank you all once again.

     

     

    Friday, June 29, 2007 7:25 AM