Can BizTalk or C# check whether an incoming file has been processed or not?

  • Question

  • I have a case in which I need to read a transaction file from the bank, but I want to avoid duplicates if the bank sends the same file twice. Currently I use the file name as a filter to check whether that specific file name has been processed today.

    Is there any better way to handle this? If the bank sends half of the data first and then sends the rest of the data under the same file name, my BizTalk application will run into trouble.

    Thank you.
    Wednesday, September 30, 2009 4:15 AM

Answers

  • Hi Xiao,

    If you are using a BizTalk solution then you should definitely follow the architecture and use the SQL Adapter for inserting records. One advantage is that you can insert multiple records at once using updategrams. Using the SQL Adapter is also good practice and the recommended approach for BizTalk solutions. Is there anything specific you are wondering about?
    Abdul Rafay http://abdulrafaysbiztalk.wordpress.com/ Please mark this as answer if it helps
    Tuesday, October 6, 2009 10:02 AM
  • My recommendation is: every time you receive a file from the bank, check your data store (is it a database?) to see whether it is a new file, part of an existing one, or a duplicate. This is the common approach to avoiding duplication.

    Thanks.


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Tuesday, October 6, 2009 12:51 PM
    Moderator

All replies

  • We had a similar scenario in our project, so we built a duplicate-file-name-checker component.

    We store the file name in a database, along with a timestamp, when the file arrives for processing, and we call a stored procedure that checks for a duplicate file at the receive pipeline level, before processing.
    I hope this way you can solve your issue.
    Wednesday, September 30, 2009 4:42 AM
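A minimal sketch of the check the reply above describes. This in-memory set stands in for the database table and stored procedure (which the post does not name), so all identifiers here are illustrative assumptions, not the actual component:

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the duplicate-file-name check described above.
// The real component would call a stored procedure against a database
// table with (FileName, Timestamp) instead of using this HashSet.
public class FileNameRegistry
{
    private readonly HashSet<string> _seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Returns true the first time a file name is recorded for a given day,
    // false if that name was already processed on that day.
    public bool TryRegister(string fileName, DateTime receivedDate)
    {
        string key = receivedDate.ToString("yyyy-MM-dd") + "|" + fileName;
        return _seen.Add(key);
    }
}
```

In the actual solution the call would be made from a custom receive pipeline component, so duplicate files are rejected before the message reaches the message box.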
  • Thank you, Singh,

    That could be one option.

    Would I be able to use an MD5-type value to identify the file contents without reading the file myself?

    Wednesday, September 30, 2009 4:56 AM
  • Hi Xiao,

    Can you explain what you mean by using an MD5-type value to identify the file contents without reading the file?
    Do you also have to check the file content?
    Wednesday, September 30, 2009 6:19 AM
  • You can go ahead with your current implementation and, along with it, have one field in the schema that says whether the current message is new or an update. When the bank sends the file the first time, the field is set to, say, "New", and when the bank sends the same file a second time, it is set to, say, "Update". Then use this field along with the file name to filter out the messages.
    Ajeet Kumar
    Wednesday, September 30, 2009 6:51 AM
  • Duplicate elimination is achievable in almost all cases, but the details do depend on the exact scenario.
    You say that you need to 'de-dup' at the data level, which is certainly the right direction, but you don't state whether there's something in the data you could use to identify each record.

    If you wish to de-dup at the data level you will have to read the data, at least the unique identifier of every record.
    Similarly, you will need either to de-batch the data, so that you can ignore duplicate messages, or to filter the message, producing a message that contains only non-duplicates. Ideally you would do this in the receive pipeline (it's pointless to persist the entire message to the message box if you know half of the data is duplicated and therefore not needed).

    I think you are correct to be concerned about using the file name; this is only a very partial solution and can easily result in data loss.


    Yossi Dahan http://www.sabratech.co.uk/blogs/yossidahan [To help others please mark replies as answers if you found them helpful]
    Wednesday, September 30, 2009 7:28 AM
    Moderator
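The record-level filtering described above can be sketched as follows. The post does not show code, so this is only an illustration of the idea: keep the first occurrence of each unique record identifier and drop later duplicates (in a real solution this logic would live in a receive pipeline component):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordDeduplicator
{
    // Keeps the first occurrence of each record key and drops later
    // duplicates, preserving order. keySelector extracts the unique
    // identifier (e.g. a transaction reference) from each record.
    public static List<T> Filter<T, TKey>(
        IEnumerable<T> records, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        return records.Where(r => seen.Add(keySelector(r))).ToList();
    }
}
```

Filtering this way before persistence avoids writing the duplicated half of a resent file to the message box.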
  • I have a case in which I need to read a transaction file from the bank, but I want to avoid duplicates if the bank sends the same file twice. Currently I use the file name as a filter to check whether that specific file name has been processed today.

    Is there any better way to handle this? If the bank sends half of the data first and then sends the rest of the data under the same file name, my BizTalk application will run into trouble.

    Thank you.

    We had the same scenario: we had to post transactions from a flat file to the Tuxedo middleware as TCP messages and handle duplicate-transaction checks. It all depends on your scenario; in our case the flat file was created by another system, and there was duplication at the data level. If the file owners give you the assurance that they will not have data duplication but only batch duplication, then you only need to handle duplicate messages. You can identify duplicate messages by a batch id, if your transaction file has one, or by the transaction file name, etc.

    If you have to check duplication at the data level, then I would recommend inserting the batch into a SQL Server table using the SQL Adapter (use bulk inserts via updategrams). First of all, you should uniquely identify each record with a primary key, or the records may be identified via a composite key. Then you can use a SELECT query with the DISTINCT keyword so that no data duplication occurs within the batch either. Also, if the table is designed correctly, i.e. the columns that identify your transaction as unique are marked with a unique constraint, this will ensure that duplicate transactions are not inserted.

    Again, this will depend on your requirement: whether the same record must not be inserted on the same day, or whether it can be inserted on another day without being treated as a duplicate. We only kept track of transactions for one business day; the next day the table was truncated and the same process repeated.
    Abdul Rafay http://abdulrafaysbiztalk.wordpress.com/ Please mark this answer if it helps
    Wednesday, September 30, 2009 9:12 AM
  • Thank you, Singh.

    I think Singh's approach will help.


    Thanks & Regards Uttam
    Thursday, October 1, 2009 5:02 AM
  • As all the files' contents are different, using the MD5 checksum value as an identifier to distinguish files with different contents would be helpful in this case.

    In the end I decided to add another id to the file name, so even if the same day's transaction file is split into two parts, I can still use the file name to figure out which is which.

    But I would say an MD5 value is a good option too.
    Monday, October 5, 2009 9:24 PM
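For reference, the MD5 option discussed above can be sketched like this (note that, contrary to the earlier question, computing the hash still requires reading the file's bytes; it identifies content rather than avoiding the read):

```csharp
using System;
using System.Security.Cryptography;

public static class FileChecksum
{
    // Computes the MD5 hash of a byte buffer as a lowercase hex string.
    // Two files with identical contents produce the same value, so the
    // hash can serve as a content identifier regardless of file name.
    public static string Md5Hex(byte[] contents)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(contents);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Storing this value alongside the file name would let the pipeline detect a resent file even when the bank reuses the name for different data.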
  • Hi Abdul,

    I am wondering about the performance cost of inserting the contents of the file into a SQL Server database using the SQL Adapter?
    Tuesday, October 6, 2009 3:02 AM