EDI Document parsing in R2

  • Question

  • Dear BizTalk Gurus,


    Is there a significant difference in the way BizTalk 2006 R2 parses EDI documents (claims, in my case) compared to its predecessors? I come from the BizTalk 2002 world, so I don't know how things work in 2004 and 2006. In 2002, for example, if you drop an 837 file with multiple STs in it, you immediately see claims in the queues. I am almost certain that BizTalk 2002 parses and processes one ST at a time. For testing purposes, I configured BizTalk 2002 to write output documents to file. When I dropped the 837 with multiple STs, I saw output XML files in the output directory immediately, and the count went up as BizTalk processed the claims. I have a similar setup in R2 (pass-through transmit), and I dropped the same 837 into the pickup directory. I did see the claims under the "Running" instances in the Group Hub interface, but to my surprise I didn't see any documents in the output folder until BizTalk finished processing the claims. Is this how R2 handles EDI documents, or is there something I need to configure in order to get the same behavior as in 2002? You can imagine the situation with claim files containing 50K to 100K claims.


    I really liked the way 2002 handled EDI documents. We have three dedicated servers to process EDI claims: one handles the parsing and the other two handle processing. The server dedicated to parsing parses the claims and moves them to the shared queues immediately. The processing servers pick up these claims and process them, so parsing and processing happen in parallel. With R2 I am afraid this is not going to happen, and it may even increase the overall processing time. I may be wrong (I hope so). I would really appreciate it if you could shed some light on this. We are trying to migrate from 2002 to R2 and I am really struggling with R2.
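    The "one ST at a time" behavior described above can be illustrated outside BizTalk with a short Python sketch. This is not how either BizTalk version is implemented; it only shows the idea of handing each ST..SE transaction set to a consumer as soon as it has been read. The function name, the default separators, and the abbreviated sample envelope (which is not a valid ISA/GS header) are all assumptions made for illustration.

```python
from typing import Iterator, List


def iter_transaction_sets(
    x12: str,
    segment_terminator: str = "~",
    element_separator: str = "*",
) -> Iterator[List[str]]:
    """Yield one ST..SE transaction set at a time as a list of segments.

    Note: str.split() materialises every segment up front; a production
    reader would pull segments from the file incrementally instead.
    """
    current: List[str] = []
    inside = False
    for raw in x12.split(segment_terminator):
        segment = raw.strip()
        if not segment:
            continue
        tag = segment.split(element_separator, 1)[0]
        if tag == "ST":
            # A new transaction set starts here.
            current = [segment]
            inside = True
        elif inside:
            current.append(segment)
            if tag == "SE":
                # The set is complete: hand it over before reading further.
                yield current
                inside = False


# Abbreviated, hypothetical interchange (envelope segments are NOT valid X12).
SAMPLE = (
    "ISA*00*SENDER*RECEIVER~"
    "GS*HC*SENDER*RECEIVER~"
    "ST*837*0001~BHT*0019*00~SE*3*0001~"
    "ST*837*0002~BHT*0019*00~SE*3*0002~"
    "GE*2*1~IEA*1*000000001~"
)

if __name__ == "__main__":
    for transaction_set in iter_transaction_sets(SAMPLE):
        # A downstream consumer could start work here, per transaction set.
        print(transaction_set[0], "->", len(transaction_set), "segments")
```

    Because the function is a generator, the caller receives the first transaction set before the second one has been assembled, which is the property the 2002 setup relied on for parallel parsing and processing.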



    Wednesday, November 19, 2008 11:51 PM



All replies

  • The behavior you are describing sounds like "debatching" in BizTalk Server 2006 R2. Here is a feature comparison chart for the BizTalk product history going back through 2002 which is helpful here: R2 does support inbound debatching.


    It sounds like in your case the party may be configured to preserve interchange (see the EDI properties of your party) which prevents the single file from being debatched.



    Thursday, November 20, 2008 12:43 PM
  • Ben,


    First of all, thanks for your reply. There is a correction to my original post: I mentioned that I have a PassThruTransmit, but it is actually an XMLTransmit. I have a very simple configuration: a send port that subscribes to the receive port using the filter options specified. (BTS.ReceivePortName == Receive Port for WebD x12 837P And

    BTS.MessageType !=


    I checked the EDI Properties of the Party as you suggested. Under "Party as Interchange Sender\Ack Generation and Validation" section the Inbound batch processing option is set to "Split Interchange as Transaction Sets". I tried changing this setting to see what difference it would make. After changing the setting I dropped the 837P file to the pickup directory. The interchange was moved to the Suspended Queue immediately and reported the following error


    A message sent to adapter "FILE" on send port "Send Port for WebD x12 837P" with URI "C:\HIPAA\TEST\WEBD\OUT\Output%MessageID%.xml" is suspended.

    Error details: There was a failure executing the send pipeline: "Microsoft.BizTalk.DefaultPipelines.XMLTransmit, Microsoft.BizTalk.DefaultPipelines, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Source: "XML assembler" Send Port: "Send Port for WebD x12 837P" URI: "C:\HIPAA\TEST\WEBD\OUT\Output%MessageID%.xml" Reason: This Assembler cannot retrieve a document specification using this type: "".

    MessageId: {92D58714-9C32-4198-9570-A3AC8F34A14D}

    InstanceID: {352D84A4-CF59-45F6-8C0F-5E36261649E9}


    I dropped a small file with 2000 claims in 300 STs to R2. It's been an hour and so far I don't see anything under "Running" or "Suspended" instances on the Group Hub page. I tried to delete the claim file from the pickup directory and got a sharing violation, so BizTalk is definitely doing something with it. A while back I played around with the Global EDI properties. The only things I changed were ISA5-6 and ISA7-8. Somewhere I read that if party resolution fails, R2 will use the settings in the Global EDI properties. I changed the values back to BTS-SENDER and BTS-RECEIVER, which I believe were the defaults. Could that be the reason for the severe performance issue I am experiencing?


    Update: After 1 hour and 15 mins, I do see documents under "Running" in the Group Hub. 1 hr and 15 mins to parse 2000 claims is an awful lot of time. BizTalk 2002 could process around 20,000 claims in the same amount of time. The other strange thing is that I see all 2000 claims in XML format in the output folder even though there are still hundreds of active documents in the "Running" instances.


    I really appreciate your help on this.




    Thursday, November 20, 2008 5:19 PM
  • I think it is quite likely that the performance issue is caused by the global properties, as you suspected. When I saw your error message I thought it could be the global properties because the error is showing the default schema namespace for BizTalk EDI messages, which is what the global properties use by default. The global EDI properties get invoked if BizTalk cannot determine which party to route an EDI message to. I am not sure if the ISA/GS headers get processed before or after debatching, but if BizTalk had to try routing, fail, then reapply the global properties, it could explain why you are seeing the performance issue.


    Whenever you change the party properties you should stop the BizTalk application, restart the host instance of your application, and then restart the BizTalk application. The party properties are cached in memory in the host instance. Between tests I would do this to clear out the host instances. I am not sure why you still see so many Running instances after the file had been processed.

    The files are in Xml due to XmlTransmit. If you want them in EDI you need to use EDISend.



    Thursday, November 20, 2008 7:03 PM
  • The InterchangeXML schema is not defined in the EDI application by default. To solve this, you need to deploy a schema having message type '' and define an <Any> node under the root.


    Regarding the EDIDisassembler, please refer to the article on how it works, if you haven't checked it already. A lot of activity happens there compared to the parser you were using in the 2002 version, and that may be affecting the performance.

    Friday, November 21, 2008 6:44 AM
  • Ben,

    Thanks for your valuable suggestions about the proper way of changing the party config. I wasn't following that. I don't think the issue I am experiencing has anything to do with the Global EDI settings. I changed the Global EDI target namespace to an invalid namespace. If party resolution were using the Global settings, I should have seen error messages related to an invalid namespace. That didn't happen. I also verified that the party settings matched the X12 data, so party resolution was fine.



    Thanks for the link regarding the EDI Disassembler. After reading through it, I relaxed some settings and re-processed the file. The performance was still terrible, but I think I found the cause. Before I get into the details, let me explain my configuration. I created a vb 2005 application with just the 837P Multiple schema and deployed this solution. I configured this application to write out the claims in XML format (input XML) to a folder. So I created a receive port (EDIReceive), a receive location (File), and a send port (XMLTransmit) that subscribes to the receive port based on the filter criteria. I also have a send port to generate 997s.


    Now to the details. The claim file I processed contains 2000 claims in 330 STs. A few STs had 200 - 300 claims in them. Each ST represents a distinct Billing Provider, so one Billing Provider (2010AA) submitted 200 claims. I am showing the hierarchy just for readability purposes.


    2000A Billing Provider
     2000B Subscriber Loop
      2300 Claim
     2000B Subscriber Loop
      2300 Claim
     up to 200 times.

    Earlier I mentioned that I noticed all 2000 claim XMLs in the output folder even though there were active running instances under the "Running" tab in the Group Hub page. These running instances were taking a very long time to process. On close examination I noticed that some of the files were 0 bytes while there were active running instances. I went back and checked the file sizes when BizTalk completed the processing. No 0 byte files were found. The XMLs were 10 - 40 KB in size, but towards the end I saw 800 - 900 KB files. So I opened a random file and located the corresponding transaction set in the 837 based on the ST02 value from the XML. That was one of the STs with 200 claims. When I checked the XML document, I noticed 100+ 2000B elements. I wasn't surprised about the 100+ 2000B subscriber loops, since someone else also reported a similar issue in the forums. Please refer Even though there were 100+ 2000B loops, I couldn't find a single 2300 CLM element. I was really surprised to see this. All 200 XML docs from this ST had 100+ 2000Bs and no 2300 CLMs. So I am assuming the running instances that were taking forever produced these XML docs. Maybe the parsing of these STs is the reason why it was taking forever.


    What is the difference between the 837P single and multiple schemas? My understanding was that the Multiple schema is used to split the document into multiple documents (one document per claim). I could be wrong. The batching option under the Ack Generation and Validation setting in the party config splits the document into multiple transaction sets, right? If I were to process a file with multiple STs with the "Split" option enabled in the party settings, I should see one document for each ST. Is that right? On the other hand, an 837P Multiple schema splits the STs further down at the claim level (depending on where the break option is set in the 837 schema). Am I following this correctly? Is the multiple schema the reason for the parsing problem? I think there is a bug in the BizTalk 837 parsing process. I will be testing this with the 837P Single schema. If the results are still the same, we will know for sure that the parsing is broken. Have you guys ever experienced this issue?
    I would really appreciate your thoughts/comments on this.


    Also, could you please show me a sample InterchangeXML schema file?


    Once again thank you for your help Ben and Genuine.



    Friday, November 21, 2008 8:04 PM
  • Multiple schema is indeed used to split a single document (ST) into multiple parts, depending on where the break option is set (e.g. when set on claim level, each claim in the ST will be split into a separate XML message.)

    There was indeed an error in the way BizTalk 2006 R2 was handling this splitting: the split messages wouldn't only include the segments relevant to that one claim, but also all the higher level segments (2000B's) for all other claims.
    This results in 'bloated' XML documents, and generating these too-large-messages impacts performance.

    Microsoft is close to finalizing a hotfix for this issue, which will be available for public download soon.

    However, you also mention that some of your output files contain many 2000B's but not a single 2300 claim. Please note that the 2300 CLM is optional and so 2000B's without CLM are possible! Did you verify that in the input X12 document all the 2000B's DO have a 2300 CLM? If this is the case then the output files without CLM's may be a different issue, maybe specific to your input file. Can you share your input file with me for testing?
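    One way to run the check suggested above is to count subscriber loops and claims per transaction set in the raw X12 input. The Python sketch below is only an illustration under stated assumptions: it assumes `*` and `~` as separators, treats an HL segment whose third element is `22` as the start of a 2000B subscriber loop, and counts every CLM segment as a 2300 claim. The function name and sample data are hypothetical, and the sample is heavily abbreviated rather than a valid 837.

```python
from typing import Dict


def summarize_transaction_sets(
    x12: str,
    segment_terminator: str = "~",
    element_separator: str = "*",
) -> Dict[str, Dict[str, int]]:
    """Return {ST02: {"subscriber_loops": n, "claims": m}} per transaction set."""
    summary: Dict[str, Dict[str, int]] = {}
    control_number = None
    for raw in x12.split(segment_terminator):
        elements = raw.strip().split(element_separator)
        tag = elements[0]
        if tag == "ST" and len(elements) > 2:
            # ST02 is the transaction set control number.
            control_number = elements[2]
            summary[control_number] = {"subscriber_loops": 0, "claims": 0}
        elif tag == "SE":
            control_number = None
        elif control_number is not None:
            if tag == "HL" and len(elements) > 3 and elements[3] == "22":
                # HL03 == 22: subscriber hierarchical level (loop 2000B).
                summary[control_number]["subscriber_loops"] += 1
            elif tag == "CLM":
                # CLM opens a claim (loop 2300).
                summary[control_number]["claims"] += 1
    return summary


# Abbreviated, hypothetical sample: the second set has a subscriber whose
# claim sits under a dependent level (HL03 == 23), so that subscriber loop
# has no CLM of its own.
SAMPLE_837 = (
    "ST*837*0001~HL*1**20*1~HL*2*1*22*0~CLM*A1*100~"
    "HL*3*1*22*0~CLM*A2*50~SE*7*0001~"
    "ST*837*0002~HL*1**20*1~HL*2*1*22*1~HL*3*2*23*0~CLM*B1*75~SE*6*0002~"
)

if __name__ == "__main__":
    for st02, counts in summarize_transaction_sets(SAMPLE_837).items():
        print(st02, counts)
```

    Comparing these counts with the number of 2000B and CLM elements in the XML produced for the same ST02 shows whether missing claims come from the input file or from the split.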

    Friday, February 13, 2009 11:59 PM
  • Gerard de Jong - MSFT said:

    However, you also mention that some of your output files contain many 2000B's but not a single 2300 claim. Please note that the 2300 CLM is optional and so 2000B's without CLM are possible! Did you verify that in the input X12 document all the 2000B's DO have a 2300 CLM? If this is the case then the output files without CLM's may be a different issue, maybe specific to your input file. Can you share your input file with me for testing?



    Thanks for the update on the hotfix. I just applied the hotfix and I am testing it. Actually there was one 2000B loop with the 2300 claim in it. My string search didn't find any 2300s for some reason. So no issues with the 2300 claim. That was my mistake.

    Wednesday, February 18, 2009 9:05 PM
  • I applied the hotfix that addresses the splitting issue (KB: 967945) and I am having trouble getting it to work properly. I have been instructed to make the following setting in the 837P multiple schema:

    To enable the new feature, you add the following to an annotation:  Split_Without_Sibling_Data = "Yes".  This should appear in a multiple schema under xs:annotation-> xs:appinfo, just after subdocument_break="yes".

    I added the setting to the schema and published the schema with the change as part of a new BizTalk application. There are several other BizTalk applications that use the 837P multiple schema; I didn't touch them. I modified the root namespace of the new schema with the new config setting to avoid a conflict and deployed the application. I configured the application to simply write the XML to the file system. I dropped an 837P file, and the XML created by the BizTalk process still has multiple occurrences of the 2000B loop. What am I doing wrong here? Is there any other config setting that I need to set in order to prevent the multiple 2000B loops? Could someone familiar with the issue please point me in the right direction?


    Thursday, February 19, 2009 8:35 PM
  • Do the output files have the new namespace?

    If not: since you changed the namespace for this modified schema, you'll have to change the 'Target Namespace' in the EDI properties of the party as well. Without changing that, the old schema will still be used for the translation.
    (parties -> EDI Properties -> X12 Properties -> Party as Interchange Sender -> X12 Interchange Processing -> Enable custom transaction set definitions)
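    A quick way to answer the namespace question is to read the root element of an output document and look at its namespace. The Python sketch below uses only the standard library; the two namespace URIs and the root element name are hypothetical placeholders, not the real BizTalk schema namespaces, so substitute the target namespace you set on the modified schema.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the document's root element ('' if none)."""
    root = ET.fromstring(xml_text)
    # ElementTree encodes qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


# Hypothetical namespaces for illustration only.
OLD_NS = "http://example.com/schemas/837P/original"
NEW_NS = "http://example.com/schemas/837P/modified"

SAMPLE_OLD = f'<ns0:Claim xmlns:ns0="{OLD_NS}"><ST02>0001</ST02></ns0:Claim>'
SAMPLE_NEW = f'<ns0:Claim xmlns:ns0="{NEW_NS}"><ST02>0001</ST02></ns0:Claim>'

if __name__ == "__main__":
    for name, text in (("old", SAMPLE_OLD), ("new", SAMPLE_NEW)):
        ns = root_namespace(text)
        print(name, "uses modified schema:", ns == NEW_NS)
```

    If the output files still carry the original namespace, the party is still resolving to the unmodified schema, which matches the explanation above.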


    • Marked as answer by G.Joseph Friday, February 20, 2009 2:24 AM
    Thursday, February 19, 2009 9:20 PM
  • Gerard,

    You are the man. That was it. The documents are splitting correctly. The performance was terrible when I processed a claim file with 10,000 claims that spanned multiple STs. Is there anything that I need to consider to improve the performance? Do you have any recommendations?

    Friday, February 20, 2009 2:23 AM
  • Performance always depends on many things.
    Can you be more specific about how the performance was? Performance should already have improved a lot with the hotfix compared to before the hotfix. Can you confirm this?

    I assume you are comparing performance with BizTalk 2002. Assuming you run BTS2002 and BTS2006R2 in comparable environments, how did the performance compare between the two for a file of this size?

    Friday, February 20, 2009 8:27 AM
  • Yes, there is a significant improvement in performance. I tested with a claim file with 1000 claims and everything got processed in 20 - 25 mins. Before, it used to take hours. So I went ahead and dropped the file with 10K+ claims into the pickup directory at around 7:00 PM. When I checked this morning there were still 1000 claims in the queue with status set to active. Before I complain about performance, let me explain my environment setup.

    My goal is to process the claims through BizTalk and load the resulting data into Oracle. In 2002, the map performed the transformation of the data and the result was put through an AIC. The map also inserts the XML data, before the transformation, into a CLOB field in Oracle. The AIC performs scrubbing on the transformed data and finally inserts the scrubbed data into a CLOB field in another Oracle table. I couldn't get the project migrated to R2, so I manually created the map. The VBScript routines in the map that insert XML data into Oracle were replaced by EXSLT, so I have an extension file that calls an assembly to insert the XML data into the CLOB field. The AIC in 2002 was replaced by a custom send pipeline component. I also need some envelope info in the map to perform certain logic, so I modified the enrichment sample to merge the envelope data with the claims data. I have a slight issue with the enrichment sample. The send port in the sample that is subscribed to the orchestration writes the XML data with the envelope info to the file system. I tried to modify the send port so that I can apply the transformation and finally call my custom pipeline. For some reason the transformation is not working. There is a section on the send port to specify an outbound map, but that is not working: the XML written to the file system is the XML data from the orchestration. So I created a new receive port to pick up the XML data from the file system and specified the map on that receive port. I then created a new send port that is subscribed to the new receive port and specified my custom pipeline there. This setup is working. The drawback is that 10K messages have to go through BizTalk twice, and I am pretty sure this has a negative impact on performance. I also noticed that the SQL MessageBox DB is growing in size. Every time I process a large file, it consumes 1 GB of disk space. I had around 3 GB of free space before I processed this big file, which I processed twice. Now the free space is 600 MB. I really appreciate you taking the time to read this. Please let me know if there is anything that needs to be changed as far as my setup is concerned.

    Meanwhile I will start with a simple configuration. I will set up a PassThruTransmit and see how long it takes to write the XMLs to the file system.

    Friday, February 20, 2009 1:39 PM
  • Firstly, the design of 2002 and 2004 had an additional layer of pre-processing before messages were sent to the MessageBox. This additional layer inadvertently acted as a throttle, making messages available while large messages with many split points were still being processed. In R2, however, that layer doesn't exist, so the intermediate step is not there, which causes a perceivable slowdown. As Gerard mentions, if you could try out the same config with 2002/2004 and R2 with the hotfix on the same document, you may see the improvement (if not, please let us know).

    In your case, the fact that you have to run through BizTalk twice must be a huge overhead. I would suggest that you get the message enrichment + send-port-based map to work. Microsoft Support (PSS) may be of help to you. I don't see why an orchestration (like message enrichment) should affect the mapping in the send port.

    You could also use a Context property accessor functoid at the following location and, instead of going through the message enrichment orchestration, use it directly with the send port map and be done with it.
    Once you access the context properties with this context accessor functoid, you can add them to the destination XML, which will then contain the context properties you need.


    Monday, February 23, 2009 10:58 AM
  • Thanks Ravi for the useful info. I did a simple test on R2. I put the 10K file through the R2 environment configured with a PassThruTransmit. It took 20 mins to write out 10,034 XML docs to the file system. Then I put the same file through the Enrichment environment. It took 39 mins to write out 10,034 XMLs with the envelope info to the file system. The enrichment process took almost double the amount of time. I will try out the Context Accessor functoid.

    Since my environment is not working properly, the only comparison I can do now is the PassThruTransmit performance. I can configure 2002 for PassThruTransmit and see how long it takes to write the XMLs.


    Monday, February 23, 2009 7:50 PM
  • I was able to access the envelope info using the context accessor functoid. That really helped. I am still trying to figure out the issue I had extending the enrichment sample to use a map and a custom pipeline on the send port.

    In order to make use of the context accessor functoid, the map needs to be configured on the receive port. I have no issues with that. The map inserts the XML version of the EDI document, along with some key info, into an Oracle table (IN table) and returns a key. This key gets mapped to a field in the output document and gets inserted into another Oracle table (OUT table) along with the transformed data, which is in a positional flat file format. Now when I process a claim file, the IN table gets populated first, since the mapping happens in the receive port. I could see records being inserted into the IN table within 4-5 sec after I dropped the 837P file with multiple STs into the pickup directory. All the inbound claim records are in the IN table by the time BizTalk completes the parsing/splitting. Only when the parsing is done and all the messages are in the MessageBox do they become subscribable. Once the messages become subscribable, the send port receives them and fires the custom pipeline, which inserts the transformed data into the OUT Oracle table. In other words, my OUT table gets loaded only after the IN table is fully loaded. That wasn't the case with 2002: my IN and OUT tables used to get loaded at the same time, and the records were inserted under the context of a distributed transaction. The parsing behaviour was also different in 2002.

    That brings me back to my original post in this thread regarding the parsing behaviour in R2. I see that there are three options to choose from in the EDI party properties for inbound batching:

    1) Split interchange as Transaction sets
    2) Preserve Interchange - Suspend Transaction set on error
    3) Preserve Interchange - Suspend Interchange on error

    If I choose the third option, I agree with the parsing behaviour of R2: messages won't be subscribable until the full interchange is parsed. But if I choose option 1 or 2, why can't I subscribe to the messages that are already parsed and available in the MessageBox? Is there a setting that I need to enable in order to start processing the messages as soon as they become available in the MessageBox? Waiting for the file to be parsed completely is just a waste of processing time for options 1) and 2).

    In my case it took 20 mins to process an 837P file with 10,000 claims through a PassThruTransmit, with my EDI party configured to use the "Split Interchange as Transaction Sets" option. Out of those 20 mins, 10 mins went to completely parsing the file and 10 mins to processing. If there were an option to process the messages in the MessageBox before R2 fully completes parsing, it would not have taken 20 mins: parsing and processing would have happened in parallel and we could have cut the processing time significantly. Next I put the same file through the configuration that uses the context accessor and map in the receive port. The whole process took 39 mins: 20 mins for parsing and loading the records into the IN table and another 10 mins for processing them through a PassThruTransmit. In this case I had to wait 20 mins to start processing the messages even though they were available in the MessageBox, so the host instance configured to handle the send operations was idle for 20 mins. That wasn't the case in 2002. I put the same file through 2002 and it processed 10,000 claims in 33 mins, 6 mins faster than R2, even though R2 runs on a dual-processor machine with 4GB RAM and 2002 runs on a P3 1.6GHz machine with 500MB RAM. I couldn't believe 2002 outperformed R2. If R2 had an option to enable the 2002 parsing behaviour based on the batch processing option, the outcome would have been the exact opposite.
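    The arithmetic behind this argument can be written down as a small Python sketch. It uses only the figures quoted above (10 mins parsing plus 10 mins processing, and 10,000 claims in 33 versus 39 mins). The overlapped figure is an idealized lower bound that assumes processing can start as soon as the first messages are published and that the two stages do not compete for resources, which is an assumption and not a measured result. The function names are made up for illustration.

```python
def sequential_minutes(parse_minutes: float, process_minutes: float) -> float:
    """Total time when processing starts only after parsing has finished."""
    return parse_minutes + process_minutes


def overlapped_lower_bound(parse_minutes: float, process_minutes: float) -> float:
    """Idealized total time when both stages run fully in parallel.

    The slower stage dominates; real runs would land somewhere between this
    bound and the sequential total.
    """
    return max(parse_minutes, process_minutes)


def claims_per_minute(claims: int, minutes: float) -> float:
    """Average throughput over the whole run."""
    return claims / minutes


if __name__ == "__main__":
    # PassThruTransmit run: 10 mins parsing + 10 mins processing.
    print("sequential:", sequential_minutes(10, 10), "mins")
    print("overlapped (ideal):", overlapped_lower_bound(10, 10), "mins")

    # Same 10,000-claim file: 33 mins on 2002 versus 39 mins on R2.
    print("2002 claims/min:", round(claims_per_minute(10_000, 33)))
    print("R2 claims/min:", round(claims_per_minute(10_000, 39)))
```

    The gap between the sequential total and the ideal bound is the most that publishing messages before the whole interchange is parsed could save for a given file.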

    I really love the nice features in R2 that were not available in 2002. But R2 certainly lacks the parsing behaviour of 2002, and that is a BIG factor when it comes to scalability. I sincerely hope this behaviour can be achieved in R2 using some config setting, and that's what I am praying for. I am also analysing the performance counters captured by PerfMon to get a better understanding of the situation.

    Thank you for taking time to read this long thread and really appreciate any help on the parsing behaviour and any tips to improve the performance.


    Thursday, February 26, 2009 5:11 AM
  • gjoseph,

     First of all, I appreciate the crystal clear detail that you have provided in this thread! I'm working on a migration-to-R2 project and our users are excited to go to R2; however, they won't be that thrilled to hear it's twice as slow as an old version of BizTalk :(

     So, it sounds like the KB967945 hotfix did not resolve the debatching performance issue you have identified? Did the hotfix do anything for you? Was there a final resolution?


    • Proposed as answer by Nidhogg Dragon Wednesday, March 18, 2009 2:31 PM
    Wednesday, March 18, 2009 11:51 AM
  • Hi Mark (and all readers),

    I think the best resolution here is to first completely understand the product and the new features that R2 offers, as far as EDI\HIPAA EDI is concerned. There are a lot of features, and some of those features can involve a lot of internal workings, data collection, storage, and so on. Once you understand what's in the toolbox and under the covers, you'll be better able to design R2 integration solutions that fit the needs of your business processes, assuming R2 is the right technical choice. Remember, more often than not, there is no silver bullet solution.

    What you're doing now, experimenting, troubleshooting, brainstorming, is a great way to jump right into the product and feel the pain of learning and understanding a new product and its capabilities. Good luck. As always, we'll share our insights and discoveries as well.

    Thanks -

    • Proposed as answer by Nidhogg Dragon Wednesday, March 18, 2009 2:45 PM
    • Edited by Nidhogg Dragon Wednesday, March 18, 2009 2:50 PM mispelling
    Wednesday, March 18, 2009 2:45 PM
  • That's really interesting to compare performance between BizTalk 2002 with 2006 R2.

    You mentioned you had some trouble getting the outbound map to work on the send port. You have to use the XmlTransmit pipeline in order for the outbound map to work. I did not see you mention this so I thought I would make the suggestion.

    If this answers your question, please use the "Answer" button to say so | Ben Cline
    Wednesday, March 18, 2009 4:19 PM
  • Mark,

    The hotfix did improve the performance. Before the hotfix it took around 5-6 hours to process the 10,000 claim file. After the hotfix it took 39 mins. So there is definitely a significant improvement in terms of performance. But I am not happy with 39 mins. We mostly deal with large files and I am concerned about throughput when it comes to large files. The R2 parsing behavior may also have a role in this. Based on the current architecture, the messages in the MessageBox become subscribable only when R2 completes the parsing of the full file. When I tested this with a 60,000 claim file, it took around 40 mins to parse the full file. At that point the MessageBox is flooded with 60,000 messages and the SQL Server memory utilization is around 1.7GB. There may also be an overhead when there are this many messages in the MessageBox.

    I set up a simple PassThruTransmit environment for testing purposes. No maps, no custom pipelines... So I am basically writing the XML version of the claim documents to the file system using the file adapter on a send port that uses a pass-through pipeline. After monitoring the performance counters, it is obvious that R2 is throttling, and I am pretty sure that is the reason for the poor performance. I have separate hosts for the receive and send operations. The send host is throttling. The "High message delivery rate" counter is mostly 1 (High), and the "Message delivery throttling state" counter shows a value that suggests the incoming rate exceeds the outgoing rate. Based on my understanding of the throttling settings, I adjusted some of them, but that didn't prevent throttling and I didn't see any performance gain. So I am currently working with the Support Team on fine-tuning the throttling settings. I will let you know how it goes.


    I will definitely check that out. As always, thanks for the help.

    Thursday, March 19, 2009 4:54 AM