Problems splitting large XML message (+35MB)

  • Question

  • Hi all,

    I have a large XML message (+35MB) containing invoice data from a week's worth of orders. Now I need to send this invoice back to the customer, but split by their order number (their system cannot process an invoice that contains multiple order numbers).

    The (simplified) input-file looks something like this:

    <Invoice>
    <Header></Header>
    <Line><Ordernumber>A</Ordernumber><Linenumber>1</Linenumber></Line>
    <Line><Ordernumber>A</Ordernumber><Linenumber>2</Linenumber></Line>
    <Line><Ordernumber>B</Ordernumber><Linenumber>1</Linenumber></Line>
    <Line><Ordernumber>B</Ordernumber><Linenumber>2</Linenumber></Line>
    <Line><Ordernumber>A</Ordernumber><Linenumber>3</Linenumber></Line>

    ... x 10.000 line items
    </Invoice>

    What I do now: in an orchestration I get a list of distinct Ordernumber values, then I use a map inside a looping shape to loop over the source message and output only the lines that match the current Ordernumber in the loop.

    This method always worked for smaller messages, but with this customer the invoices reach 35MB of data, and my BizTalk service uses up all the resources of my server (8GB of memory in a few minutes)... 

    Is there a better way to approach this problem?
    I don't think I can use the XML disassembler with an envelope schema, because I don't split on structure, but on the actual values in the lines.

    Thursday, March 27, 2014 10:16 AM

Answers

  • You can do transformations in SSIS. SSIS is built for ETL, which stands for Extract, Transform and Load.

    SSIS-Integration Services Transformations

    Sorting/grouping is certainly not going to solve your problem completely, but it would improve performance to a good extent. You can't expect processing of a 35 MB file to be smooth in BizTalk. If you decide to use BizTalk, you can find ways to improve the processing there, but you still have to live with the remaining overhead as the trade-off of using BizTalk.

    By mapping with the Muenchian method, you can map to another schema that groups the lines by order number in a different structure, so that you can use an envelope schema. That is, after applying the Muenchian method you can map to a structure similar to the following:

    <Invoice>
     <Header/>
     <OrderNumberGroup>
      <Line><Ordernumber>A</Ordernumber><Linenumber>1</Linenumber></Line>
      <Line><Ordernumber>A</Ordernumber><Linenumber>2</Linenumber></Line>
      <Line><Ordernumber>A</Ordernumber><Linenumber>3</Linenumber></Line>
     </OrderNumberGroup>
     <OrderNumberGroup>
      <Line><Ordernumber>B</Ordernumber><Linenumber>1</Linenumber></Line>
      <Line><Ordernumber>B</Ordernumber><Linenumber>2</Linenumber></Line>
     </OrderNumberGroup>
     <OrderNumberGroup>
      <Line><Ordernumber>C</Ordernumber><Linenumber>1</Linenumber></Line>
      <Line><Ordernumber>C</Ordernumber><Linenumber>2</Linenumber></Line>
     </OrderNumberGroup>
    </Invoice>
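
    A map producing a structure like this could be sketched in XSLT 1.0 as below. This is only a minimal sketch, not a tested BizTalk map: it assumes the un-namespaced element names from the simplified sample, and the key name lines-by-order is a made-up name for illustration. A real schema with namespaces would need the match and select expressions adjusted.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every Line by its Ordernumber value (key name is hypothetical) -->
  <xsl:key name="lines-by-order" match="Line" use="Ordernumber"/>

  <xsl:template match="/Invoice">
    <Invoice>
      <xsl:copy-of select="Header"/>
      <!-- Keep only the first Line of each distinct Ordernumber -->
      <xsl:for-each select="Line[generate-id() =
                                 generate-id(key('lines-by-order', Ordernumber)[1])]">
        <OrderNumberGroup>
          <!-- Copy all Lines that share the current Ordernumber -->
          <xsl:copy-of select="key('lines-by-order', Ordernumber)"/>
        </OrderNumberGroup>
      </xsl:for-each>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>
```

    The key is built once over the whole document, so each group is looked up through the index instead of rescanning all lines for every order number.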

    Now you can use an envelope schema to split the message at the <OrderNumberGroup> record level. You can call the receive pipeline from the orchestration to split the message there, or debatch at the port level, since you now have the luxury of envelope-schema debatching.

    As mentioned in my earlier comment, you can also consider a standard .NET process that debatches the message, and use BizTalk just for transmitting the debatched files.

    You now have some options:

    • use SSIS, where you can also do the transformation;
    • or use the Muenchian method to group the message into a different structure, then debatch it with an envelope schema;
    • or use a .NET program to debatch the message, and use BizTalk just for transmitting the debatched files.

    If this answers your question please mark it accordingly. If this post is helpful, please vote as helpful by clicking the upward arrow mark next to my reply.

    • Proposed as answer by Ravindar Thati Thursday, March 27, 2014 11:23 AM
    • Marked as answer by XZcute Thursday, March 27, 2014 12:37 PM
    Thursday, March 27, 2014 11:18 AM

All replies

  • There are a few things to consider.

    Think about whether you really need to do this in BizTalk. For file processing like this you can consider SSIS, which is meant for such requirements.

    If you still prefer to use BizTalk, consider grouping the XML by order number using the Muenchian method in your map. With the Muenchian method you can group the XML by order number and then disassemble the grouped message as you currently do. With this method, the sample XML above would be grouped as below, which is better than what you're doing now:

    <Invoice>
    <Header></Header>
    <Line><Ordernumber>A</Ordernumber><Linenumber>1</Linenumber></Line>
    <Line><Ordernumber>A</Ordernumber><Linenumber>2</Linenumber></Line>
    <Line><Ordernumber>A</Ordernumber><Linenumber>3</Linenumber></Line>
    <Line><Ordernumber>B</Ordernumber><Linenumber>1</Linenumber></Line>
    <Line><Ordernumber>B</Ordernumber><Linenumber>2</Linenumber></Line>
    <Line><Ordernumber>C</Ordernumber><Linenumber>1</Linenumber></Line>
    <Line><Ordernumber>C</Ordernumber><Linenumber>2</Linenumber></Line>
    </Invoice>

    The classic Muenchian grouping uses generate-id(); however, using generate-id() is the slowest variant. You can use the count() function in Muenchian grouping instead, which is good in terms of performance.

    Refer to the following link for more help on this: http://social.technet.microsoft.com/wiki/contents/articles/22059.biztalk-server-grouping-and-sorting-operations-inside-biztalk-maps-using-the-muenchian-method.aspx
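
    For illustration, the count() variant could look like the sketch below, which keeps the flat structure and only reorders the lines by order number. It assumes the un-namespaced sample element names, and the key name lines-by-order is hypothetical. The performance difference between the two variants is the claim made above, not something this sketch measures.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every Line by its Ordernumber value (key name is hypothetical) -->
  <xsl:key name="lines-by-order" match="Line" use="Ordernumber"/>

  <xsl:template match="/Invoice">
    <Invoice>
      <xsl:copy-of select="Header"/>
      <!-- count(. | X) = 1 holds only when the current Line is the same
           node as X, the first Line indexed under its Ordernumber -->
      <xsl:for-each select="Line[count(. | key('lines-by-order', Ordernumber)[1]) = 1]">
        <xsl:sort select="Ordernumber"/>
        <!-- Emit all Lines of this order number together, in document order -->
        <xsl:copy-of select="key('lines-by-order', Ordernumber)"/>
      </xsl:for-each>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>
```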

    If I were you, I would not publish a 35+ MB file into the MessageBox. Use BizTalk only when you need the MessageBox for this type of requirement. I would prefer SSIS or even standard .NET code to split the file, and maybe use BizTalk just for transmitting the debatched messages.




    Thursday, March 27, 2014 10:26 AM
  • Hi,

    Thanks for your reply!

    I have to use BizTalk, because I don't only have to split the message, I have to transform it as well...

    I can understand that sorting/grouping the lines in the message first would increase performance, but I don't think it will solve my issue of memory filling up, will it? Because I would still have to loop over the same 35MB message +/- 1000 times...

    Thursday, March 27, 2014 10:54 AM
  • Thanks for your insights on this matter!

    I'm going to debatch the message using .NET code before posting it to BizTalk.

    Thursday, March 27, 2014 12:40 PM