Merge large XMLs

  • Question

  • Hi,

    I am looking for a way to merge large XML messages. My orchestration gets XML messages from a database and maps each one to a target format (e.g. X12) in a loop. I have to merge all the target messages into one. Each message will be large, so I don't want to use the XmlDocument approach. I am trying to use a .NET class with XmlTextWriter and XmlTextReader instead. However, I am not sure how to insert the result of reader.ReadSubtree() into the middle of the XmlTextWriter output.
    For example, if I get the target messages as X12 XML, I need to take the first XML and merge the LXLoop nodes of the remaining XMLs into the first one.
    So I write the first XML to the XmlTextWriter. I can access the LXLoop nodes in the remaining XMLs using reader.ReadSubtree(), but I am not sure how to append them after the last LXLoop of the first XML, which I already have in the XmlTextWriter.

    Thank you for your help in advance,

    Siv

    Wednesday, March 16, 2011 3:25 PM

Answers

  • You are correct that using XmlDocument and XmlNode will require a lot of memory. You would also need to maintain the XmlDocument across multiple persistence points. I recently discovered a limit on the size of an orchestration variable that can be persisted: messages are fragmented when persisted, but it appears that variables are not. An alternative is to use a FileStream or VirtualStream, but the problem there is persisting the stream for the lifetime of the orchestration.

    I created a serializable file stream that lets you write your XML to a file. It saves the file's current state when the orchestration is dehydrated and restores that state when it is rehydrated.

    using System;
    using System.IO;
    using System.Runtime.Serialization;
    
    namespace TKH.BizTalk.XLANGs
    {
      // A FileStream whose state (file name, access mode, position) can be
      // serialized on dehydration and restored on rehydration.
      [Serializable]
      public class SerializableFileStream : FileStream, ISerializable
      {
        // When true, the underlying file is deleted once the stream is disposed.
        public bool DeleteOnClose { get; set; }
        
        public SerializableFileStream(string filename, FileMode mode, FileAccess access)
          : base(filename, mode, access)
        {
          this.DeleteOnClose = false;
        }
    
        // Deserialization constructor: reopens the existing file and restores the position.
        protected SerializableFileStream(SerializationInfo info, StreamingContext context)
          : base(info.GetString("Filename"), FileMode.Open, (FileAccess)info.GetValue("Access", typeof(FileAccess)))
        {
          // Position is stored as a long, so read it back as Int64.
          base.Position = info.GetInt64("Position");
          this.DeleteOnClose = info.GetBoolean("DeleteOnClose");
        }
        
        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
          // Save FileStream state
          info.AddValue("Filename", base.Name);
          info.AddValue("Access", (FileAccess)((base.CanRead ? (int)FileAccess.Read : 0) | (base.CanWrite ? (int)FileAccess.Write : 0)));
          info.AddValue("Position", base.Position);
          info.AddValue("DeleteOnClose", this.DeleteOnClose);
          // Flush and release the file handle; the file is reopened on deserialization.
          // Note: Close() calls Dispose, so the file is deleted here if DeleteOnClose is true.
          base.Close();
        }
    
        protected override void Dispose(bool disposing)
        {
          base.Dispose(disposing);
          if (DeleteOnClose)
          {
            File.Delete(base.Name);
          }
        }
      }
    }
    

    You can then create an aggregator class that holds this serializable file stream. Here is some pseudo code:

    using System;
    using System.IO;
    using Microsoft.XLANGs.BaseTypes;
    
    // Serializable so that the orchestration can persist it as a variable.
    [Serializable]
    public class Aggregator
    {
      public string X12Wrapper { get; set; }
      public SerializableFileStream AggregatedStream { get; set; }
      
      public void AddX12Header(XLANGMessage message)
      {
        // Save the X12 header in X12Wrapper.
        // Add the initial part of the message to AggregatedStream
        // using XmlTextWriter.
      }
      
      public void AddLXLoop(XLANGMessage message)
      {
        // Called repeatedly (once per message) to add LXLoop fragments
        // to AggregatedStream using XmlTextWriter.
      }
      
      public void GetAggregatedMessage(XLANGMessage outMessage)
      {
        // Complete the XML in AggregatedStream using X12Wrapper.
        if (AggregatedStream != null)
        {
          AggregatedStream.Seek(0, SeekOrigin.Begin);
          outMessage[0].LoadFrom(new StreamFactory(AggregatedStream));
          AggregatedStream.DeleteOnClose = true;
        }
      }
    }
    
    public class StreamFactory : IStreamFactory
    {
    	readonly Stream stream;
    
    	public StreamFactory(Stream stream)
    	{
    		if (null == stream)
    			throw new ArgumentNullException("stream");
    
    		this.stream = stream;
    	}
    
    	Stream IStreamFactory.CreateStream()
    	{
    		return stream;
    	}
    }
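
    On the original question of how to copy a reader's subtree into a writer: XmlWriter.WriteNode(reader, false) copies the element the reader is positioned on, including all of its content, and leaves the reader on the node that follows it. The following is only a minimal sketch of a one-pass merge built on that call, not the implementation behind the pseudo code above. It assumes all target messages are available as XmlReader instances at the same time and that the loop elements can be matched by local name. LxLoopMerger, Merge, AppendLoops and WriteShallowNode are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: streams the first document to the writer and inserts the
// loop elements of the other documents right after the first document's loops.
public static class LxLoopMerger
{
    public static void Merge(XmlReader first, IEnumerable<XmlReader> others,
                             XmlWriter writer, string loopName)
    {
        bool seenLoop = false, appended = false;
        first.Read();
        while (!first.EOF)
        {
            if (IsLoop(first, loopName))
            {
                seenLoop = true;
                // Copies the whole subtree and leaves the reader on the next node,
                // so no extra Read() is needed here.
                writer.WriteNode(first, false);
                continue;
            }
            bool isTag = first.NodeType == XmlNodeType.Element
                      || first.NodeType == XmlNodeType.EndElement;
            if (seenLoop && !appended && isTag)
            {
                // First tag after the run of loops: insert the other loops here.
                foreach (XmlReader other in others)
                    AppendLoops(other, writer, loopName);
                appended = true;
            }
            WriteShallowNode(first, writer);
            first.Read();
        }
        writer.Flush();
    }

    static bool IsLoop(XmlReader r, string loopName)
    {
        return r.NodeType == XmlNodeType.Element && r.LocalName == loopName;
    }

    // Copies only the loop elements of one document; everything else is skipped.
    static void AppendLoops(XmlReader reader, XmlWriter writer, string loopName)
    {
        reader.Read();
        while (!reader.EOF)
        {
            if (IsLoop(reader, loopName)) writer.WriteNode(reader, false);
            else reader.Read();
        }
    }

    // Copies the current node without its children. The XML declaration and
    // node types not listed here are skipped in this sketch.
    static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement: writer.WriteFullEndElement(); break;
            case XmlNodeType.Text: writer.WriteString(reader.Value); break;
            case XmlNodeType.CDATA: writer.WriteCData(reader.Value); break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace: writer.WriteWhitespace(reader.Value); break;
            case XmlNodeType.Comment: writer.WriteComment(reader.Value); break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value); break;
        }
    }
}
```

    Only the node currently being copied is held in memory, so the cost does not grow with the size of the documents. The writer could, for example, be created over a file stream, and loopName would be whatever the LXLoop element is called in your X12 schema.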
    

     

    Thursday, March 17, 2011 9:42 PM
    Answerer

All replies

  • Hi

    If your requirement is just to map the database values to the X12 file, I would suggest applying the map on the ports.

    > Create a request-response WCF-SQL port.

    > Create a map with a Looping functoid to map the SQL result set to the X12 file.

    > Configure the map on the response port so that it executes there.

    This way you should be able to map the large result set without much of a performance hit.

    If your situation is not a messaging-only scenario, please share further details, such as the size of the result set you are expecting and how you want to map it.

     


    Nandri,
    Sathish
    Wednesday, March 16, 2011 7:47 PM
  • How big are these messages, i.e. does "large" mean MB or GB?
    What is the expected size of the output message?
    How many messages will you be merging?

    Thursday, March 17, 2011 2:54 AM
    Answerer
  • If you are doing this outside BizTalk in a .NET component, you can give XLINQ a try. It would be easier.
    Please mark as answer if this helps you. Thanks and warm regards Ambar Ray Solution Architect - Microsoft Technologies
    Thursday, March 17, 2011 3:26 AM
  • But be careful if you use LINQ to XML. Some of its methods load the entire XML document into memory, which will bloat your memory usage.
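
    To illustrate the caution: XDocument.Load and XElement.Load read the whole document into memory, whereas XNode.ReadFrom can be driven from an XmlReader so that only one element is materialized at a time. A minimal sketch, assuming the elements of interest can be matched by local name (LoopStreamer and StreamElements are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class LoopStreamer
{
    // Lazily yields each matching element as an XElement. Only the element
    // currently being yielded is held in memory, not the whole document.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string localName)
    {
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
            {
                // ReadFrom consumes the element and leaves the reader on the
                // node after it, so Read() must not be called again here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

    Each XElement returned this way can be written to an XmlWriter with its WriteTo method and then discarded, which keeps memory usage proportional to one LXLoop rather than to the whole file.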

    I would try solving the issue using the BizTalk Mapper with multiple inputs.

    Rgds Mrks

    Thursday, March 17, 2011 8:40 AM
  • Hi guys,

    Thank you all for your help on this.

    I want to make sure that I don't use much memory, because we are getting out-of-memory errors with the current process. We receive over 6000 flat files, and some of them are over 300 MB. To avoid the out-of-memory errors, we use debatching.
    We split each big file into small pieces in a custom pipeline and store them in the database as XML strings.
    Inside a loop, my orchestration gets the XML strings that belong to the same input file from the database and maps them to target files. I have already handled this part.
    Now I have to merge all those target outputs into a single output. I am trying to merge them using a VirtualStream/Stream.
    I use XmlReader to read each input. How can I use XmlWriter or some other method to merge those files?
    If I only had to append each input to the end, it would be easy with XmlWriter. But I need to put all the LXLoops together in one file: take the first file and add the LXLoops of the other files after its own LXLoops. I could do this easily with XmlDocument and XmlNode, but I don't want to use them since they consume a lot of memory.

    Is there any other way to merge nodes?

    Siv

    Thursday, March 17, 2011 2:15 PM