Custom Pipeline Component - Unzip using the new .Net 4.5 ZipArchive class

  • Question

  • Hello,

    I am trying to build a custom pipeline component to unzip an archive (containing about 1,000 non-XML files).

    I found this article: http://social.technet.microsoft.com/wiki/contents/articles/20640.biztalk-server-2010-processing-zip-message-having-multiple-type-files.aspx

    I used it to start my component, but I decided to update it with the "new" Microsoft .Net 4.5 Compression ZipArchive class, instead of using the Ionic library.

    Unfortunately I get the following error: "End of Central Directory record could not be found".

    It seems to crash on the ZipArchive creation.

    Is there any restriction with the ZipArchive class?

    Or am I doing something wrong?

    Thanks for your help.

    Here is the code of my Disassemble method:

            public void Disassemble(Microsoft.BizTalk.Component.Interop.IPipelineContext pPC, Microsoft.BizTalk.Message.Interop.IBaseMessage pInmsg)
            {
                IBaseMessagePart msgBodyPart = pInmsg.BodyPart;
                if (msgBodyPart != null)
                {
                    Stream msgBodyPartStream = msgBodyPart.GetOriginalDataStream();
                    if (msgBodyPartStream != null)
                    {
                        using (ZipArchive zipInput = new ZipArchive(msgBodyPartStream, ZipArchiveMode.Read))
                        {
                            
                            foreach (ZipArchiveEntry zipEntry in zipInput.Entries)
                            {
    
    
                                using (Stream zipEntryStream = zipEntry.Open())
                                { 
                                    MemoryStream memStream = new MemoryStream();
                                    byte[] buffer = new Byte[1024];
    
                                    int bytesRead = 1024;
                                    while (bytesRead != 0)
                                    {
                                        bytesRead = zipEntryStream.Read(buffer, 0, buffer.Length);
                                        memStream.Write(buffer, 0, bytesRead);
                                    }
    
                                    //Creating outMessage
                                    IBaseMessage outMessage;
                                    outMessage = pPC.GetMessageFactory().CreateMessage();
                                    outMessage.AddPart("Body", pPC.GetMessageFactory().CreateMessagePart(), true);
                                    memStream.Position = 0;
                                    outMessage.BodyPart.Data = memStream;
    
                                    //Creating custom context property to hold extension of file 
                                    string extension = string.Empty;
                                    extension = zipEntry.Name.Substring(zipEntry.Name.IndexOf("."));
    
                                    //Promoting the custom property
                                    pInmsg.Context.Promote("Extension", "https://GNB.DPS.Integration.NB911ShapeFiles.UnzipShapeFilesDisassembler", extension);
    
                                    outMessage.Context = PipelineUtil.CloneMessageContext(pInmsg.Context);
    
                                    //Add outMessage to queue
                                    _msgs.Enqueue(outMessage);
                                }
                            }
                        }
                    }
                }
           }

    Sunday, March 6, 2016 12:33 AM

All replies

  • Do you know exactly which line it's breaking on?

    You can also try debugging in a WinForms app first to get the mechanics right, then integrate that code into a pipeline component.

    Monday, March 7, 2016 4:22 PM
    Moderator
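
A minimal sketch of that kind of standalone harness, assuming only System.IO.Compression and no BizTalk assemblies. The names ZipProbe, Describe and ListEntries are hypothetical and exist only for illustration. It reports the stream properties ZipArchive relies on when reading, then lists the entries, so the zip handling can be checked before it goes into a pipeline component.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for testing zip handling outside BizTalk.
public static class ZipProbe
{
    // Summarizes the stream properties that matter when a stream is opened as a zip.
    public static string Describe(Stream s)
    {
        // Length and Position are only meaningful on seekable streams.
        string position = s.CanSeek ? s.Position.ToString() : "n/a";
        string length = s.CanSeek ? s.Length.ToString() : "n/a";
        return string.Format("CanSeek={0}, Position={1}, Length={2}", s.CanSeek, position, length);
    }

    // Opens the stream as a zip archive and returns the full name of each entry.
    public static List<string> ListEntries(Stream s)
    {
        var names = new List<string>();
        // leaveOpen: true keeps the caller's stream usable after the archive is disposed.
        using (var zip = new ZipArchive(s, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }
}
```

From a console or WinForms test project, passing File.OpenRead on a sample .zip to Describe and then to ListEntries should show whether the archive itself is readable. Logging the same Describe output for the BizTalk body part stream may then hint at what differs inside the pipeline.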
  • Hi,

    I've done some debugging and it seems to crash on this line:

    using (ZipArchive zipInput = new ZipArchive(msgBodyPartStream, ZipArchiveMode.Read))
    {

    Tuesday, March 8, 2016 3:05 PM
  • My first guess is some minor incompatibility in the stream stack between BizTalk and ZipArchive.  It happens.

    Two things to try.

    1. Wrap the original stream in an instance of ReadOnlySeekableStream.  https://msdn.microsoft.com/en-us/library/microsoft.biztalk.streaming.readonlyseekablestream.aspx?f=255&MSPPError=-2147217396

    2. Copy the original stream into a VirtualStream. https://msdn.microsoft.com/en-us/library/microsoft.biztalk.streaming.virtualstream.aspx

    Tuesday, March 8, 2016 3:26 PM
    Moderator
  • Thanks John, I will try that.

    In the meantime, I tried something different that works:

    My requirement was also to archive the Zip on disk before unzipping it. So I created the file on disk from the BizTalk body part stream, and then I created my ZipArchive from the file on disk. It works that way, but I'd rather have the archiving process and the unzipping process be independent.

    As you said, there is probably an incompatibility between BizTalk and the ZipArchive ...

    public void Disassemble(Microsoft.BizTalk.Component.Interop.IPipelineContext pPC, Microsoft.BizTalk.Message.Interop.IBaseMessage pInmsg)
    {
    		string zipSourceFileName = pInmsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties").ToString();
    		zipSourceFileName = zipSourceFileName.Substring(zipSourceFileName.LastIndexOf("\\") + 1);
    		string fullzipFilePath = this._ZipArchiveFolder;
    		if (this._ZipArchiveFolder.EndsWith("\\"))
    			fullzipFilePath += zipSourceFileName;
    		else fullzipFilePath = fullzipFilePath + "\\" + zipSourceFileName;
    
    		IBaseMessagePart msgBodyPart = pInmsg.BodyPart;
    		if (msgBodyPart != null)
    		{
    			Stream msgBodyPartStream = msgBodyPart.GetOriginalDataStream();
    			if (msgBodyPartStream != null)
    			{
    				using (Stream zip = File.Create(fullzipFilePath))
    				{
    					msgBodyPartStream.Seek(0, SeekOrigin.Begin);
    					msgBodyPartStream.CopyTo(zip);
    				}
    
    				using (ZipArchive zipInput = ZipFile.OpenRead(fullzipFilePath))
    				{
    					foreach (ZipArchiveEntry zipEntry in zipInput.Entries)
    					{
    						String zipEntryFileName = zipEntry.Name;
    
    						using (Stream zipEntryStream = zipEntry.Open())
    						{
    							MemoryStream memStream = new MemoryStream();
    
    							byte[] buffer = new Byte[1024];
    
    							int bytesRead = 1024;
    							while (bytesRead != 0)
    							{
    								bytesRead = zipEntryStream.Read(buffer, 0, buffer.Length);
    								memStream.Write(buffer, 0, bytesRead);
    							}
    
    							//Creating outMessage
    							IBaseMessage outMessage;
    							outMessage = pPC.GetMessageFactory().CreateMessage();
    							outMessage.AddPart("Body", pPC.GetMessageFactory().CreateMessagePart(), true);
    							memStream.Position = 0;
    							outMessage.BodyPart.Data = memStream;
    
    							outMessage.Context = PipelineUtil.CloneMessageContext(pInmsg.Context);
    							outMessage.Context.Write("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties", zipEntryFileName);
    
    							//Add outMessage to queue
    							_msgs.Enqueue(outMessage);
    						}
    					}
    				}
    			}
    		}
    }

    Tuesday, March 8, 2016 3:41 PM
  • Hi John,

    I just tried using the ReadOnlySeekableStream and the VirtualStream, and it works.

    But I have one more question: should I encapsulate all the streams I am using in ReadOnlySeekableStream/VirtualStream, or only the original BizTalk message stream?

    Like:
    using (Stream zipEntryStream = zipEntry.Open())
    or
    MemoryStream memStream = new MemoryStream();

    Thanks !
    Wednesday, March 9, 2016 7:12 PM
  • I was actually suggesting those as one or the other, as in mutually exclusive.

    So, if wrapping the incoming stream in a ReadOnlySeekableStream works on its own, just do that.  If not, then copy the content to a VirtualStream without using the ReadOnlySeekableStream.

    Either way, you shouldn't be using the abstract Stream class or a MemoryStream for any of the original content.

    Now, for the output of the .zip, use a VirtualStream and do not construct it with a using statement.  When it goes out of scope, it's up for garbage collection.

    • Proposed as answer by Angie Xu Sunday, March 13, 2016 8:18 AM
    • Marked as answer by Angie Xu Thursday, March 17, 2016 2:29 AM
    Wednesday, March 9, 2016 7:42 PM
    Moderator
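
The second option (copy the content into a stream that can seek) can be sketched outside BizTalk as follows. This is a sketch under stated assumptions, not the BizTalk API: a temp-file-backed FileStream stands in for Microsoft.BizTalk.Streaming.VirtualStream, and SeekableCopySketch, ToSeekable and CountEntries are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for "copy the original stream into a VirtualStream".
public static class SeekableCopySketch
{
    // Copies any readable stream into a temp-file-backed stream that supports seeking.
    // The caller owns the returned stream; the temp file is deleted when it is closed.
    public static Stream ToSeekable(Stream input)
    {
        var temp = new FileStream(
            Path.GetTempFileName(), FileMode.Open, FileAccess.ReadWrite,
            FileShare.None, 4096, FileOptions.DeleteOnClose);
        try
        {
            input.CopyTo(temp);
            temp.Position = 0; // rewind so the next reader starts at the beginning
            return temp;
        }
        catch
        {
            temp.Dispose();
            throw;
        }
    }

    // Opens the buffered copy as a zip archive and counts its entries.
    public static int CountEntries(Stream input)
    {
        using (Stream seekable = ToSeekable(input))
        using (var zip = new ZipArchive(seekable, ZipArchiveMode.Read))
        {
            return zip.Entries.Count;
        }
    }
}
```

CountEntries disposes its copy because nothing else needs it. A stream handed to an outgoing message body is a different case: it has to stay open after the method returns, which is why the advice above is to not construct it in a using statement.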
  • Hi sorry for the delay.

    You will find below the final source code of the component for those who might need it. It's probably still not perfect, as I'm still using the Stream class in some places.

    There is one thing that bothers me though:

    When I loop through the zip archive, I retrieve each file in a variable (zipEntry), which I open in a Stream. Why is it necessary to read the zipEntry stream and write it into a new VirtualStream?

    Why can't I just create a new message directly from the zipEntry stream, like "outMessage.BodyPart.Data = zipEntryStream"?

    Thanks :)

            public void Disassemble(Microsoft.BizTalk.Component.Interop.IPipelineContext pPC, Microsoft.BizTalk.Message.Interop.IBaseMessage pInmsg)
            {
                    string zipSourceFileName = pInmsg.Context.Read("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties").ToString();
    
                    IBaseMessagePart msgBodyPart = pInmsg.BodyPart;
                    if (msgBodyPart != null)
                    {
                        int bufferSize = 0x280;
                        ReadOnlySeekableStream seekStream = new ReadOnlySeekableStream(msgBodyPart.GetOriginalDataStream(), bufferSize);
    
                        if (seekStream != null)
                        {
                            if (this._ArchiveZipFromPipeline)
                            {                            
                                zipSourceFileName = zipSourceFileName.Substring(zipSourceFileName.LastIndexOf("\\") + 1);
                                string fullzipFilePath = this._ZipArchiveFolder.TrimEnd('\\');
                                fullzipFilePath = fullzipFilePath + "\\" + zipSourceFileName;
    
                                using (Stream zip = File.Create(fullzipFilePath))
                                {
                                    seekStream.Seek(0, SeekOrigin.Begin);
                                    seekStream.CopyTo(zip);
                                }                            
                            }
    
                            seekStream.Position = 0;
                            seekStream.Seek(0, SeekOrigin.Begin);
                            ZipArchive zipInput = new ZipArchive(seekStream, ZipArchiveMode.Read);
                            
                            foreach (ZipArchiveEntry zipEntry in zipInput.Entries)
                            {
                                string zipEntryFileName = zipEntry.Name;
                                string extension = zipEntryFileName.Substring(zipEntry.Name.IndexOf("."));
    
                                // Open the current zip entry as a stream
                                Stream zipEntryStream = zipEntry.Open();
    
                                VirtualStream vStream = new VirtualStream();
                                byte[] buffer = new Byte[1024];
                                int bytesRead = 1024;
                                while (bytesRead != 0)
                                {
                                    bytesRead = zipEntryStream.Read(buffer, 0, buffer.Length);
                                    vStream.Write(buffer, 0, bytesRead);
                                }
    
                                //Creating outMessage
                                IBaseMessage outMessage;
                                outMessage = pPC.GetMessageFactory().CreateMessage();
                                outMessage.AddPart("Body", pPC.GetMessageFactory().CreateMessagePart(), true);
                                vStream.Position = 0;
                                outMessage.BodyPart.Data = vStream;
    
                                ////Creating custom context property to hold extension of file 
                                pInmsg.Context.Promote("Extension", "https://GNB.DPS.Integration.NB911ShapeFiles.Pipelines.PropertySchema", extension);
    
                                outMessage.Context = PipelineUtil.CloneMessageContext(pInmsg.Context);
                                outMessage.Context.Write("ReceivedFileName", "http://schemas.microsoft.com/BizTalk/2003/file-properties", zipEntryFileName);
    
                                //Add outMessage to queue
                                _msgs.Enqueue(outMessage);
    
                            }
    
                            if (this._SendZipToSendPort)
                            {
                                //Creating a last message being the full Zip file to archive it on disk (Send Port + Filter on extension)
                                IBaseMessage outZipArchive;
                                outZipArchive = pPC.GetMessageFactory().CreateMessage();
                                outZipArchive.AddPart("Body", pPC.GetMessageFactory().CreateMessagePart(), true);
                                seekStream.Seek(0, SeekOrigin.Begin);
                                outZipArchive.BodyPart.Data = seekStream;
                                //Creating custom context property to hold extension of file
                                string sourceFileExtension = zipSourceFileName.Substring(zipSourceFileName.IndexOf("."));
                                pInmsg.Context.Promote("Extension", "https://GNB.DPS.Integration.NB911ShapeFiles.Pipelines.PropertySchema", sourceFileExtension);
    
                                outZipArchive.Context = PipelineUtil.CloneMessageContext(pInmsg.Context);
    
                                //Add outMessage to queue
                                _msgs.Enqueue(outZipArchive);
                            }
                        }
                    }
            }

    Monday, March 21, 2016 3:14 PM
  • Why can't I just create a new message directly from the zipEntry stream, like "outMessage.BodyPart.Data = zipEntryStream"?

    Because an archive format such as Zip, RAR, or many others is, inherently and by design, multi-part, so the .zip contains multiple streams.  The fact that many .zips have only a single part/file/stream is incidental; they're just groups of 1.

    So, you have to create multiple Streams because the zip contains multiple streams.  Consider: would just mashing a .xlsx and a .docx together in one file work?  Nope.  They're two separate files.

    Tip: Instead of looping to copy the bytes, you probably can, and should, use the Stream.CopyTo() method.  See here:  https://msdn.microsoft.com/en-us/library/system.io.stream.copyto%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396

    Monday, March 21, 2016 7:03 PM
    Moderator
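
A minimal sketch of the CopyTo() tip, assuming plain .NET and using MemoryStream purely as a stand-in for the VirtualStream recommended earlier in the thread. EntryCopySketch and ExtractAll are hypothetical names. As far as I understand, the stream returned by ZipArchiveEntry.Open() in read mode decompresses on the fly, cannot seek, and is only usable while the archive is open, which is one practical reason each entry is copied into its own stream before being handed on.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: one independent, rewound stream per zip entry.
public static class EntryCopySketch
{
    public static List<KeyValuePair<string, Stream>> ExtractAll(Stream zipStream)
    {
        var result = new List<KeyValuePair<string, Stream>>();
        using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                // Stand-in for VirtualStream; deliberately not disposed here,
                // because the caller consumes it after this method returns.
                var copy = new MemoryStream();
                using (Stream entryStream = entry.Open())
                {
                    entryStream.CopyTo(copy); // replaces the manual read/write loop
                }
                copy.Position = 0; // CopyTo leaves the target positioned at its end
                result.Add(new KeyValuePair<string, Stream>(entry.Name, copy));
            }
        }
        return result;
    }
}
```

Forgetting to reset Position after CopyTo() is an easy mistake to make: the next reader then starts at the end of the stream and sees no data.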
  • That's what I meant.

    I wasn't trying to assign the whole Zip to a message. I was talking about the zipEntry in the loop, that is, a single file within the archive, and I wanted to avoid reading/writing in byte blocks.

    I think I tried CopyTo() and didn't get it to work.

    But at least now that I know it's possible, I can do some more tests when I have the time.

    Monday, March 21, 2016 8:30 PM