Parsing out the request headers to isolate file content of an overridden MultipartFormDataStreamProvider that handles Large File Uploads

  • Question

  • User-1039489837 posted

    Setup: I have a requirement to handle large file uploads from a web client to a SQL database.  I can't write to the file system, so I can't use the out-of-the-box MultipartFormDataStreamProvider, and I can't load the files into memory because they are large (over 1 GB).  The requirement is to take the parts, encrypt them, and store each part in a row in a SQL database.  Later, another process will take those parts, decrypt them, recombine them, and transmit the file to another system that will process it accordingly.

    Issue: I have gotten the data uploaded to SQL, but what is being stored is the entire request (including headers), so when I pull the parts back and recombine them, the result is a corrupt file.  I can get it to work with text files if I parse the buffer directly and remove the headers myself, but this approach fails when non-text files are uploaded (pdf, zip, img, etc).  I've been spinning my wheels on this for a few days now and I'm just stumped as to what I'm doing wrong.  Any help would be greatly appreciated!

    Current Methodology: I override the WebBuffer policy so that .NET doesn't buffer the request (thus avoiding any out-of-memory issues), and I override the GetStream method of a custom class that inherits from MultipartFormDataStreamProvider to return a custom stream (it inherits from Stream and overrides the Write method so that it encrypts the buffer and writes the encrypted buffer to SQL).

    The CustomMultipartFormDataStreamProvider is very generalized: it returns a CustomSqlStream when headers.ContentDisposition.FileName isn't empty and a new MemoryStream otherwise.  The code is:

            public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
            {
                // If we have a file name then write contents out to custom stream. Otherwise just write to MemoryStream
                if (headers.ContentType != null && headers.ContentDisposition != null && !string.IsNullOrEmpty(headers.ContentDisposition.FileName))
                {
                    // For form data, Content-Disposition header is a requirement
                    ContentDispositionHeaderValue contentDisposition = headers.ContentDisposition;
    
                    var identifier = Guid.NewGuid().ToString();
                    var fileName = contentDisposition.FileName;// GetLocalFileName(headers);
    
    
                    var boundaryObj = parent.Headers.ContentType.Parameters.SingleOrDefault(a => a.Name == "boundary");
    
                    var boundary = (boundaryObj != null) ? boundaryObj.Value : "";
    
                    if (fileName.Contains("\\"))
                    {
                        fileName = fileName.Substring(fileName.LastIndexOf("\\") + 1).Replace("\"", "");
                    }
    
                    // We won't post process files as form data
                    _isFormData.Add(false);
    
                    var stream = new CustomSqlStream();
                    stream.Filename = fileName;
                    stream.Identifier = identifier;
                    stream.ContentType = headers.ContentType.MediaType;
                    stream.Boundary = (!string.IsNullOrEmpty(boundary)) ? boundary : "";
    
                    return stream;
                }
    
                // We will post process this as form data
                _isFormData.Add(true);
    
                // If no filename parameter was found in the Content-Disposition header then return a memory stream.
                return new MemoryStream();
            }


    The API method called by the client is:

            public Task<HttpResponseMessage> PostFormData()
            {  
                var provider = new CustomMultipartFormDataStreamProvider();
             
                // Read the form data and return an async task.
                var task = Request.Content.ReadAsMultipartAsync(provider).ContinueWith<HttpResponseMessage>(t =>
                {
                    if (t.IsFaulted || t.IsCanceled)
                    {
                        return Request.CreateErrorResponse(HttpStatusCode.InternalServerError, t.Exception);
                    }
    
                    return Request.CreateResponse(HttpStatusCode.OK);
                });
    
                return task;
            }

    My current CustomSqlStream.Write method is (this.Boundary is assigned to the derived Stream class when it's created.  It's pulled from the headers of the multipart request):

            public override void Write(byte[] buffer, int offset, int count)
            {
                string formData = Encoding.UTF8.GetString(buffer);
                bool trimmed = false;
                
                //check for boundary
                if (formData.Contains(this.Boundary)) {
                    var endPattern = String.Format("{0}{1}{2}", "\r\n--", this.Boundary, "--\r\n");
    
                    if (formData.Contains(endPattern))
                    {
                        //this is a end of boundary occurrence 
                        formData = formData.Substring(0, formData.IndexOf(endPattern));
                        trimmed = true;
                    }
    
                    if (formData.Contains(this.Boundary))
                    {   
                        //this is a header data occurrence
                        var boundaryOffset = formData.IndexOf(this.ContentType) + this.ContentType.Length + "\r\n\r\n".Length;
    
                        formData = formData.Substring(boundaryOffset);
    
                        trimmed = true;
                    }
                }
    
                if (trimmed)
                {
                    byte[] body = new byte[formData.Length];
    
                    buffer = Encoding.UTF8.GetBytes(formData);
    
                    for (int j = 0, k = 0; j < formData.Length; j++, k++)
                    {
                        body[k] = buffer[j];
                    }
    
                    WriteData(body);
                }
                else {
                    WriteData(buffer);
                }
    
                _dataAddedEvent.Set();
            }

    Friday, November 18, 2016 7:41 PM

Answers

  • User-1039489837 posted

    I figured it out.  I was overcomplicating the write process.  I rewrote it to be:

                // no boundary is included in the buffer
                byte[] fileData = new byte[count];
                Buffer.BlockCopy(buffer, offset, fileData, 0, count);
                // write binary data to db
                WriteData(fileData);

    This combined with the API and Provider code from my initial question resolved the issue and everything works well now.
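
    For anyone landing here later, a likely reason this works: the Stream.Write contract says only the bytes in buffer[offset .. offset + count) belong to the current write, and the original Write decoded the whole buffer and ignored both parameters. Below is a minimal sketch of a write-only stream that copies exactly that window. ChunkCollectingStream and its Chunks list are hypothetical stand-ins for CustomSqlStream and WriteData (the real class encrypts each chunk and writes it to SQL, which is not shown in this thread).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the thread's CustomSqlStream: chunks are kept
// in memory here instead of being encrypted and written to SQL.
public class ChunkCollectingStream : Stream
{
    public List<byte[]> Chunks { get; } = new List<byte[]>();

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Copy only the window the caller marked as valid for this write.
        byte[] chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        Chunks.Add(chunk); // stand-in for WriteData(chunk)
    }

    // Minimal members required for a write-only, non-seekable stream.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

    With this shape, anything that sits in the caller's buffer outside the window (for example header bytes from a shared read buffer) never reaches storage.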

    • Marked as answer by Anonymous Thursday, October 7, 2021 12:00 AM
    Wednesday, November 23, 2016 1:42 PM

All replies

  • User-846834550 posted

    I suppose the issue is related to this block:

    if (trimmed)
    {
         byte[] body = new byte[formData.Length];
    
         buffer = Encoding.UTF8.GetBytes(formData);
    
         for (int j = 0, k = 0; j < formData.Length; j++, k++)
         {
             body[k] = buffer[j];
         }
    
         WriteData(body);
    }

    The error may occur in the for loop.
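
    To illustrate why that block is risky for non-text files, here is a small sketch (the class and method names are made up for the example). Decoding arbitrary bytes as UTF-8 and encoding them again is not guaranteed to give back the same bytes, because invalid sequences are replaced during decoding.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8RoundTripDemo
{
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes,
    // which is what the Write method in the question effectively does.
    public static byte[] RoundTrip(byte[] data)
    {
        string text = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(text);
    }

    // True only when the round trip returns exactly the original bytes.
    public static bool SurvivesRoundTrip(byte[] data)
    {
        return RoundTrip(data).SequenceEqual(data);
    }
}
```

    Plain ASCII text survives the round trip, but the first four bytes of a typical JPEG (0xFF, 0xD8, 0xFF, 0xE0) do not, which matches text uploads working while pdf/zip/img uploads come back corrupt. Separately, formData.Length counts characters rather than bytes, so sizing the body array from it can truncate data whenever a character needs more than one byte in UTF-8.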

    Wednesday, November 23, 2016 1:50 AM
  • User-1039489837 posted

    That's what I was thinking.  That there must be something wrong with my write method.  I've re-written it a few times and have gotten closer to getting what I need.  I thought it was an encoding issue, so I made sure to use the same encoding (UTF8) the whole way through the process (specified on the form tag submitting the file as well as throughout the writing of data).  I also defined the beginning and end boundary in byte[] to make sure that I was looking for the right thing.  I also used the Buffer.BlockCopy method instead of the loops to make sure the bytes are copied correctly.

    With my current iteration of the write, things seem to be getting a lot closer.  The entirety of the file is a perfect match EXCEPT for a few extra characters written to the end of the file.  So there is something wrong about my calculation of how many characters to copy from the buffer to the db in the portion of code below where contentStartIndex == -1 and endBoundaryIndex is > -1.   Any thoughts on what I could be missing here? 

                // The first boundary
                byte[] startBoundaryBinary = Encoding.UTF8.GetBytes("\r\n--" + this.Boundary + "\r\n");
                // The last boundary
                byte[] endBoundaryBinary = Encoding.UTF8.GetBytes("\r\n--" + this.Boundary + "--\r\n");
               
                var contentTypeBinary = Encoding.UTF8.GetBytes("Content-Type: " + this.ContentType + "\r\n\r\n");
    
                var boundaryBinary = Encoding.UTF8.GetBytes(this.Boundary);
    
                var contentStartIndex = CustomSqlStream.SearchBytes(buffer, contentTypeBinary);
                if (contentStartIndex != -1)
                {
                    contentStartIndex += contentTypeBinary.Length;
                    byte[] fileData = new byte[buffer.Length - contentStartIndex];
                    Buffer.BlockCopy(buffer, contentStartIndex, fileData, 0, buffer.Length - contentStartIndex);
    
                    //check to see if endboundary is also in this byte stream
                    var endBoundaryIndex = CustomSqlStream.SearchBytes(fileData, endBoundaryBinary);
                    if (endBoundaryIndex > -1)
                    {
                        //end boundary detected in same buffer, so capture content up until the boundary
                        byte[] totalFileData = new byte[endBoundaryIndex];
                        Buffer.BlockCopy(fileData, 0, totalFileData, 0, endBoundaryIndex);
                        WriteData(totalFileData);
                    }
                    else
                    {
                        WriteData(fileData);
                    }
                }
                else
                {
                    var endBoundaryIndex = CustomSqlStream.SearchBytes(buffer, endBoundaryBinary);
    
                    if (endBoundaryIndex > -1)
                    {
                        //boundary exists and it's the end of stream boundary
                        byte[] fileData = new byte[count - endBoundaryBinary.Length];
                        Buffer.BlockCopy(buffer, offset, fileData, 0, count- endBoundaryBinary.Length);
                        WriteData(fileData);
                    }
                    else
                    {
                        //no boundary is included in buffer
                        byte[] fileData = new byte[count];
                        Buffer.BlockCopy(buffer, offset, fileData, 0, count);
                        WriteData(fileData);
                    }
                }      
    
                _dataAddedEvent.Set();
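
    One thing that may be worth checking in the code above: the searches run over the whole buffer rather than only the bytes between offset and offset + count, and the end-boundary branch copies count minus the boundary length instead of stopping at the index where the boundary was found. As a sketch only, a search limited to the valid window could look like the following. ByteSearch, IndexOf and TakeBefore are hypothetical helpers, not the CustomSqlStream.SearchBytes method (which isn't shown here).

```csharp
using System;

public static class ByteSearch
{
    // Hypothetical helper: index (in buffer coordinates) of the first
    // occurrence of pattern inside [offset, offset + count), or -1.
    public static int IndexOf(byte[] buffer, int offset, int count, byte[] pattern)
    {
        if (pattern.Length == 0) return offset;
        int last = offset + count - pattern.Length;
        for (int i = offset; i <= last; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Bytes of the window that come before the pattern, or the whole
    // window when the pattern does not occur inside it.
    public static byte[] TakeBefore(byte[] buffer, int offset, int count, byte[] pattern)
    {
        int hit = IndexOf(buffer, offset, count, pattern);
        int length = hit == -1 ? count : hit - offset;
        byte[] result = new byte[length];
        Buffer.BlockCopy(buffer, offset, result, 0, length);
        return result;
    }
}
```

    The copy length here comes from where the pattern was actually found, so bytes that follow the match, or that sit outside the window, are never written.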

    Wednesday, November 23, 2016 12:39 PM