none
Are there any .NET limitations in U-SQL?

    Question

  • I wrote a custom CSV Extractor using CsvHelper.

    Now I am trying to write a Excel Extractor using EPPlus, but I'm getting an error.

    "An unhandled exception from user code has been reported when invoking the method 'Extract' on the user type 'ExcelControlBook.ExcelExtractor'","description":"Unhandled exception from user code: \"Specified method is not supported.\"\nThe details includes more information including any inner exceptions and the stack trace where the exception was raised.","resolution":"Make sure the bug in the user code is fixed."

    What "specified method is not supported"? Can I not run all .NET assemblies? CsvHelper and EPPlus look the same on the outside, so I'm guessing something internal to EPPlus is causing the issue.

    Thanks!



    Friday, July 15, 2016 1:28 AM

Answers

  • EPPlus was trying to access stream.Length, but input.BaseStream (ScopeCosmosInputStream) doesn’t support that. I copied the ScopeCosmosInputStream to a MemoryStream and then used EPPlus and all worked. Not ideal, but good enough for now.
    Tuesday, August 2, 2016 6:38 PM

All replies

  • Here's the code:

    using OfficeOpenXml;
    using Excel;
    using CsvHelper;
    using Microsoft.Analytics.Interfaces;
    using Microsoft.Analytics.Types.Sql;
    using System;
    using System.IO;
    using System.Threading.Tasks;
    using System.Collections.Generic;
    using System.Linq;
    using System.Data;
    using System.Text;
    
    namespace SomeNamespace
    {
        [SqlUserDefinedExtractor]
       public class ExcelExtractor : IExtractor
        {
            public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
            {
                using (var excel = new ExcelPackage(input.BaseStream))
                {
                    var worksheet = excel.Workbook.Worksheets[1];
                    var rowCount = worksheet.Dimension.Rows;
                    var columnCount = worksheet.Dimension.Columns;
    
                    for (var r = 2; r <= rowCount; r++)
                    {
                        var rowValues = new List<object>(columnCount);
                        for (var c = 1; c <= columnCount; c++)
                        {
                            var cell = worksheet.Cells[r, c].FirstOrDefault();
                            if (cell != null)
                            {
                                rowValues.Add(cell.Value);
                                var inputValue = cell.Value.ToString();
    
                                var outputColumn = outputrow.Schema[c];
                                SetColumnValue(outputrow, outputColumn, c, inputValue);
                            }
                            else
                            {
                                rowValues.Add(null);
                                string inputValue = null;
    
                                var outputColumn = outputrow.Schema[c];
    
                                SetColumnValue(outputrow, outputColumn, c, inputValue);
                            }
                        }
    
                        yield return outputrow.AsReadOnly();
                    }
                }
            }
    
            private void SetColumnValue(IUpdatableRow outputrow, IColumn outputColumn, int columnIndex, string inputValue)
            {
    
                var type = outputColumn.Type;
                if (type.IsGenericType && type.GetGenericTypeDefinition().Equals(typeof(Nullable<>)))
                {
                    if (string.IsNullOrEmpty(inputValue))
                    {
                        outputrow.Set(columnIndex, (object)null);
                        return;
                    }
    
                    type = Nullable.GetUnderlyingType(type);
                }
            }
        }
    }

    And the call looks like

    @TEMP =
        EXTRACT foo string,
                bar string
        FROM @in
        USING new SomeNamespace.ExcelExtractor();
    

    Friday, July 15, 2016 5:09 PM
  • We are taking this into email for now.

    Note that you will need to provide the stack trace so we see where it fails. Also, note that Excel files are not splittable, so please mark your extractor with the AtomicFileProcessing=true annotation.


    Michael Rys

    Friday, July 15, 2016 6:57 PM
    Moderator
  • EPPlus was trying to access stream.Length, but input.BaseStream (ScopeCosmosInputStream) doesn’t support that. I copied the ScopeCosmosInputStream to a MemoryStream and then used EPPlus and all worked. Not ideal, but good enough for now.
    Tuesday, August 2, 2016 6:38 PM