AvroExtractor [IUnstructuredReader.BaseStream.Length] is not supported.

    Question

  • The AvroExtractor is now broken; something appears to have changed.

    The following extractor throws an error:

            public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
            {
                if (input.Length == 0)
                {
                    yield break;
                }
    
                using (var genericReader = AvroContainer.CreateGenericReader(input.BaseStream))
                {
                    using (var reader = new SequentialReader<dynamic>(genericReader))
                    { 
                        foreach (var obj in reader.Objects)
                        {
                            foreach (var column in output.Schema)
                            {                            
                                output.Set(column.Name, obj[column.Name]);
                            }         
    
                            yield return output.AsReadOnly();
                        }
                    }
                }
            }

    The problem is that the iterator behind reader.Objects needs the stream's Length, which is no longer supported.

    Here is the full error message:

    The [Length] property of [IUnstructuredReader.BaseStream] is intentionally not supported to avoid ambiguity between the physical length of [IUnstructuredReader.BaseStream] and the logical length of the split of the entire input stream assigned to a vertex.

    A simple solution would be to modify Microsoft.Hadoop.Avro.Container so that CreateGenericReader accepts the length explicitly:

    From: AvroContainer.CreateGenericReader(input.BaseStream)

    To: AvroContainer.CreateGenericReader(input.BaseStream, input.Length)


    • Edited by Uri Kluk Tuesday, January 24, 2017 2:55 PM
    Wednesday, January 18, 2017 11:18 PM

All replies

  • I have a workaround, though it is not recommended for big files:

    Copy the contents of the IUnstructuredReader into a MemoryStream and read from that:

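                // Workaround: buffer the whole input and pass a MemoryStream instead.
                // Stream.Read may return fewer bytes than requested, so loop until done.
                byte[] buffer = new byte[input.Length];
                int offset = 0, read;
                while (offset < buffer.Length &&
                       (read = input.BaseStream.Read(buffer, offset, buffer.Length - offset)) > 0)
                {
                    offset += read;
                }
                var ms = new MemoryStream(buffer, 0, offset);
    
                using (var genericReader = AvroContainer.CreateGenericReader(ms))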
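
    A possible alternative that avoids buffering the whole file is to wrap input.BaseStream in a thin Stream that reports a caller-supplied length. The sketch below is untested against the Avro library: LengthOverrideStream is a hypothetical helper (not part of U-SQL or Microsoft.Hadoop.Avro), and it assumes the reader only needs Length plus forward-only reads. If the reader seeks, or if input.Length is not the length the reader expects for the split, this will not work.

```csharp
using System;
using System.IO;

// Hypothetical helper: forwards reads to an inner stream but reports
// a length supplied by the caller instead of asking the inner stream.
public sealed class LengthOverrideStream : Stream
{
    private readonly Stream inner;
    private readonly long length;
    private long position;

    public LengthOverrideStream(Stream inner, long length)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
        this.length = length;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }   // forward-only (assumption)
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return length; } }

    public override long Position
    {
        get { return position; }   // bytes read so far through this wrapper
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = inner.Read(buffer, offset, count);
        position += read;
        return read;
    }

    public override void Flush() { }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
```

    In the extractor it would be used as AvroContainer.CreateGenericReader(new LengthOverrideStream(input.BaseStream, input.Length)).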

    Please fix input.BaseStream to support Length, or add a length parameter to CreateGenericReader as proposed in the question above.

    Best regards,

    Uri

    Wednesday, January 18, 2017 11:39 PM