Parquet file issue while loading through PolyBase

  • Question

  • Hi,

    We are facing an issue while loading data from a Parquet file generated with ADF. We are copying Oracle data (the database NLS_CHARACTERSET is UTF-8) into a Parquet file on Azure Blob storage, and when we create an external table over it and try to access it with a SELECT statement we get the error below:

    "HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: parquet.io.api.Binary$ByteArraySliceBackedBinary cannot be cast to java.base/java.lang.Long"


    Then, as per a suggestion found on the internet, I tried to change the encoding of the same Parquet file to UTF-8 with a PowerShell command and got the error below:

    EXTERNAL TABLE access failed due to internal error: 'File /oracle/sod_utf: HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader: RuntimeException: wasbs:/xxx.blob.core.windows.net/oracle/sod_utf is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [82, 49, 13, 10]'

    I used the PowerShell command below to convert the encoding:

    Get-Content sod | Set-Content -Encoding utf8 sod_utf

    When I converted it with C# code instead, I got a different error:

    Msg 110802, Level 16, State 1, Line 6
    110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Error occurred while accessing HDFS external file[/oracle/sod_utf8][0]: Java exception raised on call to HdfsBridge_CreateRecordReader_V2. Java exception message:
    HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader: TProtocolException: Required field 'version' was not found in serialized data! Struct: FileMetaData(version:0, schema:null, num_rows:0, row_groups:null)

    Looking forward to your help.


    Cheers,


    • Edited by Amit-Tomar Friday, February 1, 2019 6:52 AM
    Friday, February 1, 2019 6:27 AM

Answers

  • Hi,

    I created the external table with nvarchar as the datatype for all the columns and it solved my problem.
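
    For reference, a minimal sketch of what that workaround can look like. The table, column, data source, and file format names below are placeholders, not my actual definitions:

    CREATE EXTERNAL TABLE dbo.sod_ext
    (
        order_id    nvarchar(256),
        order_date  nvarchar(256),
        amount      nvarchar(256)
    )
    WITH
    (
        LOCATION = '/oracle/sod',          -- folder/file path under the external data source
        DATA_SOURCE = AzureBlobSource,     -- placeholder external data source name
        FILE_FORMAT = parquetfile1         -- a PARQUET external file format
    );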


    Cheers,

    • Marked as answer by Amit-Tomar Thursday, August 22, 2019 7:04 AM
    Thursday, August 22, 2019 2:11 AM

All replies

  • Hi Amit,

    Can you detail the T-SQL you used to create the external data source? Please follow the guidance in this document: CREATE EXTERNAL FILE FORMAT (Transact-SQL)

    An example:

    CREATE EXTERNAL FILE FORMAT parquetfile1  
    WITH (  
        FORMAT_TYPE = PARQUET,  
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'  
    );  

    If you are using Gzip compression, specify 'org.apache.hadoop.io.compress.GzipCodec' instead.

    Creating an external file format is a prerequisite for creating an external table; the file format specifies the actual layout of the data referenced by the external table.
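
    For completeness, the external table also needs an external data source to point at. A rough sketch, with placeholder storage account, container, and credential names:

    CREATE EXTERNAL DATA SOURCE AzureBlobSource
    WITH
    (
        TYPE = HADOOP,
        LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
        CREDENTIAL = AzureStorageCredential   -- database scoped credential created beforehand
    );

    The external table then references the data source and the file format by name through its DATA_SOURCE and FILE_FORMAT options.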

    Thanks,

    Mike


    Friday, February 1, 2019 10:55 PM
    Moderator
  • Hi,

    I have a similar issue. Did you find a solution for this?

    Tuesday, February 19, 2019 4:44 PM
    Can you make sure your columns/schema match between the source file and the destination table? A likely scenario is that the T-SQL looks correct (HADOOP for the external data source TYPE and PARQUET for the external file format FORMAT_TYPE) but the column definitions in the external table do not match those of the Parquet file. In that case the process will start but immediately fail. Check the columns defined in the CREATE EXTERNAL TABLE statement against those of the source file being imported; there is a good chance the import is failing because of a mismatch (see the sketch after the documentation link below).

    CREATE EXTERNAL TABLE (Transact-SQL)
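
    To illustrate the kind of mismatch that produces the original ClassCastException (table, column, data source, and file format names below are made up): a Parquet column stored as a string but declared as bigint in the external table fails when PolyBase tries to cast the binary value to a long.

    -- Fails at SELECT time if order_id is a string column in the Parquet file
    -- (ClassCastException: Binary$ByteArraySliceBackedBinary cannot be cast to java.lang.Long)
    CREATE EXTERNAL TABLE dbo.sod_ext_wrong
    (
        order_id bigint
    )
    WITH (LOCATION = '/oracle/sod', DATA_SOURCE = AzureBlobSource, FILE_FORMAT = parquetfile1);

    -- Works once the declared type matches the Parquet type
    -- (or a string type such as nvarchar is used, as in the marked answer)
    CREATE EXTERNAL TABLE dbo.sod_ext_matched
    (
        order_id nvarchar(256)
    )
    WITH (LOCATION = '/oracle/sod', DATA_SOURCE = AzureBlobSource, FILE_FORMAT = parquetfile1);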

    Friday, March 1, 2019 9:29 PM
    Moderator