Copy data from on-premises SQL Server to Azure Blob storage as a Parquet file

    Question

  • Hi Folks,

    I ran into a problem when copying data from my on-premises SQL Server table into a Parquet file on Azure Blob storage; the copy fails with the following error:

    { "errorCode": "2200", "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.UnsatisfiedLinkError:no snappyjava in java.library.path\ntotal entry:18\r\njava.lang.ClassLoader.loadLibrary(Unknown Source)\r\njava.lang.Runtime.loadLibrary0(Unknown Source)\r\njava.lang.System.loadLibrary(Unknown Source)\r\norg.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:170)\r\norg.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)\r\norg.xerial.snappy.Snappy.<clinit>(Snappy.java:47)\r\norg.apache.parquet.hadoop.codec.SnappyCompressor.compress(SnappyCompressor.java:67)\r\norg.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)\r\norg.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:92)\r\norg.apache.parquet.hadoop.CodecFactory$BytesCompressor.compress(CodecFactory.java:112)\r\norg.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:89)\r\norg.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:152)\r\norg.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:240)\r\norg.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:126)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:164)\r\norg.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)\r\norg.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:297)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.close(ParquetWriterBridge.java:29)\r\n,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'", "failureType": "UserError", "target": "Copy Data1" }

    My ADF is in the Southeast Asia region, and so is the Azure Blob storage account. I am not sure if anyone has encountered a similar issue. I would really appreciate it if you could share a workaround or solution.

    Regards,

    Di Truong

    Tuesday, October 16, 2018 8:33 AM

All replies

  • What is your self-hosted IR version? Are you using SQL auth or Windows auth for your SQL Server linked service?

    Maybe you could try upgrading your IR or switching to SQL auth.
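
    For reference, a SQL-auth linked service that connects through a self-hosted IR looks roughly like this (a sketch only; the server, database, login, and IR names are placeholders):

    {
        "name": "OnPremSqlServer",
        "properties": {
            "type": "SqlServer",
            "typeProperties": {
                "connectionString": "Data Source=<server>;Initial Catalog=<database>;Integrated Security=False;User ID=<sql login>;Password=<password>"
            },
            "connectVia": {
                "referenceName": "<self-hosted IR name>",
                "type": "IntegrationRuntimeReference"
            }
        }
    }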

    Saturday, October 20, 2018 8:52 AM
  • Hey Di Truong,

    I am experiencing the same issue and am just starting to troubleshoot it. Please let me know if you have found a solution. I have made sure that our IR host has the latest x64 JRE, as noted in the ADF documentation, and the IR is up to date. Thanks!

    Monday, November 19, 2018 9:53 PM
  • Facing the same problem. Please update the post if you find any solution.
    Tuesday, December 4, 2018 5:35 AM
  • Hi all,

    Just finished working with Azure support on this. A workable solution for us was to remove the dependency on the snappy libraries, which we did by setting the compression codec in the Parquet file format spec to Gzip. Our Parquet dataset spec looks like this:

    {
        "name": "ParquetGeneric",
        "properties": {
            "linkedServiceName": {
                "referenceName": "ADLS",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "TableName": {
                    "type": "String"
                }
            },
            "type": "AzureDataLakeStoreFile",
            "typeProperties": {
                "format": {
                    "type": "ParquetFormat",
                    "codec": "Gzip"
                },
                "fileName": {
                    "value": "@concat(replace(dataset().TableName, '.', '_'), '.parquet')",
                    "type": "Expression"
                },
                "folderPath": "stage"
            }
        },
        "type": "Microsoft.DataFactory/factories/datasets"
    }

    Notice the "codec" setting in the "format" object under "typeProperties".

    We are using Databricks downstream, so Gzip is perfectly fine for us.
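
    Since the original question writes to Blob storage rather than ADLS, the same codec setting should carry over to a Blob dataset. A sketch (the linked service name, container, and file name here are just placeholders):

    {
        "name": "ParquetOnBlob",
        "properties": {
            "linkedServiceName": {
                "referenceName": "<blob storage linked service>",
                "type": "LinkedServiceReference"
            },
            "type": "AzureBlob",
            "typeProperties": {
                "format": {
                    "type": "ParquetFormat",
                    "codec": "Gzip"
                },
                "fileName": "mytable.parquet",
                "folderPath": "<container>/stage"
            }
        },
        "type": "Microsoft.DataFactory/factories/datasets"
    }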

    Hopefully this is a resolution for everyone else as well.


    Tuesday, December 4, 2018 9:01 PM