Copying a large 16 GB Azure SQL table to Azure Data Lake Gen2 in Parquet file type. Getting error code 2200: 'An error occurred when invoking java, message: java.lang.OutOfMemoryError:Java heap space'

  • Question

  • Hello, I am copying a large 16 GB Azure SQL table to Azure Data Lake Gen2 in Parquet file type using Azure Data Factory. I got error code 2200: 'An error occurred when invoking java, message: java.lang.OutOfMemoryError:Java heap space'

    It is a large 16 GB table and I do need it in Parquet file type so I can process it in Databricks. Are there any solutions? This is all in the Azure environment. Can I configure my Azure SQL server to allow more memory for Java?
    • Edited by bellowman Wednesday, July 24, 2019 4:09 PM
    Wednesday, July 24, 2019 4:06 PM

All replies

  • Hi Bellowman,

    You can read the table directly in Databricks from the Azure SQL Database table. You can do that using the following approach - https://docs.databricks.com/spark/latest/data-sources/sql-databases-azure.html#read-from-azure-sql-database-or-sql-server
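
    For reference, below is a minimal PySpark sketch of that approach, assuming it runs in a Databricks notebook where a spark session already exists; the server, database, table, credential, and storage names are placeholders you would replace with your own. The final write to ADLS Gen2 as Parquet is optional and assumes the cluster already has access to the storage account configured.

      # Read the Azure SQL table over JDBC (placeholder names throughout).
      jdbc_url = (
          "jdbc:sqlserver://<your-server>.database.windows.net:1433;"
          "database=<your-database>;encrypt=true;loginTimeout=30;"
      )

      df = (
          spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", "dbo.<YourLargeTable>")
          .option("user", "<sql-user>")
          .option("password", "<sql-password>")
          .load()
      )

      # Optionally persist the result to ADLS Gen2 as Parquet for later reuse.
      df.write.mode("overwrite").parquet(
          "abfss://<container>@<storage-account>.dfs.core.windows.net/<folder>/"
      )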

    Regards,

    Srihad.

    Please mark as answer if you find the solution helpful; it will benefit other forum members as well.

    Wednesday, July 24, 2019 9:33 PM
  • That is true, but even if it weren't for Databricks, I would still want it to be in a Parquet file. So this doesn't answer my Parquet question.

    Thank you though


    • Edited by bellowman Thursday, July 25, 2019 9:47 PM
    Thursday, July 25, 2019 2:16 PM
  • Hi bellowman,

    If you are using a Self Hosted Integration Runtime:

    Please follow the steps below to increase the heap memory on the self-hosted IR machine, then restart the IR to apply the changes.

    For the self-hosted IR:

    • First check the total RAM of the IR host machine. Please make sure the total RAM is >= 8 GB.
    • Add the following environment variable on the machine that hosts the self-hosted IR:
      _JAVA_OPTIONS "-Xms256m -Xmx16g" (Note: these are only sample values; you can determine the min/max heap sizes yourself. A sketch for setting this follows the list.)

    • Then restart the IR. 
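
    As a minimal sketch of the environment-variable step, assuming the IR host runs Windows and the script is executed from an elevated (administrator) prompt, the variable could be set machine-wide like this; the heap sizes are just the sample values from above:

      import subprocess

      # setx /M writes a machine-level environment variable so the IR service sees it.
      subprocess.run(
          ["setx", "/M", "_JAVA_OPTIONS", "-Xms256m -Xmx16g"],
          check=True,
      )

      # Afterwards, restart the self-hosted Integration Runtime (for example via the
      # Microsoft Integration Runtime Configuration Manager or services.msc) so the
      # new setting takes effect.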


    Also, please do not set too many nodes on this machine if the Parquet/ORC work is heavy.
    You can change this value in the ADF portal -> Connections -> Integration Runtimes -> Edit -> Nodes.

    Hope this helps.

    Please do let us know how this works.


    [If a post helps to resolve your issue, please click "Mark as Answer" on that post and/or click the "Vote as helpful" button. By marking a post as Answered and/or Helpful, you help others find the answer faster.]

    Tuesday, July 30, 2019 5:13 AM
    Moderator
  • Hi bellowman,

    Just checking to see if the above suggestion helped resolve your issue. If you have any further queries, do let us know.

    Thank you.



    Tuesday, July 30, 2019 10:14 PM
    Moderator
  • Hi bellowman,

    Just checking to see if you still need assistance with this issue.

    Please refer to the MSDN thread below, which describes an issue similar to yours and its resolution.
    https://social.msdn.microsoft.com/Forums/en-US/0396a33b-1f36-4870-9462-30fef7988b10/outofmemoryerror-copying-from-onprem-sqlserver-to-sink-dataset-in-parquet?forum=AzureDataFactory 

    Hope this helps.



    Wednesday, July 31, 2019 6:02 PM
    Moderator