OutOfMemory Errors caused by com.microsoft.peregrine

  • General discussion

  • Hello,

    We restarted our HDInsight cluster yesterday, and since then several of our jobs have failed with OutOfMemory errors caused
    by PlanLogListener.logPlans in the com.microsoft.peregrine plugin (full stack trace at the end of this message).

    By chance, we finally found a way to disable this plugin.
    If anyone runs into the same issue, you just need to add these two options to your spark-submit command:
    --conf spark.sql.queryExecutionListeners="" 
    --conf spark.sql.extensions=""
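
For anyone who launches jobs from a script, here is a minimal Python sketch of the same workaround. It only assembles the spark-submit argument list with the two options above. The helper name build_spark_submit, the file my_job.py, and its --date argument are hypothetical placeholders, not part of HDInsight or Spark.

```python
import shlex

# The two settings from the post. An empty value means no query execution
# listeners and no session extensions are registered.
DISABLE_PLUGIN_CONF = {
    "spark.sql.queryExecutionListeners": "",
    "spark.sql.extensions": "",
}


def build_spark_submit(app_path, app_args=(), extra_conf=None):
    """Return a spark-submit argument list that includes the two options.

    Hypothetical helper for illustration only.
    """
    conf = dict(DISABLE_PLUGIN_CONF)
    conf.update(extra_conf or {})  # caller-supplied settings win on conflict
    cmd = ["spark-submit"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


# Placeholder job file and arguments.
cmd = build_spark_submit("my_job.py", ["--date", "2020-01-21"])
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Passing the list to subprocess.run(cmd) instead of printing it avoids any shell quoting issues with the empty values. Both settings most likely have to be in place before the Spark session is created, which is why they are given at submit time here.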

    That said, I wanted to let you know that enabling an in-house Spark plugin by default is really not cool when it comes without any HDInsight version change (we restarted our cluster without changing the version), without any announcement, and without any documentation aside from a single wiki page on the personal GitHub of one of your employees (thanks to virtuabhi by the way, that wiki saved our day). That is especially true when the plugin has not been sufficiently tested for potential performance issues.


    For the record, here is the full stacktrace:
    An error occurred while calling o3980.parquet.
    : java.lang.OutOfMemoryError: Java heap space
    	at java.util.Arrays.copyOf(Arrays.java:3332)
    	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    	at java.lang.StringBuffer.append(StringBuffer.java:270)
    	at com.microsoft.peregrine.spark.listeners.PlanLogListener.logPlans(PlanLogListener.java:76)
    	at com.microsoft.peregrine.spark.listeners.PlanLogListener.onSuccess(PlanLogListener.java:57)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:124)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:123)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:145)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:143)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
    	at org.apache.spark.sql.util.ExecutionListenerManager.org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling(QueryExecutionListener.scala:143)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply$mcV$sp(QueryExecutionListener.scala:123)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:123)
    	at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:123)
    	at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:156)
    	at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:122)
    	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:658)
    	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
    	at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:549)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:282)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    Tuesday, January 21, 2020 9:31 AM

All replies

  • Hi Mr. Pin,

    Thanks very much for sharing this information. It will be helpful to other community members who have a similar issue.


    Thank you

    If a post helps to resolve your issue, please click "Mark as Answer" and/or "Vote as helpful" on that post. By marking a post as Answered and/or Helpful, you help others find the answer faster.

    Tuesday, January 21, 2020 11:09 PM