Error occurring while unzipping a file on Blob storage: BlockCountExceedsLimit

  • Question

  • Hi all,

    How can I unzip a big file in Blob storage? I have an HDInsight Spark cluster with Azure Storage attached. I need to unzip a file that is 1 TB after decompression, using the following command in the head node terminal:

    hdfs dfs -cat /user/data/latest-all.json.gz | gunzip -d | hdfs dfs -put - /user/data/wiki.json &

    But it gives me an error:

    ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : user/myb/data/wiki.json._COPYING_ Exception details: The uncommitted block count cannot exceed the maximum limit of 100,000 blocks. Please see the cause for further information. Error Code : BlockCountExceedsLimit
    cat: Unable to write to output stream.
    put: The uncommitted block count cannot exceed the maximum limit of 100,000 blocks. Please see the cause for further information.

    What do I have to do?

    Thanks

    Wednesday, November 27, 2019 12:10 PM
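The error above can be sanity-checked with a little arithmetic. The blocks of the uploaded file are still uncommitted when the error appears, so the whole 1 TB stream has to fit within 100,000 blocks. The sketch below assumes the WASB driver stages the stream in 4 MiB blocks (believed to be the hadoop-azure default for `fs.azure.write.request.size`; the thread does not state the block size actually used on this cluster) and takes "1TB" as 10^12 bytes. The helper names are hypothetical.

```python
UNCOMMITTED_BLOCK_LIMIT = 100_000  # limit quoted in the error message


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive ints, avoiding float rounding."""
    return -(-a // b)


def blocks_needed(total_bytes: int, block_bytes: int) -> int:
    """Blocks staged when total_bytes are streamed in block_bytes-sized blocks."""
    return ceil_div(total_bytes, block_bytes)


def max_upload_bytes(block_bytes: int, limit: int = UNCOMMITTED_BLOCK_LIMIT) -> int:
    """Largest upload that stays within `limit` uncommitted blocks."""
    return block_bytes * limit


def min_block_bytes(total_bytes: int, limit: int = UNCOMMITTED_BLOCK_LIMIT) -> int:
    """Smallest block size that fits total_bytes into `limit` blocks."""
    return ceil_div(total_bytes, limit)


if __name__ == "__main__":
    one_tb = 10**12                   # "1TB" taken as 10^12 bytes (assumption)
    assumed_block = 4 * 1024 * 1024   # assumed 4 MiB block size (assumption)
    print(blocks_needed(one_tb, assumed_block))  # 238419, well over 100,000
    print(max_upload_bytes(assumed_block))       # 419430400000 (about 419 GB)
    print(min_block_bytes(one_tb))               # 10000000 (10 MB per block)
```

Under these assumptions the upload runs out of blocks after roughly 419 GB, well before the 1 TB output is complete, so leftover blocks alone would not explain the failure; fewer, larger blocks are needed.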

All replies

  • Hello Maryam_Lewen, and thank you for your question. Until I hear back from one of our storage specialists, here are a couple of things you could try.

    Clear uncommitted blocks by creating an empty dummy blob in the same container and moving it to replace your existing file. (This helps if uncommitted blocks are left over from previous operations.)

    Enable the WASB driver's block blob compaction by adding the following property to core-site.xml:

    <property>
      <name>fs.azure.block.blob.with.compaction.dir</name>
      <value>user/myb/data/wiki.json</value>
    </property>
    (This helps by combining small blocks into larger ones, so you are less likely to hit the limit before the blob is committed.)

    Thursday, November 28, 2019 2:04 AM
  • Thanks, Martin. I do not have access to the container through the portal. How can I create this blob?

    Thanks

    Thursday, November 28, 2019 9:36 AM
  • Since you are already accessing the container through hdfs dfs, you could use the touchz command.

    This command creates a file of zero length.

    https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#touchz

    Monday, December 2, 2019 10:17 PM
  • Has this resolved the issue?
    Thursday, December 5, 2019 12:05 AM
  • We have not received a response from you.  Are you still facing the issue?  If you found a solution, would you please share it here with the community?  Otherwise, let us know and we will continue to engage with you on the issue.

    Thursday, December 5, 2019 1:51 AM
  • Since we have still not heard back from you, we will assume you found your own resolution.  If you found a solution, would you please share it here with the community?
    Thursday, December 5, 2019 7:30 PM
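The first suggestion in the replies, clearing leftover uncommitted blocks by writing an empty blob over the name, relies on how a block blob keeps staged blocks separate from committed ones. The toy model below is a hypothetical, simplified sketch, not the Azure SDK: it assumes that staging a new block counts against the 100,000-block limit from the error message, and that committing a block list discards every uncommitted block not in that list. The class and method names are illustrative only.

```python
from typing import Dict, List

MAX_UNCOMMITTED_BLOCKS = 100_000  # limit quoted in the error message


class BlockCountExceedsLimit(Exception):
    """Raised when a new block would push the uncommitted count past the limit."""


class ToyBlockBlob:
    """Hypothetical, simplified model of one block blob's bookkeeping."""

    def __init__(self, max_uncommitted: int = MAX_UNCOMMITTED_BLOCKS) -> None:
        self.max_uncommitted = max_uncommitted
        self.committed: List[bytes] = []
        self.uncommitted: Dict[str, bytes] = {}

    def put_block(self, block_id: str, data: bytes) -> None:
        # Staging a new block id counts against the uncommitted limit;
        # re-staging an existing id only replaces its data.
        if block_id not in self.uncommitted and len(self.uncommitted) >= self.max_uncommitted:
            raise BlockCountExceedsLimit(block_id)
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids: List[str]) -> None:
        # Commit the listed blocks in order; in this model every
        # uncommitted block that is not listed is discarded.
        self.committed = [self.uncommitted[b] for b in block_ids]
        self.uncommitted.clear()

    def content(self) -> bytes:
        return b"".join(self.committed)


if __name__ == "__main__":
    blob = ToyBlockBlob(max_uncommitted=3)  # tiny limit so the demo is quick
    for i in range(3):
        blob.put_block(f"b{i}", b"x")
    try:
        blob.put_block("b3", b"x")  # a fourth new block exceeds the toy limit
    except BlockCountExceedsLimit:
        print("limit hit")
    blob.put_block_list([])  # "write an empty file": commit nothing
    print(len(blob.uncommitted))  # 0, the leftover blocks are gone
```

In this model, committing an empty list leaves a zero-length blob with no staged blocks, so a fresh upload starts from a count of zero. It does not raise the limit itself, which is why the compaction setting is the second suggestion.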