Azure append blob storage does not support Spark textFile API


  • Hello,

    When I run sc.textFile('/path to an append blob'), I get the following error:

    Caused by: com.microsoft.azure.storage.StorageException: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual UNSPECIFIED.
    at com.microsoft.azure.storage.blob.CloudBlob$8.preProcessResponse(CloudBlob.java:1306)
    at com.microsoft.azure.storage.blob.CloudBlob$8.preProcessResponse(CloudBlob.java:1272)
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
    at com.microsoft.azure.storage.blob.CloudBlob.downloadAttributes(CloudBlob.java:1265)
    at com.microsoft.azure.storage.blob.BlobInputStream.<init>(BlobInputStream.java:155)

    It seems that Spark can only read data from block blobs. I checked the Azure Storage SDK in HDInsight: it is version 2.2.0, whereas append blobs were only added in azure-storage 3.0.0. My question is: does azure-storage 3.0.0 support the Spark textFile API? If so, how can I update azure-storage to the latest version in HDInsight?

    Thanks

    Jun


    • Moved by Shreya Hajela Tuesday, September 15, 2015 10:32 AM (related to HDInsight).
    Monday, September 14, 2015 10:11 PM

All replies

  • Hi,

    Thank you for reaching out to us. I am currently researching this to gather more information about your request, and I will get back to you with an update as soon as possible. I sincerely appreciate your patience.

    Regards,

    Tuesday, September 15, 2015 10:36 AM
  • HDInsight currently supports only block blobs, and I would expect the same of Spark, since it runs on top of HDInsight. Does that answer your question?

    Regards.


    Debarchan Sarkar - MSFT ( This posting is provided AS IS with no warranties, and confers no rights.)

    Wednesday, September 16, 2015 6:40 AM
  • Hello Debarchan,

    Thanks for your reply. My understanding is that append blobs are implemented similarly to block blobs: https://msdn.microsoft.com/en-us/library/azure/ee691964.aspx?f=255&MSPPError=-2147217396

    So HDInsight might be able to read data from append blobs if the azure-storage library in HDInsight were updated to 3.0.0. That is just my guess; I have no idea how to update the azure-storage JAR file across the whole HDInsight cluster.

    Thanks

    Jun

    Wednesday, September 16, 2015 4:07 PM
  • It looks like there is currently no support for append blobs in HDInsight. :(

    Debarchan Sarkar - MSFT ( This posting is provided AS IS with no warranties, and confers no rights.)

    Monday, September 28, 2015 10:34 AM
  • So, is there any remedy?

    We started using append blobs to collect logs, intending to process them in HDInsight. Now it appears to be impossible unless we download the files and upload them again.
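    The download-and-re-upload workaround can at least be scripted. The sketch below assumes the current `azure-storage-blob` (v12) Python package, which postdates this thread, and two `BlobClient` objects supplied by the caller; the function name `copy_append_blob_to_block_blob` is hypothetical. It reads the append blob and writes the same bytes back as a block blob, the blob type HDInsight reads.

```python
"""Sketch: re-upload an append blob as a block blob (hypothetical helper).

Assumes azure-storage-blob v12; `source_blob` and `target_blob` are
azure.storage.blob.BlobClient instances created by the caller.
"""


def copy_append_blob_to_block_blob(source_blob, target_blob):
    """Copy an append blob's content into a block blob; return bytes copied.

    Reads the whole blob into memory, which is fine for modest log files
    but not for very large blobs.
    """
    props = source_blob.get_blob_properties()
    # In v12, BlobType is a str-based enum, so comparing to "AppendBlob" works.
    if props.blob_type != "AppendBlob":
        raise ValueError(f"expected an append blob, got {props.blob_type}")

    data = source_blob.download_blob().readall()
    # upload_blob creates a block blob by default; replace any earlier copy.
    target_blob.upload_blob(data, overwrite=True)
    return len(data)
```

    The clients could be created with `BlobServiceClient.from_connection_string(conn_str)` followed by `get_blob_client(container=..., blob=...)`. Writing to a separate container or blob name leaves the original append blob free to keep receiving log lines; for large logs, iterating over `download_blob().chunks()` would avoid holding the whole blob in memory.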

    Thanks,

       Boris

    Wednesday, June 22, 2016 3:39 PM
  • @bkron - what did you end up doing? 
    Wednesday, September 26, 2018 9:04 PM
  • Did anyone manage to solve this problem?
    Is there any way to read text blobs using Spark?
    • Proposed as answer by akkidx Friday, June 28, 2019 2:38 AM
    • Unproposed as answer by akkidx Saturday, June 29, 2019 3:42 PM
    Wednesday, June 26, 2019 2:26 AM
  • Hi akkidx,

    You may refer to the SO thread that addresses a similar issue.

    Hope this helps.

    • Proposed as answer by akkidx Friday, June 28, 2019 2:38 AM
    • Unproposed as answer by akkidx Friday, June 28, 2019 2:38 AM
    Wednesday, June 26, 2019 8:16 AM
  • Hi all,

    Thanks a lot for your reply, upvoted!

    That SO thread had indeed solved a few of my problems earlier, but I am still facing issues. I have posted my issue as a new question here: https://stackoverflow.com/q/56800137/3061686

    It would be a great help if somebody could help me with this in any way.

    Friday, June 28, 2019 2:41 AM
  • Okay, I think that reading append blobs (APPEND_BLOB) is still not supported through the Hadoop APIs.

    Earlier (with azure-storage-2.2.0.jar) I was getting the same error as in the original question:

    Caused by: com.microsoft.azure.storage.StorageException: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual UNSPECIFIED.

    I upgraded to azure-storage-4.0.0.jar, and the error changed from `Expected BLOCK_BLOB, actual UNSPECIFIED.` to `Expected BLOCK_BLOB, actual APPEND_BLOB.`
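    The two messages point to different causes: with azure-storage 2.2.0 the SDK cannot identify the blob type at all (UNSPECIFIED), while newer versions identify it as APPEND_BLOB and the reader, which expects BLOCK_BLOB, still rejects it. A small helper like the one below can classify the message when triaging logs; `diagnose_blob_type_error` is hypothetical (not part of any Azure SDK), and its explanation strings are assumptions drawn from this thread.

```python
import re

# Matches the "Expected X, actual Y" tail of the StorageException message.
_BLOB_TYPE_RE = re.compile(r"Expected (\w+), actual (\w+)")


def diagnose_blob_type_error(message):
    """Return a short explanation of an 'Incorrect Blob type' message, or None."""
    match = _BLOB_TYPE_RE.search(message)
    if match is None:
        return None
    expected, actual = match.groups()
    if expected == "BLOCK_BLOB" and actual == "UNSPECIFIED":
        # Seen with azure-storage 2.2.0, which predates append blobs.
        return "SDK does not recognize the blob type (append blobs need azure-storage >= 3.0.0)"
    if expected == "BLOCK_BLOB" and actual == "APPEND_BLOB":
        # Seen with azure-storage 4.0.0 and 8.3.0 in this thread.
        return "SDK recognizes the append blob, but the reader only accepts block blobs"
    return f"reader expected {expected} but the blob is {actual}"
```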

    I then upgraded to azure-storage-8.3.0.jar and struggled for a while with a Jackson JSON dependency issue (fixed by making `jackson-core-2.5.2.jar` available), but the error is still the same: `Expected BLOCK_BLOB, actual APPEND_BLOB.`

    I got the errors above with `hadoop fs -cat wasb://<container-name>@<account-name>.blob.core.windows.net/<blob-name>`.
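    For reference, a WASB URI has the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`: the part before `@` is the blob container, not the storage account. The hypothetical helper below (standard library only) splits such a URI into its parts, which is a quick way to confirm that a path points at the intended container and account.

```python
from urllib.parse import urlparse


def parse_wasb_uri(uri):
    """Split a wasb:// or wasbs:// URI into (container, account, path)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    # netloc looks like "<container>@<account>.blob.core.windows.net"
    container, sep, host = parsed.netloc.partition("@")
    if not sep or not container:
        raise ValueError(f"missing container in: {uri}")
    suffix = ".blob.core.windows.net"
    if not host.endswith(suffix):
        raise ValueError(f"unexpected host: {host}")
    account = host[: -len(suffix)]
    return container, account, parsed.path.lstrip("/")
```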

    This makes me believe that there is no way to read from append blobs via the Hadoop ecosystem as of today (or, if there is, it is very hard to find).


    • Edited by akkidx Sunday, June 30, 2019 3:34 AM
    Sunday, June 30, 2019 3:33 AM