Question about HDInsight IO cache

  • Question

  • Hi HDInsight team!
    We are evaluating running Spark on HDInsight with a data cache. I came across this great article introducing HDInsight IO Cache and have a few questions; I hope this is the right place to ask.
    1. How is the Rubix cache integrated with HDInsight? For example, if we want to run our own customized Spark binary, is it possible to leverage the cache as well?
    2. Following up on the question above, is IO Cache available only on HDInsight? If I have a general-purpose VM on Azure, is there a way to use the Rubix caching layer for general-purpose blob store access? If not, is there another recommended caching tool for this case?
    3. Has the team considered caching layers other than Rubix? As mentioned in the article, there are also Ignite and Alluxio. The article talks about SSD optimization, but it would be great if more detail on the comparison could be shared.
    Thursday, August 15, 2019 10:26 PM

All replies

  • Hi Chen,

    I’m working with the product team and will get back to you when I have more information.

    Monday, August 19, 2019 7:20 AM
  • Hello Chen,

    Please see the responses to your questions below.

    1. We have only tested the Rubix cache with the version of Spark we ship out of the box, so I am not sure about custom binaries.

    2. IO Cache is only available in HDInsight. However, as Rubix is an open-source project (Apache-licensed), you may be able to make it work on your own VMs.

    3. Yes, we did consider other caching options. However, Rubix worked well with our infrastructure.

    Monday, August 19, 2019 6:15 PM
  • Hi Chen,

    Just checking in to see if the above answer helped. If it answers your query, please click “Mark as Answer” and up-vote it. If you have any further questions, do let us know.

    Tuesday, August 27, 2019 4:41 AM
  • Hi Chen,

    Following up to see if the above suggestion was helpful. If you have any further questions, do let us know.

    Wednesday, August 28, 2019 10:18 AM