How to set up research computing implemented in C++ on Azure

  • Question

  • My lab just got a sponsorship from Microsoft Azure and I'm exploring how to use it. I'm new to industrial-level cloud services and pretty confused by the many terminologies and concepts. In short, here is my scenario:

    1. I want to run the same algorithm against multiple datasets, i.e. data parallelism.
    2. The algorithm is implemented in C++ on Linux (Ubuntu 16.04). I did my best to use static linking, but it still depends on some dynamic libraries. However, these dynamic libraries can easily be installed with apt.
    3. Each dataset is structured, meaning the data (images, other files, ...) are organized in folders.

    The ideal system configuration would be a bunch of identical VMs and a shared file system. Then I could submit my jobs with 'qsub' from a script or something similar. Is there a way to do this on Azure?
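    To make the goal concrete, here is a minimal sketch of the kind of per-dataset submission loop described above. The dataset layout and the binary name (`./analyze`) are hypothetical placeholders, not anything from a real cluster setup:

    ```python
    # Sketch of the desired workflow: one qsub-style submission per
    # dataset folder. The "./analyze" binary name is hypothetical.
    from pathlib import Path

    def build_submissions(dataset_root: str):
        """Build one qsub command line per dataset folder."""
        commands = []
        for dataset in sorted(Path(dataset_root).iterdir()):
            if dataset.is_dir():
                # Each task runs the same binary on a different dataset.
                commands.append(f"qsub -b y ./analyze {dataset}")
        return commands
    ```

    On a classic HPC cluster each command would go to the scheduler; the question is what plays the role of this loop (and of qsub) on Azure.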

    I investigated the Batch service, but I'm having trouble installing dependencies after creating compute nodes. I also had trouble with storage. So far I have only seen examples of using Batch with Blob storage, which is unstructured.

    So are there any other services in Azure that can meet my requirements?


    Sunday, October 2, 2016 6:49 PM

All replies

  • Hi,

    From what you've described, Batch seems like it should be a decent fit.

    Azure Storage is unstructured but supports the notion of "virtual directories", which lets you model structured data on top of blob storage. Tools such as blobxfer can mirror a local directory up to storage while preserving the folder structure.
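    The trick behind "virtual directories" is simply that a "/" inside a blob name acts as a folder separator, even though the container itself is flat. A small sketch of how a structured local tree maps to blob names (this is the same mapping tools like blobxfer perform; the example tree is made up):

    ```python
    # Sketch: blob storage is flat, but "/" in a blob name behaves like
    # a folder separator, so a local directory tree maps onto blob
    # names directly without losing its structure.
    import os

    def local_tree_to_blob_names(local_root: str):
        """Map every file under local_root to a blob name that
        preserves the relative folder structure."""
        blob_names = []
        for dirpath, _dirs, files in os.walk(local_root):
            for fname in files:
                full = os.path.join(dirpath, fname)
                rel = os.path.relpath(full, local_root)
                # Blob names always use forward slashes.
                blob_names.append(rel.replace(os.sep, "/"))
        return sorted(blob_names)
    ```

    So a local file `dataset1/images/img0.png` becomes a blob named `dataset1/images/img0.png`, and storage browsers will display it as if it lived in nested folders.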

    Alternatively, you could use Azure Files and mount the file share onto each compute node.

    If you're interested in dockerizing your work, you could look into Azure Batch Shipyard, which is a command-line tool for submitting Docker-based workloads to Azure Batch. It supports GlusterFS as a shared file system.

    Note, though, that Azure Batch is primarily a work scheduler (and resource manager), so submission wouldn't be via qsub. Batch has its own APIs, and you can submit a job and tasks to Batch using the Azure xplat CLI -- you can see this blog post for more details about Batch + Linux.
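    To connect this back to the "same binary, many datasets" scenario: with Batch you would create one task per dataset, each with its own command line, and hand those specs to the CLI or API. A hedged sketch of generating such task specs as JSON; the `id`/`commandLine` field names follow the Batch REST API, but treat the exact schema expected by your submission path (and the `./analyze` binary) as assumptions to verify against the docs:

    ```python
    # Sketch: one Batch task per dataset, emitted as JSON that could be
    # fed to a CLI- or REST-based submission. Field names ("id",
    # "commandLine") follow the Batch REST API; "./analyze" is a
    # hypothetical binary standing in for your algorithm.
    import json

    def make_task_specs(datasets):
        """Build one task spec per dataset; each task runs the same
        binary against a different dataset folder."""
        tasks = []
        for i, dataset in enumerate(datasets):
            tasks.append({
                "id": f"task-{i}",
                "commandLine": f"./analyze {dataset}",
            })
        return tasks

    if __name__ == "__main__":
        print(json.dumps(make_task_specs(["dataset1", "dataset2"]), indent=2))
    ```

    This is exactly the data-parallel pattern Batch is built for: the scheduler fans the tasks out across the pool's identical compute nodes for you.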

    If you're interested in a more configuration-driven approach (like Shipyard, but not focused on Docker), we will be prototyping some configuration-based workflows in a future release of the Azure xplat CLI (probably about a month away from being available to the public, though).

    Hope some of this helps,


    Monday, October 3, 2016 5:13 PM