How can I achieve an embarrassingly parallel SQL database output sink?

  • Question

  • Hello Everyone,

I have an input stream partitioned by customer ID. I want to achieve embarrassingly parallel processing while writing output to a different SQL database for each customer.

One way to achieve this is to write a query with multiple steps, where each step filters for a specific partition and sends its output to the respective SQL database. However, this approach requires every step to process data from all the partitions, which may be inefficient and also consume a lot of SUs (Streaming Units).
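    A minimal sketch of that multi-step approach, assuming a hypothetical `Input` stream with a `CustomerId` column and per-customer SQL outputs named `SqlOutputCustomer1` and `SqlOutputCustomer2` (names are illustrative, not from my actual job):

    ```sql
    -- Each step filters one customer, but still has to scan every partition
    WITH Customer1 AS (
        SELECT * FROM Input WHERE CustomerId = 'customer1'
    ),
    Customer2 AS (
        SELECT * FROM Input WHERE CustomerId = 'customer2'
    )
    SELECT * INTO SqlOutputCustomer1 FROM Customer1
    SELECT * INTO SqlOutputCustomer2 FROM Customer2
    ```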

    I want to achieve something like what is described in the https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization#multi-step-query-with-a-grouping-key article, wherein I have different partitions for different customers and do parallel processing for each customer in my query using PARTITION BY PartitionId.
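    Following the pattern from that article, a partition-aligned query might look like the sketch below (hypothetical `Input` stream, `CustomerId` column, and a 5-minute tumbling window chosen purely for illustration):

    ```sql
    -- Each partition is processed independently because PartitionId
    -- is both in PARTITION BY and in the GROUP BY key
    SELECT CustomerId, COUNT(*) AS EventCount
    INTO Output
    FROM Input PARTITION BY PartitionId
    GROUP BY PartitionId, CustomerId, TumblingWindow(minute, 5)
    ```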



    • Edited by ashishgaude Wednesday, November 29, 2017 3:18 PM


  • I think you cannot achieve this directly through the Stream Analytics job output configuration, as parallel output to a SQL DB is not supported at this point.

    You will have to use one of the following:

    Option 1: Output to an Event Hub, again partitioned by customer, and then use Azure Functions to batch-write to the SQL database. (If your output rate per second is high, this is your only option to avoid performance issues.)

    Option 2: Output directly to an Azure Function, which in turn writes to the SQL DB.
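    To illustrate the batching the function would do in Option 1, here is a minimal Python sketch: it groups a batch of Event Hub messages by customer and builds one parameterized multi-row INSERT per customer database. The `customerId`/`payload` field names and the `Events` table are assumptions; the actual SQL write (e.g. via a driver's `executemany`) is omitted.

    ```python
    from collections import defaultdict

    def batch_by_customer(events):
        """Group messages by customer id so each customer's batch can be
        written to that customer's SQL database in a single round trip."""
        batches = defaultdict(list)
        for event in events:
            batches[event["customerId"]].append(event)
        return dict(batches)

    def build_insert(table, batch):
        """Build one parameterized multi-row INSERT for a customer's batch.
        Returns the SQL text and the flat parameter list."""
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        sql = f"INSERT INTO {table} (CustomerId, Payload) VALUES {placeholders}"
        params = [v for e in batch for v in (e["customerId"], e["payload"])]
        return sql, params
    ```

    Batching per customer keeps writes parallel across databases while amortizing the per-statement overhead on each one.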

    • Proposed as answer by Alwyn Pereira Monday, December 4, 2017 2:43 PM
    • Marked as answer by ashishgaude Monday, December 4, 2017 3:02 PM
    Wednesday, November 29, 2017 3:37 PM