Batch Data by Partition ID

  • Question

  • From: Katy Shimizu @ShimizuKaty via Twitter

    I'm working on a high-throughput data ingestion system that uses an Azure Function sandwiched between two Event Hubs to perform reverse geocoding on data coming in from devices in the field. The data is partitioned by device ID upstream of the function, and I'd like to preserve that partitioning as it passes through the function.

    The pipeline looks like this: Event Hub A -> Azure Function -> Event Hub B -> Stream Analytics Job -> Azure SQL DB

    Event Hub A outputs JSON arrays of data. The Function uses Event Hub A as a trigger, takes each array element, performs a transformation, and sends the result to Event Hub B. Per the WebJobs/EventHub documentation I'm using IAsyncCollector<EventData> and AddAsync() to send data; however, since I'm dealing with batches of data that don't all have the same partition ID, this throws an exception. I'd like to know how I could batch data by partition ID so that this works.
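
    The batching being asked about can be pictured with a minimal sketch of the grouping step alone. It is written in Python for brevity, not in the C# the Function presumably uses, and it makes no Event Hubs SDK calls. The helper name `group_by_partition_key` and the `deviceId` field are assumptions for illustration; neither appears in the post.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List


def group_by_partition_key(
    events: Iterable[Dict[str, Any]],
    key_of: Callable[[Dict[str, Any]], Hashable],
) -> Dict[Hashable, List[Dict[str, Any]]]:
    """Group events so that every batch contains a single partition key.

    Events keep their arrival order within each batch.
    """
    batches: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        batches[key_of(event)].append(event)
    return dict(batches)


# Hypothetical payload: "deviceId" is an assumed field name, not from the post.
incoming = [
    {"deviceId": "dev-1", "lat": 47.6, "lon": -122.3},
    {"deviceId": "dev-2", "lat": 40.7, "lon": -74.0},
    {"deviceId": "dev-1", "lat": 47.7, "lon": -122.2},
]

batches = group_by_partition_key(incoming, key_of=lambda e: e["deviceId"])

for device_id, batch in batches.items():
    # Placeholder for the real send: each batch would go out in one call
    # that carries this single partition key.
    print(device_id, len(batch))
```

    Each resulting batch holds events for one key only, so it could be handed to whatever send call the Event Hubs client offers for a single partition key. That call is deliberately left out here.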



    Wednesday, August 3, 2016 7:58 PM

All replies