Partition key not set in Event hub output

  • Question

  • I have the following query in Stream analytics:

    select p2.gatewayid as gatewayid,p2.deviceid, 'PowerMinute' as property , p2.time, p2.value 
    into output1 from powerreading p2 timestamp by time ...

    "output1" is configured as an output to Event hub and I have set the "Partition key column" to be gatewayid. Yet, when I look at the messages in the event hub, the PartitionKey is null. The message in the event hub is JSON and it contains a gatewayid element as expected. Any ideas why the partition key is not set?

    And as a follow-up: The message to the event hub is an array, which normally only contains one item, but in certain cases, e.g. after stopping the job and restarting with the start time being "last stopped time", there will be more than one item in the array. In my test case the gatewayid is always the same, but if there were more than one gatewayid in the array, how would the partitionKey be selected? Is this the reason that it doesn't work and is there any way to fix it?

    Wednesday, February 24, 2016 7:07 PM

Answers

  • Do you mean the EventData.PartitionKey value in the output event hub messages? As long as the GatewayId values are being used to hash and distribute the events to different EH partitions, do you need PartitionKey set for some reason?

    This is by design: it is how ASA sends data to an output EH. When you configure the ASA EH output with a partition key, ASA does not use the EH client to partition and spray the events across partitions (the EH client needs the PartitionKey set to do this distribution). Instead, to get the best write throughput, it hashes the partition key internally and distributes the events to parallel EH senders that each write directly to one partition.

    To avoid any ambiguous behavior in arrays, please try to explicitly specify the item you need to use as the partition key (see GetArrayElement).

    Cheers!


    [Disclaimer] This posting is provided "AS IS" with no warranties, and confers no rights. User assumes all.

    Thursday, February 25, 2016 8:33 PM
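
The routing described in the answer above can be sketched roughly as follows. This is only an illustration of hash-based distribution to per-partition senders: the hash function ASA actually uses is not stated in this thread, so SHA-256 is an assumption, and `partition_for` and `route` are hypothetical helper names.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key value to a partition index.

    SHA-256 is an arbitrary stand-in; the real hash used by ASA is unknown here.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, key_column, partition_count):
    """Group events by target partition (a stand-in for per-partition senders)."""
    buckets = {i: [] for i in range(partition_count)}
    for event in events:
        index = partition_for(str(event[key_column]), partition_count)
        buckets[index].append(event)
    return buckets


if __name__ == "__main__":
    sample = [
        {"gatewayid": "gw-1", "deviceid": "d1", "value": 1.0},
        {"gatewayid": "gw-2", "deviceid": "d2", "value": 2.0},
        {"gatewayid": "gw-1", "deviceid": "d3", "value": 3.0},
    ]
    for index, batch in route(sample, "gatewayid", 4).items():
        print(index, [e["deviceid"] for e in batch])
```

Because each sender writes to a fixed partition, events with the same gatewayid still land in the same partition even though EventData.PartitionKey itself is never populated.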

All replies

  • When you set the partition key for EH Output, ASA currently uses partitioned senders for EH for better performance. When using partitioned senders, you can't set the PartitionKey of the message. Let us know if you have any other questions.

    Thanks!
    Todd

    Thursday, February 25, 2016 9:25 PM
  • Yes, I need the EventData.PartitionKey because the service I'm using to process the event hub messages uses this information. I don't want to go into all the details here, but due to the architecture of my code, using the PartitionKey means I don't need to parse the message payload to get this information, which makes processing more efficient.

    I think you misunderstood my question regarding the array (or else I have misunderstood how Stream Analytics handles the output). When configuring the Stream Analytics output there is a Format selection, which can be set to Array or Line separated. My ASA query does not output any arrays per se, but I see that in certain cases the output from ASA is an array of several records. This typically happens when I stop the job and restart it with the start time being "last stopped time". In this case the first output from ASA normally contains an array of several records that were accumulated during the downtime. In my test data I currently use only one gateway id, so it's not an issue. If I were to use more than one gateway id, I assume there could be more than one gateway id in such an array, and my question would then be which gateway id is used for the partition key.

    Monday, February 29, 2016 1:21 PM
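
Since the replies above indicate that EventData.PartitionKey stays null with this output, one consumer-side fallback is to read gatewayid from the payload. The following is a minimal sketch, assuming the message body is UTF-8 JSON in either the Array or Line separated format mentioned above. `parse_records` and `group_by_gateway` are hypothetical helper names. The thread does not say whether one message can carry records for several gateway ids, so the sketch groups by gatewayid rather than assuming a single value per message.

```python
import json
from collections import defaultdict


def parse_records(body: str):
    """Return the records in a message body (JSON array or line-separated JSON)."""
    text = body.strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one record.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return parsed if isinstance(parsed, list) else [parsed]


def group_by_gateway(body: str, key: str = "gatewayid"):
    """Group the records of one message by their gateway id."""
    groups = defaultdict(list)
    for record in parse_records(body):
        groups[record.get(key)].append(record)
    return dict(groups)


if __name__ == "__main__":
    array_body = (
        '[{"gatewayid": "gw-1", "deviceid": "d1", "value": 1.5},'
        ' {"gatewayid": "gw-2", "deviceid": "d2", "value": 2.5}]'
    )
    for gateway, records in group_by_gateway(array_body).items():
        print(gateway, len(records))
```

This does require parsing the payload, which is the cost the PartitionKey was meant to avoid, so it is a workaround rather than an equivalent.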