Sequence Number vs. Offset

  • Question

  • I haven't really found a very good explanation of the difference between a sequence number and offset. Are they guaranteed to be unique within a partition? Do the values ever get reused? When would I use one over the other?

    Randy Minder

    Friday, April 13, 2018 6:53 PM

All replies

  • Hello Randy,

    From the EventData documentation:

    SequenceNumber is the logical sequence number of the event within the partition stream of the Event Hub.

    Offset is the offset of the data relative to the Event Hub partition stream. The offset is a marker or identifier for an event within the Event Hubs stream. The identifier is unique within a partition of the Event Hubs stream.

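    When I receive the events, I get the following result:

    You can see that the sequence number increased by 1 (one message) while the offset did not increase by 1. That is because the offset is a position in the partition's stream (the Event Hubs documentation describes it as a byte numbering of the event), not a count of messages.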
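    To make the difference concrete, here is a toy model of a single partition. It assumes, as the Event Hubs documentation describes, that the offset is a byte position within the partition stream. The `PartitionLog` class and the fixed 16-byte per-event overhead are hypothetical simplifications for illustration, not how the service actually lays out its data.

```python
from dataclasses import dataclass, field


@dataclass
class StoredEvent:
    sequence_number: int  # counts events: advances by exactly 1 per event
    offset: int           # byte position where this event starts (simplified)
    body: bytes


@dataclass
class PartitionLog:
    """Hypothetical, simplified model of one partition's append-only stream."""
    header_size: int = 16  # assumed per-event overhead in bytes (illustrative only)
    events: list = field(default_factory=list)
    _next_seq: int = 0
    _next_offset: int = 0

    def append(self, body: bytes) -> StoredEvent:
        event = StoredEvent(self._next_seq, self._next_offset, body)
        self.events.append(event)
        self._next_seq += 1
        # The offset advances by the number of bytes written, not by 1.
        self._next_offset += self.header_size + len(body)
        return event


log = PartitionLog()
for body in (b"hello", b"a much longer payload", b"x"):
    e = log.append(body)
    print(e.sequence_number, e.offset)
```

    In this model the sequence numbers come out as 0, 1, 2 while the offsets jump by the size of each stored event, which is the pattern described above.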

    We can also refer to the Service Bus definition:

    The SequenceNumber value is a unique 64-bit integer assigned to a message as it is accepted and stored by the broker and functions as its internal identifier. For partitioned entities, the topmost 16 bits reflect the partition identifier. Sequence numbers roll over to zero when the 48/64-bit range is exhausted.
    The sequence number can be trusted as a unique identifier since it is assigned by a central and neutral authority and not by clients. It also represents the true order of arrival, and is more precise than a time stamp as an order criterion, because time stamps may not have a high enough resolution at extreme message rates and may be subject to (however minimal) clock skew in situations where the broker ownership transitions between nodes.
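    The Service Bus layout quoted above (partition identifier in the topmost 16 bits, leaving 48 bits for the counter) can be illustrated with plain bit arithmetic. The `compose` and `decompose` helpers are hypothetical names; this only demonstrates the bit layout described in the quote and is not a function of any Service Bus or Event Hubs SDK.

```python
PARTITION_BITS = 16
LOCAL_BITS = 64 - PARTITION_BITS      # 48 bits left for the per-partition counter
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def compose(partition_id: int, local_seq: int) -> int:
    """Pack a partition id into the top 16 bits of a 64-bit sequence number."""
    if not 0 <= partition_id < (1 << PARTITION_BITS):
        raise ValueError("partition_id must fit in 16 bits")
    if not 0 <= local_seq <= LOCAL_MASK:
        raise ValueError("local_seq must fit in 48 bits")
    return (partition_id << LOCAL_BITS) | local_seq


def decompose(seq: int) -> tuple[int, int]:
    """Split a packed sequence number back into (partition_id, local_seq)."""
    return seq >> LOCAL_BITS, seq & LOCAL_MASK


packed = compose(3, 176)
print(hex(packed), decompose(packed))
```

    With this layout, counter value 176 in partition 0 and counter value 176 in partition 1 produce different 64-bit numbers, which is what makes the value unique across partitions. The 48-bit counter is also why the quote mentions a 48-bit rollover range.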

    Best regards,

    Rita


    MSDN Community Support


    Monday, April 16, 2018 7:15 AM
  • Note that this part of the documentation for SequenceNumber:

    the topmost 16 bits reflect the partition identifier

    is not true for Azure Event Hubs.

    With an Azure Event Hub, sequence numbers are unique within a particular partition, but not across partitions, because the partition number is not in the topmost 16 bits (or anywhere). For example, I've got a test console app fetching events from a couple of partitions in the same event hub. Here's some sample output:

    0 31/05/2018 14:47:11 ad576b05-e60c-45a7-8de8-3470022e217c, 176 640256

    ... 1 31/05/2018 14:46:56 d9104936-8964-44cd-9ae3-8a3adcae24b8, 176 638016

    The first column shows the partition ID, so these events come from different partitions. The final two columns show the sequence number and offset: both events have sequence number 176. (The timestamps show that the messages were actually sent at somewhat different times, but that's normal: you typically don't get exactly alternating messages across partitions.)
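    Because sequence numbers are only unique within a partition here, anything that needs a hub-wide identity for an event (a deduplication set, for example) has to key on the pair (partition ID, sequence number). A minimal sketch using the two events from the sample output above; the `Event` tuple is a hypothetical stand-in for whatever event type your client library returns.

```python
from typing import NamedTuple


class Event(NamedTuple):
    partition_id: str
    sequence_number: int
    offset: int


# The two events from the sample output: different partitions, same sequence number.
events = [
    Event("0", 176, 640256),
    Event("1", 176, 638016),
]

by_sequence_only = {e.sequence_number: e for e in events}
by_partition_and_sequence = {(e.partition_id, e.sequence_number): e for e in events}

print(len(by_sequence_only))           # second event overwrote the first
print(len(by_partition_and_sequence))  # both events kept
```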

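    This in turn means that with Azure Event Hub the sequence number is unlikely ever to roll over, since no bits are reserved for the partition. Even at 2 million events per second (which is the full speed normally quoted for an event hub), it would take almost 300,000 years to exhaust the full 64-bit range (about 146,000 years if only the non-negative values of a signed 64-bit integer are used). In practice it would take longer still: sequence numbers are counted per partition, and you would likely need more than one partition to reach that throughput, so each partition's counter would advance more slowly.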
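    The rollover estimate is easy to check. This sketch assumes, as in the post, a sustained 2 million events per second all landing on a single partition (the worst case for one per-partition counter), and uses a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
EVENTS_PER_SECOND = 2_000_000


def years_to_exhaust(bits: int, rate: float = EVENTS_PER_SECOND) -> float:
    """Years until a counter with 2**bits distinct values is used up at `rate` events/s."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR


print(f"64 bits (full range):   {years_to_exhaust(64):,.0f} years")
print(f"63 bits (signed long):  {years_to_exhaust(63):,.0f} years")
print(f"48 bits (Service Bus):  {years_to_exhaust(48):,.2f} years")
```

    The full 64-bit range works out to roughly 292,000 years and the non-negative half of a signed 64-bit integer to roughly 146,000 years. By contrast, a 48-bit counter at the same rate would last under five years, which helps explain why the Service Bus documentation has to mention rollover at all.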

    So in practice I'm still somewhat perplexed by the existence of both sequence numbers and offsets. They both basically do the same thing: they provide monotonically-increasing partition-scoped representations of where you are in the stream. They just seem to do it in slightly different ways, and it's not really clear which you should use. (E.g., when checkpointing, you can keep track of either the offset or the sequence number. Either will work, because the API lets you specify either sequence number or offset as your start position, but which is preferred? It's not at all clear from the docs.)
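    As a sketch of the checkpointing point: a checkpoint can simply record both values for the last processed event, and a consumer then resumes strictly after it. The `Checkpoint` type, `checkpoint_for` and `resume_from` are hypothetical and not part of any Event Hubs SDK. Resuming is done by sequence number here because, in this simplified model, the next event is always `sequence_number + 1`; comparing `e.offset > cp.offset` instead would select the same events, since both values increase monotonically within a partition.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical stand-in for an event read from one partition."""
    sequence_number: int
    offset: int
    body: str


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical per-partition checkpoint: the last event fully processed."""
    sequence_number: int
    offset: int


def checkpoint_for(event: Event) -> Checkpoint:
    # Store both values; either one identifies the same position in the partition.
    return Checkpoint(event.sequence_number, event.offset)


def resume_from(events: Iterable[Event], cp: Optional[Checkpoint]) -> Iterator[Event]:
    """Yield the events that come after the checkpoint (exclusive start)."""
    for e in events:
        if cp is None or e.sequence_number > cp.sequence_number:
            yield e


# Offsets here are made-up byte positions; only their ordering matters.
partition = [Event(0, 0, "a"), Event(1, 37, "b"), Event(2, 90, "c")]
cp = checkpoint_for(partition[1])  # processed up to and including "b"
print([e.body for e in resume_from(partition, cp)])
```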

    Friday, June 1, 2018 1:41 PM