AdaptiveSamplingTelemetryProcessor not throttling telemetry items

  • Question

  • I am looking to reduce the volume of data we're storing in our production environment. I found the documentation on adaptive sampling and tried to enable it, but I can't seem to get it to work. I've been testing on my local machine using the following TelemetryProcessors:
    <TelemetryProcessors>
        <Add Type="Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.QuickPulse.QuickPulseTelemetryProcessor, Microsoft.AI.PerfCounterCollector" />
        <Add Type="Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel.AdaptiveSamplingTelemetryProcessor, Microsoft.AI.ServerTelemetryChannel">
            <MaxTelemetryItemsPerSecond>1</MaxTelemetryItemsPerSecond>
            <MaxSamplingPercentage>20</MaxSamplingPercentage>
        </Add>
    </TelemetryProcessors>

    My understanding of how this is supposed to work is that, by default, it records every telemetry item until the rate reaches 1 item/s. After that, it starts sampling and logs 20% of telemetry items to Application Insights. Either way, I would expect not to see a huge volume of requests being sent to Application Insights. After running some test loads via WebJobs, I ran the following query:
    dependencies
    | where target == 'some-dependency'
    | summarize count() by bin(timestamp, 5m) 
    | order by timestamp desc 
    

    I routinely saw counts for the different bins in excess of 4,000. I would expect them to be near a cap of 300 (1 item/s * 60 * 5). I am the only one hitting this test dependency, so the telemetry items have to be coming from my machine. Is there something I'm missing that would cause it to behave like this?
    Friday, April 24, 2020 7:20 PM

All replies

  • You can use this query to see what percentage is being retained:

    union requests,dependencies,pageViews,browserTimings,exceptions,traces
    | where timestamp > ago(1d)
    | summarize RetainedPercentage = 100/avg(itemCount) by bin(timestamp, 1h), itemType

    There are a few things that could be going on before we dig any deeper:

    1. What is the traffic pattern like? App Insights won't instantly throttle the telemetry stream; it uses a rolling average that is checked periodically. That means if you spike the traffic all at once, the sampling won't kick in until after quite a bit of traffic has come through.
    2. Are you using any custom telemetry processors that could be affecting how much data is sent?
    3. Are dependencies the only type showing more results than expected? If other types like requests are showing the correct sampling percentage, try setting <IncludedTypes>Dependency</IncludedTypes> to see if that resolves the issue.
    4. Another thing that could affect the number of records sent is how sampling decides what to keep: it keeps or removes all items that are related to each other as a group. This is unlikely to have that large of an impact, but there may be a bit of variation from keeping related items together.
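
    For point 3, here is a rough sketch of where the setting would go. It assumes the fragment replaces the existing AdaptiveSamplingTelemetryProcessor entry inside <TelemetryProcessors> in your ApplicationInsights.config; the two numeric values are simply copied from your question and are not recommendations.

    ```xml
    <!-- Sketch only: same processor entry as in the question, with IncludedTypes added. -->
    <Add Type="Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel.AdaptiveSamplingTelemetryProcessor, Microsoft.AI.ServerTelemetryChannel">
        <!-- Values copied from the question, unchanged. -->
        <MaxTelemetryItemsPerSecond>1</MaxTelemetryItemsPerSecond>
        <MaxSamplingPercentage>20</MaxSamplingPercentage>
        <!-- Semicolon-delimited list of telemetry types to sample; here only dependencies. -->
        <IncludedTypes>Dependency</IncludedTypes>
    </Add>
    ```

    Keep in mind that with IncludedTypes set, only the listed types are sampled and every other type is always sent, so this is better treated as a diagnostic step than as a final configuration.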

    Thursday, April 30, 2020 3:58 PM